llvm-project

Commit Graph

Author	SHA1	Message	Date
QingShan Zhang	ab846da7e8	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D62897 llvm-svn: 362921	2019-06-10 05:40:21 +00:00
Kai Luo	3f3bae33a2	[NFC] Test if commit access granted. llvm-svn: 362917	2019-06-10 03:20:33 +00:00
Jinsong Ji	7aafdef627	[MachineScheduler] checkResourceLimit boundary condition update When we call checkResourceLimit in bumpCycle or bumpNode, and we know the resource count has just reached the limit (the equations are equal). We should return true to mark that we are resource limited for next schedule, or else we might continue to schedule in favor of latency for 1 more schedule and create a schedule that actually overbook the resource. When we call checkResourceLimit to estimate the resource limite before scheduling, we don't need to return true even if the equations are equal, as it shouldn't limit the schedule for it . Differential Revision: https://reviews.llvm.org/D62345 llvm-svn: 362805	2019-06-07 14:54:47 +00:00
Sam Parker	c5ef502ee8	[CodeGen] Generic Hardware Loop Support Patch which introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops and PowerPC has been ported over to use this generic pass. The target dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend. Three generic intrinsics have been introduced: - void @llvm.set_loop_iterations Takes as a single operand, the number of iterations to be executed. - i1 @llvm.loop_decrement(anyint) Takes the maximum number of elements processed in an iteration of the loop body and subtracts this from the total count. Returns false when the loop should exit. - anyint @llvm.loop_decrement_reg(anyint, anyint) Takes the number of elements remaining to be processed as well as the maximum numbe of elements processed in an iteration of the loop body. Returns the updated number of elements remaining. llvm-svn: 362774	2019-06-07 07:35:30 +00:00
Nemanja Ivanovic	ef4a3aa549	[PowerPC] Exploit the vector min/max instructions Use the PPC vector min/max instructions for computing the corresponding operation as these should be faster than the compare/select sequences we currently emit. Differential revision: https://reviews.llvm.org/D47332 llvm-svn: 362759	2019-06-06 23:49:01 +00:00
Jason Liu	60ec248148	[AIX] Implement function descriptor on SDAG Summary: (1) Function descriptor on AIX On AIX, a called routine may have 2 distinct symbols associated with it: * A function descriptor (Name) * A function entry point (.Name) The descriptor structure on AIX is the same as those in the ELF V1 ABI: * The address of the entry point of the function. * The TOC base address for the function. * The environment pointer. The descriptor symbol uses the same name as the source level function in C. The function entry point is analogous to the symbol we would generate for a function in a non-descriptor-based ABI, except that it is renamed by prepending a ".". Which symbol gets referenced depends on the context: * Taking the address of the function references the descriptor symbol. * Calling the function references the entry point symbol. (2) Speaking of implementation on AIX, for direct function call target, we create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to replace original TargetGlobalAddress SDNode. Then down the path, we can take advantage of this MCSymbol. Patch by: Xiangling_L Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara Differential Revision: https://reviews.llvm.org/D62532 llvm-svn: 362735	2019-06-06 19:13:36 +00:00
Jason Liu	0338b88861	[AIX] Implement call lowering with parameters could pass onto GPRs Summary: This patch implements SDAG call lowering on AIX for functions which only have parameters that could fit into GPRs. Reviewers: hubert.reinterpretcast, syzaara Differential Revision: https://reviews.llvm.org/D62823 llvm-svn: 362708	2019-06-06 14:36:43 +00:00
Nemanja Ivanovic	7c842fadf1	[PowerPC] Collapse RLDICL/RLDICR into RLDIC when possible Generally speaking, we lower to an optimal rotate sequence for nodes visible in the SDAG. However, there are instances where the two rotates are not visible at ISEL time - most notably those in a very common sequence when lowering switch statements to jump tables. A common situation is a switch on a 32-bit integer. This value has to have the upper 32 bits cleared and because jump table offsets are word offsets, the value needs to be shifted left by 2 bits. We currently emit the clear and the left shift as two separate instructions, but this is not needed as we can lower it to a single RLDIC. This patch just cleans that up. Differential revision: https://reviews.llvm.org/D60402 llvm-svn: 362576	2019-06-05 02:36:40 +00:00
Nemanja Ivanovic	cfb6c82172	[PowerPC][NFC] Add codegen test for consecutive stores of vector elements NFC commit of a test case in order for the subsequent review to show differences in codegen. Differential revision: https://reviews.llvm.org/D62843 llvm-svn: 362573	2019-06-05 02:09:03 +00:00
Nemanja Ivanovic	aed7227b71	Revert r362472 as it is breaking PPC build bots The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots. Reverting it to bring the bots back to green. llvm-svn: 362539	2019-06-04 18:48:43 +00:00
Roman Lebedev	925553ec91	[NFC][Codegen][PowerPC] Autogenerate shift-cmp.ll test Being affected by upcoming patch llvm-svn: 362529	2019-06-04 17:05:34 +00:00
Jinsong Ji	3144d7a2da	[PowerPC] P9 Scheduling Model: dispatching rule fixes This is to address some of the problems in existing P9 resource modeling, especially about the dispatching rules. Instead of using a hypothetical DISPATCHER , we try to use the number of actual dispatch slots, and define SchedWriteRes to model dispatch rules, then update instruction classes according to dispatch rules. All the dispatch rules and instruction classes update are made according to POWER9 User Manual. Differential Revision: https://reviews.llvm.org/D61873 llvm-svn: 362509	2019-06-04 15:22:23 +00:00
QingShan Zhang	11de0e71b0	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D61843 llvm-svn: 362472	2019-06-04 08:53:53 +00:00
Chen Zheng	a050b25544	[PowerPC] add testcases for reordering LSR and PPCCTRLoops - NFC llvm-svn: 362468	2019-06-04 06:48:14 +00:00
Michael Berg	0b7f98da65	Propagate fmf for setcc/select folds Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG. Reviewers: qcolombet, spatel Reviewed By: qcolombet Subscribers: nemanjai, jsji Differential Revision: https://reviews.llvm.org/D62552 llvm-svn: 362439	2019-06-03 19:12:15 +00:00
Nemanja Ivanovic	bad43d8f49	[PowerPC] Look through copies for compare elimination We currently miss the opportunities for optmizing comparisons in the peephole optimizer if the input is the result of a COPY since we look for record-form versions of the producing instruction. This patch simply lets the optimization peek through copies. Differential revision: https://reviews.llvm.org/D59633 llvm-svn: 362438	2019-06-03 19:09:15 +00:00
Guozhi Wei	c3a24e93d5	[PPC] Correctly adjust branch probability in PPCReduceCRLogicals In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%. condc = conda \|\| condb br condc, label %target, label %fallthrough It can be transformed to following, br conda, label %target, label %newbb newbb: br condb, label %target, label %fallthrough Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%. This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged. Differential Revision: https://reviews.llvm.org/D62430 llvm-svn: 362237	2019-05-31 16:11:17 +00:00
Kevin P. Neal	308b7139b1	Partial revert of revert of r361827: Add constrained intrinsic tests for powerpc64le. The powerpc64-"nonle" tests are removed. They fail because of a bug that Drew is currently working on that affects multiple targets. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Hal Finkel, Kevin P. Neal Approved by: Hal Finkel Differential Revision: http://reviews.llvm.org/D62388 llvm-svn: 361985	2019-05-29 16:29:31 +00:00
Adhemerval Zanella	6d7bf5e8df	[CodeGen] Add lrint/llrint builtins This patch add the ISD::LRINT and ISD::LLRINT along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lrint/llrint generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62017 llvm-svn: 361875	2019-05-28 20:47:44 +00:00
Kevin P. Neal	71f8f745b4	Revert 361827. It broke the bots. llvm-svn: 361831	2019-05-28 14:37:45 +00:00
Kevin P. Neal	6d458fa866	Add constrained intrinsic tests for powerpc64 and powerpc64le. Submitted by: Drew Wock Reviewed by: Hal Finkel Approved by: Hal Finkel Differential Revision: https://reviews.llvm.org/D62388 llvm-svn: 361827	2019-05-28 14:17:48 +00:00
Sanjay Patel	91131b6500	[SelectionDAG] soften assertion when legalizing narrow vector FP ops The test based on PR42010: https://bugs.llvm.org/show_bug.cgi?id=42010 ...may show an inaccuracy for PPC's target defs, but we should not be so aggressive with an assert here. There's no telling what out-of-tree targets look like. llvm-svn: 361696	2019-05-25 13:48:07 +00:00
Jason Liu	8e1d921bb3	Implement call lowering without parameters on AIX Summary:dd This patch implements call lowering for calls without parameters on AIX as initial support. Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma Differential Revision: https://reviews.llvm.org/D61948 llvm-svn: 361669	2019-05-24 20:54:35 +00:00
Stefan Pintilie	522307fa40	[PowerPC] Remove CRBits Copy Of Unset/set CBit For the situation, where we generate the following code: crxor 8, 8, 8 < Some instructions> .LBB0_1: < Some instructions> cror 1, 8, 8 cror (COPY of CRbit) depends on the result of the crxor instruction. CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on any previous instruction. This patch will optimize it to: < Some instructions> .LBB0_1: < Some instructions> cror 1, 1, 1 Patch By: Victor Huang (NeHuang) Differential Revision: https://reviews.llvm.org/D62044 llvm-svn: 361632	2019-05-24 12:05:37 +00:00
QingShan Zhang	449bfdd1b0	[Power9] Add a specific heuristic to schedule the addi before the load When we are scheduling the load and addi, if all other heuristic didn't take effect, we will try to schedule the addi before the load, to hide the latency, and avoid the true dependency added by RA. And this only take effects for Power9. Differential Revision: https://reviews.llvm.org/D61930 llvm-svn: 361600	2019-05-24 05:30:09 +00:00
Roman Lebedev	f81ebfb045	UpdateTestChecks: ppc32 triple support Summary: Appears identical to powerpc64{,le}. Regenerate test that is being affected by upcoming patch. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: nemanjai, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62339 llvm-svn: 361543	2019-05-23 19:54:41 +00:00
Roman Lebedev	702a152e6a	[NFC][PPC] Autogenerate vec_add_sub_quadword.ll test Being affected by (sub %x, C) -> add %X, (sub 0, C) 'for vectors' patch. llvm-svn: 361525	2019-05-23 18:08:26 +00:00
Roman Lebedev	c8364ef567	[NFC][PPC] Autogenerate vec_add_sub_doubleword.ll test Being affected by (sub %x, C) -> add %X, (sub 0, C) 'for vectors' patch. llvm-svn: 361524	2019-05-23 18:08:21 +00:00
Kees Cook	c2187c20a4	[TargetLowering] Extend bool args to inline-asm according to getBooleanType Summary: This extends Krzysztof Parzyszek's X86-specific solution (https://reviews.llvm.org/D60208) to the generic code pointed out by James Y Knight. Reviewers: kparzysz, craig.topper, nickdesaulniers Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight Tags: #llvm Differential Revision: https://reviews.llvm.org/D60224 llvm-svn: 361404	2019-05-22 16:16:15 +00:00
Fangrui Song	1c61471ab1	[PPC64] Parse -elfv1 -elfv2 when specified on target triple Summary: For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2]. The following results are expected: ELFv1 when using: -target powerpc64-unknown-freebsd12.0 -target powerpc64-unknown-freebsd12.0 -mabi=elfv1 -target powerpc64-unknown-freebsd12.0-elfv1 ELFv2 when using: -target powerpc64-unknown-freebsd12.0 -mabi=elfv2 -target powerpc64-unknown-freebsd12.0-elfv2 [1] https://wiki.freebsd.org/powerpc/llvm-elfv2 [2] https://clang.llvm.org/docs/CrossCompilation.html Patch by Alfredo Dal'Ava Júnior! Differential Revision: https://reviews.llvm.org/D61950 llvm-svn: 361355	2019-05-22 07:29:59 +00:00
Chen Zheng	9970665f60	[PowerPC] [ISEL] select x-form instruction for unaligned offset Differential Revision: https://reviews.llvm.org/D62173 llvm-svn: 361346	2019-05-22 02:57:31 +00:00
Yi-Hong Lyu	00e85f7535	Move csr-save-restore-order.ll to the right place llvm-svn: 361306	2019-05-21 20:28:31 +00:00
QingShan Zhang	690fa1b51b	[NFC][PowerPC] Add a test to verify if the scheduler schedule the addi before the load. llvm-svn: 361221	2019-05-21 06:32:31 +00:00
Chen Zheng	e64bcada5f	[PowerPC] test cases for selecting x-form instruction for unaligned offset - NFC llvm-svn: 361219	2019-05-21 05:06:09 +00:00
Clement Courbet	632dfdda16	Re-land r360859: "[MergeICmps] Simplify the code." With a fix for PR41917: The predecessor list was changing under our feet. - for (BasicBlock Pred : predecessors(EntryBlock_)) { + while (!pred_empty(EntryBlock_)) { + BasicBlock const Pred = *pred_begin(EntryBlock_); llvm-svn: 361009	2019-05-17 09:43:45 +00:00
Nico Weber	d764e7c660	Revert r360859: "Reland r360771 "[MergeICmps] Simplify the code."" It caused PR41917. llvm-svn: 360963	2019-05-17 00:43:53 +00:00
Adhemerval Zanella	73643b5041	[CodeGen] Add lround/llround builtins This patch add the ISD::LROUND and ISD::LLROUND along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lround/llround generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. llvm-svn: 360889	2019-05-16 13:15:27 +00:00
Matt Arsenault	828b685ebe	RegAllocFast: Improve hinting heuristic Trace through multiple COPYs when looking for a physreg source. Add hinting for vregs that will be copied into physregs (we only hinted for vregs getting copied to a physreg previously). Give hinted a register a bonus when deciding which value to spill. This is part of my rewrite regallocfast series. In fact this one doesn't even have an effect unless you also flip the allocation to happen from back to front of a basic block. Nonetheless it helps to split this up to ease review of D52010 Patch by Matthias Braun llvm-svn: 360887	2019-05-16 12:50:39 +00:00
Clement Courbet	c4fdd717ef	Reland r360771 "[MergeICmps] Simplify the code." This revision does not seem to be the culprit. llvm-svn: 360859	2019-05-16 06:18:02 +00:00
Nicolai Haehnle	664ceeda68	RegAlloc: try to fail more gracefully when out of registers Summary: The emitError path allows the program to continue, unlike report_fatal_error. This is friendlier to use cases where LLVM is embedded in a larger program, because the caller may be able to deal with the error somewhat gracefully. Change the number of requested NOP bytes in the AArch64 and PowerPC test cases to avoid triggering an unrelated assertion. The compilation still fails, as verified by the test. Change-Id: Iafb9ca341002a597b82e59ddc7a1f13c78758e3d Reviewers: arsenm, MatzeB Subscribers: qcolombet, nemanjai, wdng, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61489 llvm-svn: 360786	2019-05-15 17:29:58 +00:00
Clement Courbet	eaf4413d2d	Revert r360771 "[MergeICmps] Simplify the code." Breaks a bunch of builbdots. llvm-svn: 360776	2019-05-15 14:21:59 +00:00
Clement Courbet	157ae639fa	[MergeICmps] Simplify the code. Instead of patching the original blocks, we now generate new blocks and delete the old blocks. This results in simpler code with a less twisted control flow (see the change in `entry-block-shuffled.ll`). This will make https://reviews.llvm.org/D60318 simpler by making it more obvious where control flow created and deleted. Reviewers: gchatelet Subscribers: hiraditya, llvm-commits, spatel Tags: #llvm Differential Revision: https://reviews.llvm.org/D61736 llvm-svn: 360771	2019-05-15 13:04:24 +00:00
Fangrui Song	f4dfd63c74	[IR] Disallow llvm.global_ctors and llvm.global_dtors of the 2-field form in textual format The 3-field form was introduced by D3499 in 2014 and the legacy 2-field form was planned to be removed in LLVM 4.0 For the textual format, this patch migrates the existing 2-field form to use the 3-field form and deletes the compatibility code. test/Verifier/global-ctors-2.ll checks we have a friendly error message. For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the 2-field form (add i8* null as the third field). Reviewed By: rnk, dexonsmith Differential Revision: https://reviews.llvm.org/D61547 llvm-svn: 360742	2019-05-15 02:35:32 +00:00
Lei Huang	22561972af	[PowerPC] Custom lower known CR bit spills For known CRBit spills, CRSET/CRUNSET, it is more efficient to load and spill the known value instead of extracting the bit. eg. This sequence is currently used to spill a CRUNSET: crclr 4*cr5+lt mfocrf r3,4 rlwinm r3,r3,20,0,0 stw r3,132(r1) This patch custom lower it to: li r3,0 stw r3,132(r1) Differential Revision: https://reviews.llvm.org/D61754 llvm-svn: 360677	2019-05-14 14:27:06 +00:00
Jinsong Ji	b7b3d866a4	[PowerPC][NFC] Fix typos in triples Found by bzEq (Kai Luo). llvm-svn: 360643	2019-05-14 03:11:24 +00:00
Nick Desaulniers	c33f754e74	[TargetLowering] Handle multi depth GEPs w/ inline asm constraints Summary: X86TargetLowering::LowerAsmOperandForConstraint had better support than TargetLowering::LowerAsmOperandForConstraint for arbitrary depth getelementpointers for "i", "n", and "s" extended inline assembly constraints. Hoist its support from the derived class into the base class. Link: https://github.com/ClangBuiltLinux/linux/issues/469 Reviewers: echristo, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D61560 llvm-svn: 360604	2019-05-13 17:27:44 +00:00
Lei Huang	1ac6e9636c	[PowerPC] custom lower `v2f64 fpext v2f32` Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32. eg. For the following IR %0 = load <2 x float>, <2 x float>* %Ptr, align 8 %1 = fpext <2 x float> %0 to <2 x double> ret <2 x double> %1 Pre custom lowering: ld r3, 0(r3) mtvsrd f0, r3 xxswapd vs34, vs0 xscvspdpn f0, vs0 xxsldwi vs1, vs34, vs34, 3 xscvspdpn f1, vs1 xxmrghd vs34, vs0, vs1 After custom lowering: lfd f0, 0(r3) xxmrghw vs0, vs0, vs0 xvcvspdp vs34, vs0 Differential Revision: https://reviews.llvm.org/D57857 llvm-svn: 360429	2019-05-10 14:04:06 +00:00
Nemanja Ivanovic	80808ed0f6	[PowerPC][NFC] Add test for D60506 to show differences in code-gen Differential revision: https://reviews.llvm.org/D61723 llvm-svn: 360338	2019-05-09 12:26:39 +00:00
QingShan Zhang	5f7c86147d	[NFC][PowerPC] Add test for store combine optimization. llvm-svn: 360229	2019-05-08 07:56:59 +00:00
QingShan Zhang	0e71a6e755	[CodeGenPrepare] Don't split the store if it is volatile We shouldn't split the store when it is volatile. Differential Revision: https://reviews.llvm.org/D61169 llvm-svn: 360228	2019-05-08 07:32:12 +00:00
Jinsong Ji	cc63db4ff0	[PowerPC][NFC] Update build-vector-tests.ll using utils/update_llc_test_checks.py build-vector-tests.ll is a huge testcase, it is hard to maintain: eg: any fundamental changes might need to update hundreds of lines. We should leverage the script to maintain it. This patch simply run utils/update_llc_test_checks.py on it. There should be no missing test points. llvm-svn: 360175	2019-05-07 17:29:44 +00:00
Nemanja Ivanovic	b4f028f0f3	[PowerPC] Use the two-constant NR algorithm for refining estimates The single-constant algorithm produces infinities on a lot of denormal values. The precision of the two-constant algorithm is actually sufficient across the range of denormals. We will switch to that algorithm for now to avoid the infinities on denormals. In the future, we will re-evaluate the algorithm to find the optimal one for PowerPC. Differential revision: https://reviews.llvm.org/D60037 llvm-svn: 360144	2019-05-07 13:48:03 +00:00
Nemanja Ivanovic	70afe4f7e1	[PowerPC] Fix erroneous condition for converting uint-to-fp vector conversion A condition for exiting the legalization of v4i32 conversion to v2f64 through extract/convert/build erroneously checks for the extract having type i32. This is not adequate as smaller extracts are actually legalized to i32 as well. Furthermore, an early exit is missing which means that we only check that both extracts are from the same vector if that check fails. As a result, both cases in the included test case fail - the first gets a select error and the second generates incorrect code. The culprit commit is r274535. llvm-svn: 360043	2019-05-06 13:35:49 +00:00
Matt Arsenault	b6c599afd3	Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block" This reverts commit r359912. This should pass now, since the clang test was made less fragile in r359918. llvm-svn: 359919	2019-05-03 19:06:57 +00:00
Nico Weber	bb852a9672	Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block" Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail. llvm-svn: 359912	2019-05-03 18:08:03 +00:00
Matt Arsenault	daf2d653fa	RegAllocFast: Add heuristic to detect values not live-out of a block Add an improved/new heuristic to catch more cases when values are not live out of a basic block. Patch by Matthias Braun llvm-svn: 359906	2019-05-03 17:03:24 +00:00
Sanjay Patel	9f68614494	[PowerPC] add test that could infinite loop with reordered transforms; NFC This is a slightly reduced version of the test from D61384. Adding this as a preliminary step, so I can update D61149 with the proposed fix. llvm-svn: 359709	2019-05-01 17:34:30 +00:00
Fangrui Song	6afcdcf9ab	[llvm-readobj] Change -t to --symbols in tests. NFC -t is --symbols in llvm-readobj but --section-details (unimplemented) in readelf. The confusing option should not be used since we aim for improving compatibility. Keep just one llvm-readobj -t use case in test/tools/llvm-readobj/symbols.test llvm-svn: 359661	2019-05-01 09:28:24 +00:00
Kang Zhang	d43b66b318	[NFC][PowerPC] Use -check-prefixes to simplify the check in code-align.ll Summary: When checking the same output, we can use the `-check-prefixes` to simplify the check. For example, if we want to check below output. ``` ; GENERIC-LABEL: .globl foo ; BASIC-LABEL: .globl foo ; PWR-LABEL: .globl foo ; GENERIC: .p2align 2 ; BASIC: .p2align 4 ; PWR: .p2align 4 ; GENERIC: @foo ; BASIC: @foo ; PWR: @foo ``` If we use `-check-prefixes` ``` ... -check-prefixes=CHECK,GENERAL ... -check-prefixes=CHECK,BASIC ... -check-prefixes=CHECK,PWR ``` Above check can be simplify to: ``` ; CHECK-LABEL: .globl foo ; GENERIC: .p2align 2 ; BASIC: .p2align 4 ; PWR: .p2align 4 ; CHECK: @foo ``` Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D61227 llvm-svn: 359533	2019-04-30 03:39:05 +00:00
Zi Xuan Wu	49d60fdc2e	[DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the target when combine ISD::TRUNC node Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry), if adde is not legal for the target. Even it's at type-legalize phase. Because adde is special and will not be legalized at operation-legalize phase later. This fixes: PR40922 https://bugs.llvm.org/show_bug.cgi?id=40922 Differential Revision: https://reviews.llvm.org//D60854 llvm-svn: 359532	2019-04-30 03:01:14 +00:00
Ahsan Saghir	3962d6da17	Add __builtin_dcbf support for PPC Summary: This patch adds support for __builtin_dcbf for PPC. __builtin_dcbf copies the contents of a modified block from the data cache to main memory and flushes the copy from the data cache. Differential revision: https://reviews.llvm.org/D59843 llvm-svn: 359517	2019-04-29 23:25:33 +00:00
Roland Froese	728e139700	[PowerPC] Try harder to avoid load/move-to VSR for partial vector loads Change the PPCISelLowering.cpp function that decides to avoid update form in favor of partial vector loads to know about newer load types and to not be confused by the chain operand. Differential Revision: https://reviews.llvm.org/D60102 llvm-svn: 359504	2019-04-29 21:08:35 +00:00
Nick Desaulniers	7ab164c4a4	[AsmPrinter] refactor to support %c w/ GlobalAddress' Summary: Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when printing the address of a MachineOperand::MO_GlobalAddress. Move that handling into a new overriden method in each base class. A virtual method was added to the base class for handling the generic case. Refactors a few subclasses to support the target independent %a, %c, and %n. The patch also contains small cleanups for AVRAsmPrinter and SystemZAsmPrinter. It seems that NVPTXTargetLowering is possibly missing some logic to transform GlobalAddressSDNodes for TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended inline assembly asm constraints. Fixes: - https://bugs.llvm.org/show_bug.cgi?id=41402 - https://github.com/ClangBuiltLinux/linux/issues/449 Reviewers: echristo, void Reviewed By: void Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60887 llvm-svn: 359337	2019-04-26 18:45:04 +00:00
Joerg Sonnenberger	8372b467f1	[PowerPC] Allow using initial-exec TLS with PIC Using initial-exec TLS variables is a reasonable performance optimisation for system libraries. Use the correct PIC mechanism to get hold of the GOT to avoid text relocations. Differential Revision: https://reviews.llvm.org/D61026 llvm-svn: 359146	2019-04-24 22:12:22 +00:00
Kang Zhang	009a21d2fd	[PowerPC] Fix wrong ElemSIze when calling isConsecutiveLS() Summary: This issue from the bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177 When the two operands for BUILD_VECTOR are same, we will get assert error. llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&): Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) && "The loads cannot be both consecutive and reverse consecutive."' failed. This error caused by the wrong ElemSIze when calling isConsecutiveLS(). We should use `getScalarType().getStoreSize();` to get the ElemSize instread of `getScalarSizeInBits() / 8`. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60811 llvm-svn: 358644	2019-04-18 07:24:15 +00:00
Nick Desaulniers	a2077bab40	[AsmPrinter] defer %c to base class for ARM, PPC, and Hexagon. NFC Summary: None of these derived classes do anything that the base class cannot. If we remove these case statements, then the base class can handle them just fine. Reviewers: peter.smith, echristo Reviewed By: echristo Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60803 llvm-svn: 358603	2019-04-17 18:22:48 +00:00
Kang Zhang	2446f843ae	[PowerPC] Add initialization for some ppc passes Summary: Some llc debug options need pass-name as the parameters. But if we use the pass-name ppc-early-ret, we will get below error: llc test.ll -stop-after ppc-early-ret LLVM ERROR: "ppc-early-ret" pass is not registered. Below pass-names have the pass is not registered error: ppc-ctr-loops ppc-ctr-loops-verify ppc-loop-preinc-prep ppc-toc-reg-deps ppc-vsx-copy ppc-early-ret ppc-vsx-fma-mutate ppc-vsx-swaps ppc-reduce-cr-ops ppc-qpx-load-splat ppc-branch-coalescing ppc-branch-select Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60248 llvm-svn: 358271	2019-04-12 09:59:40 +00:00
Eric Christopher	b6926bdcff	Revert "[PowerPC] Add initialization for some ppc passes" This reverts commit `6f8f98ce8d` as it is breaking nearly every bot. llvm-svn: 358260	2019-04-12 07:16:58 +00:00
Kang Zhang	6f8f98ce8d	[PowerPC] Add initialization for some ppc passes Summary: Some llc debug options need pass-name as the parameters. But if we use the pass-name ppc-early-ret, we will get below error: llc test.ll -stop-after ppc-early-ret LLVM ERROR: "ppc-early-ret" pass is not registered. Below pass-names have the pass is not registered error: ppc-ctr-loops ppc-ctr-loops-verify ppc-loop-preinc-prep ppc-toc-reg-deps ppc-vsx-copy ppc-early-ret ppc-vsx-fma-mutate ppc-vsx-swaps ppc-reduce-cr-ops ppc-qpx-load-splat ppc-branch-coalescing ppc-branch-select Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D60248 llvm-svn: 358256	2019-04-12 06:35:15 +00:00
Zi Xuan Wu	ac79ef8f0e	[PowerPC] More precise exploitation of P9 maddld instruction when operands are constant There are 3 operands of maddld, (add (mul %1, %2), %3) and sometimes they are constant. If there is constant operand, it takes extra li to materialize the operand, and one more extra register too. So it's not profitable to use maddld to optimize mul-add pattern. Differential Revision: https://reviews.llvm.org/D60181 llvm-svn: 358253	2019-04-12 05:21:31 +00:00
David Green	0861c87b06	Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not seeing through to the constant in other blocks. Revert this patch while we come up with a better way to handle that. I will try to follow this up with some better tests. llvm-svn: 358113	2019-04-10 18:00:41 +00:00
Chen Zheng	19ce6719bc	[PowerPC] initialize SchedModel according to platform. Differential Revision: https://reviews.llvm.org/D60177 llvm-svn: 357962	2019-04-09 01:25:25 +00:00
Piotr Sobczak	0376ac1d94	[SelectionDAG] Compute known bits of CopyFromReg Summary: Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if the virtual reg used has one def only. This can be particularly useful when calling isBaseWithConstantOffset() with the ISD::CopyFromReg argument, as more optimizations may get enabled in the result. Also add a missing truncation on X86, found by testing of this patch. Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa Reviewers: bogner, craig.topper, RKSimon Reviewed By: RKSimon Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59535 llvm-svn: 357745	2019-04-05 07:44:09 +00:00
Chen Zheng	4178c15330	[PowerPC]add testcase for ppcctrloops pass shortloop check llvm-svn: 357560	2019-04-03 03:11:34 +00:00
Nirav Dave	fe59e14031	[DAGCombine] Prune unnused nodes. Summary: Nodes that have no uses are eventually pruned when they are selected from the worklist. Record nodes newly added to the worklist or DAG and perform pruning after every combine attempt. Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight Reviewed By: jyknight Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58070 llvm-svn: 357283	2019-03-29 17:35:56 +00:00
Kang Zhang	05f78b35ae	[PowerPC] Add the support for __builtin_setrnd() Summary: PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode. double __builtin_setrnd(int mode); The effective values for mode are: 0 - round to nearest 1 - round to zero 2 - round to +infinity 3 - round to -infinity Note that the mode argument will modulo 4, so if the int argument is greater than 3, it will only use the least significant two bits of the mode. Namely, builtin_setrnd(102)) is equal to builtin_setrnd(2). Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D59405 llvm-svn: 357241	2019-03-29 08:45:24 +00:00
Zi Xuan Wu	1445b77e8c	[PowerPC] Strength reduction of multiply by a constant by shift and add/sub in place A shift and add/sub sequence combination is faster in place of a multiply by constant. Because the cycle or latency of multiply is not huge, we only consider such following worthy patterns. ``` (mul x, 2^N + 1) => (add (shl x, N), x) (mul x, -(2^N + 1)) => -(add (shl x, N), x) (mul x, 2^N - 1) => (sub (shl x, N), x) (mul x, -(2^N - 1)) => (sub x, (shl x, N)) ``` And the cycles or latency is subtarget-dependent so that we need consider the subtarget to determine to do or not do such transformation. Also data type is considered for different cycles or latency to do multiply. Differential Revision: https://reviews.llvm.org/D58950 llvm-svn: 357233	2019-03-29 03:08:39 +00:00
Nirav Dave	c6dfaa0e83	Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes." This patch appears to trigger very large compile time increases in halide builds. llvm-svn: 357116	2019-03-27 19:54:41 +00:00
Stefan Pintilie	e1d79a87c6	[PowerPC] Remove UseVSXReg The UseVSXReg flag can be safely removed and the code cleaned up. Patch By: Yi-Hong Liu Differential Revision: https://reviews.llvm.org/D58685 llvm-svn: 357028	2019-03-26 20:28:21 +00:00
Nirav Dave	a28c514581	[DAG] Avoid smart constructor-based dangling nodes. Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations or not fully pruning unused result values. This can result in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter as the current node deleter to make sure such nodes have the chance of being pruned. Many minor changes, mostly positive. llvm-svn: 356996	2019-03-26 15:08:14 +00:00
Matt Arsenault	c2e35a6f32	RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs The 2nd loop calculates spill costs but reports free registers as cost 0 anyway, so there is little benefit from having a separate early loop. Surprisingly this is not NFC, as many register are marked regDisabled so the first loop often picks up later registers unnecessarily instead of the first one available in the allocation order... Patch by Matthias Braun llvm-svn: 356499	2019-03-19 19:01:34 +00:00
Nirav Dave	d6351340bb	[DAGCombiner] If a TokenFactor would be merged into its user, consider the user later. Summary: A number of optimizations are inhibited by single-use TokenFactors not being merged into the TokenFactor using it. This makes we consider if we can do the merge immediately. Most tests changes here are due to the change in visitation causing minor reorderings and associated reassociation of paired memory operations. CodeGen tests with non-reordering changes: X86/aligned-variadic.ll -- memory-based add folded into stored leaq value. X86/constant-combiners.ll -- Optimizes out overlap between stores. X86/pr40631_deadstore_elision -- folds constant byte store into preceding quad word constant store. Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet Reviewed By: courbet Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59260 llvm-svn: 356068	2019-03-13 17:07:09 +00:00
Jinsong Ji	9dc2c1d564	Set useful flags for vector imm setting instructions Vector imm setting instructions like XXLXORz/XXLXORspz/XXLXORdpz Should behave like LI8. We should set corresponding flags to allow rematerialization and other opts in LICM, RA, Scheduling etc. Differential Revision: https://reviews.llvm.org/D58645 llvm-svn: 355948	2019-03-12 18:27:09 +00:00
Jinsong Ji	b6bfcfc847	[NFC][PowerPC] Update testcases using utils/update_llc_test_checks.py llvm-svn: 355945	2019-03-12 17:55:32 +00:00
Simon Pilgrim	417f8c5be4	[PowerPC] Use real pointers instead of undef The reduced test removed the pointer arguments, but to better survive D58017 and D58070 we need them back. llvm-svn: 355532	2019-03-06 18:49:39 +00:00
Guozhi Wei	11308bdb43	[PPC] Adjust the computed branch offset for the possible shorter distance In file PPCBranchSelector.cpp we tend to over estimate code size due to large alignment and inline assembly. Usually it causes larger computed branch offset, it is not big problem. But sometimes it may also causes smaller computed branch offset than actual branch offset. If the offset is close to the limit of encoding, it may cause problem at run time. Following is a simplified example. actual estimated address address ... bne Far 100 10c .p2align 4 Near: 110 110 ... Far: 8108 8108 Actual offset: 0x8108 - 0x100 = 0x8008 Computed offset: 0x8108 - 0x10c = 0x7ffc The computed offset is at most ((1 << alignment) - 4) bytes smaller than actual offset. So we add this number to the offset for safety. Differential Revision: https://reviews.llvm.org/D57718 llvm-svn: 355529	2019-03-06 18:22:22 +00:00
Strahinja Petrovic	94fccc93de	[PowerPC] Add secure plt support for TLS symbols This patch supports secure plt mode for TLS symbols. Differential Revision: https://reviews.llvm.org/D45520 llvm-svn: 355513	2019-03-06 15:00:10 +00:00
Chen Zheng	9cfe7e81f1	[PowerPC] fix killed/dead flag after convert x-form to d-form tranformation. Differential Revision: https://reviews.llvm.org/D58428 llvm-svn: 355378	2019-03-05 04:56:54 +00:00
Stefan Pintilie	bd5429ef38	[PowerPC] Move the stack pointer update instruction later in the prologue and earlier in the epilogue. Move the stdu instruction in the prologue and epilogue. This should provide a small performance boost in functions that are able to do this. I've kept this change rather conservative at the moment and functions with frame pointers or base pointers will not try to move the stack pointer update. Differential Revision: https://reviews.llvm.org/D42590 llvm-svn: 355085	2019-02-28 12:23:28 +00:00
Dmitri Gribenko	3d0576bbaf	Fixed a typo in the test s/CEHCK/CHECK/ Summary: Turns out the test was not correct, I had to adjust the test to work. I also added CHECK-LABELs for better error messages from FileCheck while I'm here. Reviewers: jsji Subscribers: nemanjai, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58614 llvm-svn: 355079	2019-02-28 10:56:39 +00:00
Joerg Sonnenberger	6a198366a0	Default to Secure PLT on PPC for NetBSD and OpenBSD. This matches the default settings of clang. llvm-svn: 355038	2019-02-27 21:53:14 +00:00
Dmitri Gribenko	a3a3964f98	Fixed typos in tests: s/CHEKC/CHECK/ Reviewers: ilya-biryukov Subscribers: nemanjai, javed.absar, jsji, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D58611 llvm-svn: 354785	2019-02-25 13:41:59 +00:00
Kang Zhang	4faa4090c9	[PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts Summary: Fast selection of llvm fptoi & fptrunc instructions is not handled well about VSX instruction support. We'd use VSX float convert integer instruction instead of non-vsx float convert integer instruction if the operand register class is VSSRC or VSFRC because i32 and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is openeded. For float trunc instruction, we do this silimar work like float convert integer instruction to try to use VSX instruction. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D58430 llvm-svn: 354762	2019-02-25 02:46:16 +00:00
Nirav Dave	46f939c118	Disable big-endian constant store merges from rL354676. llvm-svn: 354677	2019-02-22 16:20:34 +00:00
Nirav Dave	44037d7a63	[DAGCombine] Fold overlapping constant stores Fold a smaller constant store into larger constant stores immediately preceeding it. Reviewers: rnk, courbet Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58468 llvm-svn: 354676	2019-02-22 16:00:19 +00:00
Nirav Dave	43a7600a7f	[PPC] Add store merging testcase. llvm-svn: 354595	2019-02-21 16:34:48 +00:00
Chen Zheng	ffece2dfcf	[PowerPC] exploit P9 instruction maddld. Differential Revision: https://reviews.llvm.org/D58364 llvm-svn: 354427	2019-02-20 02:30:06 +00:00
Stefan Pintilie	1113940df2	[PowerPC][NFC] Added tests for prologue and epilogue code gen. Added four test files to check the existing behaviour of prologue and epilogue code generation. This patch was done as a setup for the upcoming patch listed on Phabricator that will change how the prologue and epilogue work. The upcoming patch is: https://reviews.llvm.org/D42590 llvm-svn: 353994	2019-02-13 23:37:23 +00:00
Sanjay Patel	86fac11d5a	[DAGCombiner] convert logic-of-setcc into bit magic (PR40611) If we're comparing some value for equality against 2 constants and those constants have an absolute difference of just 1 bit, then we can offset and mask off that 1 bit and reduce to a single compare against zero: and/or (setcc X, C0, ne), (setcc X, C1, ne/eq) --> setcc ((add X, -C1), ~(C0 - C1)), 0, ne/eq https://rise4fun.com/Alive/XslKj This transform is disabled by default using a TLI hook ("convertSetCCLogicToBitwiseLogic()"). That should be overridden for AArch64, MIPS, Sparc and possibly others based on the asm shown in: https://bugs.llvm.org/show_bug.cgi?id=40611 llvm-svn: 353859	2019-02-12 17:07:47 +00:00
Simon Pilgrim	015cc0f0fa	[PowerPC] Regenerate test llvm-svn: 353851	2019-02-12 16:10:50 +00:00
Sanjay Patel	14fb86310f	[PowerPC] add tests for logic of setcc (PR40611); NFC llvm-svn: 353788	2019-02-12 01:46:26 +00:00
Roland Froese	732fe22454	[PowerPC] Avoid scalarization of vector truncate The PowerPC code generator currently scalarizes vector truncates that would fit in a vector register, resulting in vector extracts, scalar operations, and vector merges. This patch custom lowers a vector truncate that would fit in a register to a vector shuffle instead. Differential Revision: https://reviews.llvm.org/D56507 llvm-svn: 353724	2019-02-11 17:29:14 +00:00
Nemanja Ivanovic	92a8c36735	[DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X)) The sqrt case is faster and we already do this for the case where the exponent is 0.25. This adds the 0.75 case which is also not sensitive to signed zeros. Patch by Whitney Tsang (Whitney) Differential revision: https://reviews.llvm.org/D57434 llvm-svn: 353557	2019-02-08 19:50:58 +00:00
Sanjay Patel	2d4b186844	[DAGCombiner] fold add/sub with bool operand based on target's boolean contents I noticed that we are missing this canonicalization in IR: rL352515 ...and then realized that we don't get this right in SDAG either, so this has to be fixed first regardless of what we choose to do in IR. The existing fold was limited to scalars and using the wrong predicate to guard the transform. We have a boolean contents TLI query that can be used to decide which direction to fold. This may eventually lead back to the problems/question in: https://bugs.llvm.org/show_bug.cgi?id=40486 ...but it makes no difference to that yet. Differential Revision: https://reviews.llvm.org/D57401 llvm-svn: 353433	2019-02-07 17:43:34 +00:00
Roland Froese	42f58498c5	[PowerPC] Add vector truncate test to prep for D56507 NFC llvm-svn: 353344	2019-02-06 21:34:44 +00:00
Alina Sbirlea	6cba96ed52	[LICM/MSSA] Add promotion to scalars by building an AliasSetTracker with MemorySSA. Summary: Experimentally we found that promotion to scalars carries less benefits than sinking and hoisting in LICM. When using MemorySSA, we build an AliasSetTracker on demand in order to reuse the current infrastructure. We only build it if less than AccessCapForMSSAPromotion exist in the loop, a cap that is by default set to 250. This value ensures there are no runtime regressions, and there are small compile time gains for pathological cases. A much lower value (20) was found to yield a single regression in the llvm-test-suite and much higher benefits for compile times. Conservatively we set the current cap to a high value, but we will explore lowering it when MemorySSA is enabled by default. Reviewers: sanjoy, chandlerc Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D56625 llvm-svn: 353339	2019-02-06 20:25:17 +00:00
Sanjay Patel	f1314b6603	[PowerPC] adjust test for uaddo change in rL353001 We don't need a mtctr/bctr for this test now; a regular conditional branch is fine. llvm-svn: 353002	2019-02-03 18:10:16 +00:00
Sanjay Patel	84ceae6048	[CGP] adjust target constraints for forming uaddo There are 2 changes visible here: 1. There's no reason to limit this transform based on number of condition registers. That diff allows PPC to produce slightly better (dot-instructions should be generally good) code. Note: someone that cares about PPC codegen might want to look closer at that output because it seems like we could still improve this. 2. We (probably?) should not bother trying to form uaddo (or other overflow ops) when there's no target support for such an op. This goes beyond checking whether the op is expanded because both PPC and AArch64 show better codegen for standard types regardless of whether the op is legal/custom. llvm-svn: 353001	2019-02-03 17:53:09 +00:00
Sanjay Patel	359a97310b	[PowerPC] add tests for saturating add; NFC This is copied from the existing test files for x86/AArch. llvm-svn: 352987	2019-02-03 12:42:54 +00:00
Chen Zheng	be589423d8	[PowerPC] delete no more needed workaround for readsRegister() in PowerPC Differential Revision: https://reviews.llvm.org/D57439 llvm-svn: 352689	2019-01-30 23:18:38 +00:00
Chen Zheng	ca26039cc7	[PowerPC] more opportunity for converting reg+reg to reg+imm Differential Revision: https://reviews.llvm.org/D57314 llvm-svn: 352583	2019-01-30 01:57:01 +00:00
James Y Knight	5d71fc5d7b	Adjust documentation for git migration. This fixes most references to the paths: llvm.org/svn/ llvm.org/git/ llvm.org/viewvc/ github.com/llvm-mirror/ github.com/llvm-project/ reviews.llvm.org/diffusion/ to instead point to https://github.com/llvm/llvm-project. This is not a trivial substitution, because additionally, all the checkout instructions had to be migrated to instruct users on how to use the monorepo layout, setting LLVM_ENABLE_PROJECTS instead of checking out various projects into various subdirectories. I've attempted to not change any scripts here, only documentation. The scripts will have to be addressed separately. Additionally, I've deleted one document which appeared to be outdated and unneeded: lldb/docs/building-with-debug-llvm.txt Differential Revision: https://reviews.llvm.org/D57330 llvm-svn: 352514	2019-01-29 16:37:27 +00:00
Guozhi Wei	81f3fd4bf8	[MBP] Don't move bottom block before header if it can't reduce taken branches If bottom of block BB has only one successor OldTop, in most cases it is profitable to move it before OldTop, except the following case: -->OldTop<- \| . \| \| . \| \| . \| ---Pred \| \| \| BB----- Move BB before OldTop can't reduce the number of taken branches, this patch detects this case and prevent the moving. Differential Revision: https://reviews.llvm.org/D57067 llvm-svn: 352236	2019-01-25 19:45:13 +00:00
Zi Xuan Wu	308a609c6e	[PowerPC] Enhance the fast selection of cmp instruction and clean up related asserts Fast selection of llvm icmp and fcmp instructions is not handled well about VSX instruction support. We'd use VSX float comparison instruction instead of non-vsx float comparison instruction if the operand register class is VSSRC or VSFRC because i32 and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is opened. If the target does not have corresponding VSX instruction comparison for some type, just copy VSX-related register to common float register class and use non-vsx comparison instruction. Differential Revision: https://reviews.llvm.org/D57078 llvm-svn: 352174	2019-01-25 07:24:59 +00:00
Nemanja Ivanovic	b9b75de0ae	[PowerPC] Exploit store instructions that store a single vector element This patch exploits the instructions that store a single element from a vector to preform a (store (extract_elt)). We already have code that does this with ISA 3.0 instructions that were added to handle i8/i16 types. However, we had never exploited the existing ones that handle f32/f64/i32/i64 types. Differential revision: https://reviews.llvm.org/D56175 llvm-svn: 352131	2019-01-24 23:44:28 +00:00
James Y Knight	693d39dd12	Remove irrelevant references to legacy git repositories from compiler identification lines in test-cases. (Doing so only because it's then easier to search for references which are actually important and need fixing.) llvm-svn: 351200	2019-01-15 16:18:52 +00:00
Francis Visoiu Mistrih	b7cef81fd3	Replace "no-frame-pointer-" function attributes with "frame-pointer" Part of the effort to refactoring frame pointer code generation. We used to use two function attributes "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" to represent three kinds of frame pointer usage: (all) frames use frame pointer, (non-leaf) frames use frame pointer, (none) frame use frame pointer. This CL makes the idea explicit by using only one enum function attribute "frame-pointer" Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as llc. "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still supported for easy migration to "frame-pointer". tests are mostly updated with // replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’ grep -iIrnl '\-disable-fp-elim=false' \| xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g" // replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’ grep -iIrnl '\-disable-fp-elim' * \| xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g" Patch by Yuanfang Chen (tabloid.adroit)! Differential Revision: https://reviews.llvm.org/D56351 llvm-svn: 351049	2019-01-14 10:55:55 +00:00
Zi Xuan Wu	64c956eea8	Recommit "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel" This re-commit r350685. Differential Revision: https://reviews.llvm.org/D55686 llvm-svn: 350799	2019-01-10 06:20:14 +00:00
Zi Xuan Wu	f2a75eef41	Revert "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel" This reverts commit r350685. See compile assert in compiler-rt. llvm-svn: 350693	2019-01-09 06:12:24 +00:00
Zi Xuan Wu	9479f6d72e	[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel Bad machine code: Illegal virtual register for instruction function: TestULE basic block: %bb.0 entry (0x1000a39b158) instruction: %2:crrc = FCMPUD %1:vsfrc, %3:f8rc operand 1: %1:vsfrc Fix assert about missing match between fcmp instruction and register class. We should use vsx related cmp instruction xvcmpudp instead of fcmpu when vsx is opened. add -verifymachineinstrs option into related test cases to enable the verify pass. Differential Revision: https://reviews.llvm.org/D55686 llvm-svn: 350685	2019-01-09 02:31:10 +00:00
QingShan Zhang	f24ec7bdd0	[Power9] Enable the Out-of-Order scheduling model for P9 hw When switched to the MI scheduler for P9, the hardware is modeled as out of order. However, inside the MI Scheduler algorithm, we still use the in-order scheduling model as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer the op. So, only when all the available instructions issued, the pending instruction could be scheduled. That is not true for our P9 hw in fact. This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017. With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows: x264_r: +6.95% cactuBSSN_r: +6.94% lbm_r: +4.11% xz_r: -3.85% And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved. Reviewer: Nemanjai Differential Revision: https://reviews.llvm.org/D55810 llvm-svn: 350285	2019-01-03 05:04:18 +00:00
Craig Topper	c562fae02b	[DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them. If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead. The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this. Differential Revision: https://reviews.llvm.org/D56156 llvm-svn: 350235	2019-01-02 17:58:27 +00:00
Wei Mi	ecc89b76cb	[PowerPC] Remove SeenUse check when optimizing conditional branch in PPCPreEmitPeephole pass. PPCPreEmitPeephole will convert a BC to B when the conditional branch is based on a constant CR by CRSET or CRUNSET. This is added in https://reviews.llvm.org/rL343100. When the conditional branch is known to be always taken, all branches will be removed and a new unconditional branch will be inserted. However, when SeenUse is false the original patch will not remove the branches, but still insert the new unconditional branch, update the successors and create inconsistent IR. Compiling the synthetic testcase included can show the problem we run into. The patch simply removes the SeenUse condition when adding branches into InstrsToErase set. Differential Revision: https://reviews.llvm.org/D56041 llvm-svn: 350223	2019-01-02 17:07:23 +00:00
Kang Zhang	9d78c60bf4	[PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code Summary: For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo. Then the verifier runs and it seems like we have a use of an undefined register (the register will be reserved later, but the verifier doesn't know that). So this patch call setUsesTOCBasePtr before emit the code for the pseudo, so verifier can know X2 is a reserved register. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D56148 llvm-svn: 350165	2018-12-30 15:13:51 +00:00
Kang Zhang	4aa6453767	[PowerPC] Fix ADDE, SUBE do not know how to promote operator Summary: This patch is created to fix the Bugzilla bug 39815: https://bugs.llvm.org/show_bug.cgi?id=39815 This patch is to support promotion integer result for the instruction ADDE, SUBE. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D56119 llvm-svn: 350161	2018-12-30 07:48:09 +00:00
Nemanja Ivanovic	0f7715afe1	[PowerPC] Complete the custom legalization of vector int to fp conversion A recent patch has added custom legalization of vector conversions of v2i16 -> v2f64. This just rounds it out for other types where the input vector has an illegal (narrower) type than the result vector. Specifically, this will handle the following conversions: v2i8 -> v2f64 v4i8 -> v4f32 v4i16 -> v4f32 Differential revision: https://reviews.llvm.org/D54663 llvm-svn: 350155	2018-12-29 13:40:48 +00:00
Nemanja Ivanovic	3c7ac649ec	[PowerPC] Fix CR Bit spill pseudo expansion The current CRBIT spill pseudo-op expansion creates a KILL instruction that kills the CRBIT and defines the enclosing CR field. However, this paints a false picture to the register allocator that all bits in the CR field are killed so copies of other bits out of the field become dead and removable. This changes the expansion to preserve the KILL flag on the CRBIT as an implicit use and to treat the CR field as an undef input. Thanks to Hal Finkel for the review and Uli Weigand for implementation input. Differential revision: https://reviews.llvm.org/D55996 llvm-svn: 350153	2018-12-29 11:43:54 +00:00
Hiroshi Inoue	1ea98f040e	[PowerPC] handle ISD:TRUNCATE in BitPermutationSelector This is the last one in a series of patches to support better code generation for bitfield insert. BitPermutationSelector already support ISD::ZERO_EXTEND but not TRUNCATE. This patch adds support for ISD:TRUNCATE in BitPermutationSelector. For example of this test case, struct s64b { int a:4; int b:16; int c:24; }; void bitfieldinsert64b(struct s64b *p, unsigned char v) { p->b = v; } the selection DAG loos like: t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64 t18: i32 = and t14, Constant:i32<-1048561> t4: i64,ch = CopyFromReg t0, Register:i64 %1 t22: i64 = AssertZext t4, ValueType:ch:i8 t23: i32 = truncate t22 t16: i32 = shl nuw nsw t23, Constant:i32<4> t19: i32 = or t18, t16 t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64 By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18. So the generated sequences with and without this patch are without this patch rlwinm 5, 5, 0, 28, 11 # corresponding to t18 rlwimi 5, 4, 4, 20, 27 with this patch rlwimi 5, 4, 4, 12, 27 Differential Revision: https://reviews.llvm.org/D49076 llvm-svn: 350118	2018-12-28 08:00:39 +00:00
QingShan Zhang	f2d9df61c7	[PowerPC] Remove the implicit use of the register if it is replaced by Imm If we are changing the MI operand from Reg to Imm, we need also handle its implicit use if have. Differential Revision: https://reviews.llvm.org/D56078 llvm-svn: 350115	2018-12-28 03:38:09 +00:00
Zi Xuan Wu	a02a3feecf	[PowerPC] Fix assert from machine verify pass that atomic pseudo expanding causes mismatched register class For atomic value operand which less than 4 bytes need to be masked. And the related operation to calculate the newvalue can be done in 32 bit gprc. So just use gprc for mask and value calculation. Differential Revision: https://reviews.llvm.org/D56077 llvm-svn: 350113	2018-12-28 02:12:55 +00:00
Chen Zheng	5ede950df9	[PowerPC] fix register class after converting X-FORM instruction to D-FORM instruction Differential Revision: https://reviews.llvm.org/D55806 llvm-svn: 350111	2018-12-28 01:02:35 +00:00
Kang Zhang	d501a1e596	[PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue Summary: This patch is to fix the bug imported by rL341634. In above submit , the the return type of ISD::ADDE is 14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64), but in fact, the second return type of ISD::ADDE should be MVT::Glue not MVT::i64. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D55977 llvm-svn: 350061	2018-12-25 03:29:51 +00:00
Clement Courbet	36a3480385	Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads. Update PPC ir following GEP->bitcat to bitcat->GEP->bitcat change. llvm-svn: 349747	2018-12-20 13:01:04 +00:00
Kang Zhang	ca8db48974	[PowerPC] Implement the isSelectSupported() target hook Summary: PowerPC has scalar selects (isel) and vector mask selects (xxsel). But PowerPC does not have vector CR selects, PowerPC does not support scalar condition selects on vectors. In addition to implementing this hook, isSelectSupported() should return false when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects are converted into branch sequences. Reviewed By: steven.zhang, hfinkel Differential Revision: https://reviews.llvm.org/D55754 llvm-svn: 349727	2018-12-20 06:19:59 +00:00
Amy Kwan	22cd453ba7	Test commit llvm-svn: 349633	2018-12-19 15:21:07 +00:00
Kewen Lin	a6247e7cf4	[PowerPC]Exploit P9 vabsdu for unsigned vselect patterns For type v4i32/v8ii16/v16i8, do following transforms: (vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b) (vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b) (vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b) (vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b) Differential Revision: https://reviews.llvm.org/D55812 llvm-svn: 349599	2018-12-19 03:04:07 +00:00
Kewen Lin	bbb461f758	[PowerPC][NFC]Update vabsd cases with vselect test cases Power9 VABSDU* instructions can be exploited for some special vselect sequences. Check in the orignal test case here, later the exploitation patch will update this and reviewers can check the differences easily. llvm-svn: 349446	2018-12-18 08:11:32 +00:00
Kewen Lin	44ace92596	[PowerPC] Exploit power9 new instruction setb Check the expected pattens feeding to SELECT_CC like: (select_cc lhs, rhs, 1, (sext (setcc [lr]hs, [lr]hs, cc2)), cc1) (select_cc lhs, rhs, -1, (zext (setcc [lr]hs, [lr]hs, cc2)), cc1) (select_cc lhs, rhs, 0, (select_cc [lr]hs, [lr]hs, 1, -1, cc2), seteq) (select_cc lhs, rhs, 0, (select_cc [lr]hs, [lr]hs, -1, 1, cc2), seteq) Further transform the sequence to comparison + setb if hits. Differential Revision: https://reviews.llvm.org/D53275 llvm-svn: 349445	2018-12-18 07:53:26 +00:00
QingShan Zhang	ecdab5bdd8	[NFC] Add new test to cover the lhs scheduling issue for P9. llvm-svn: 349443	2018-12-18 06:32:42 +00:00
QingShan Zhang	f549812599	[NFC] fix test case issue that with wrong label check. llvm-svn: 349439	2018-12-18 04:25:41 +00:00
Kewen Lin	3dac1252da	[PowerPC] Improve vec_abs on P9 Improve the current vec_abs support on P9, generate ISD::ABS node for vector types, combine ABS node to VABSD node for some special cases to make use of P9 VABSD* insns, do custom lowering to vsub(vneg later)+vmax if it has no combination opportunity. Differential Revision: https://reviews.llvm.org/D54783 llvm-svn: 349437	2018-12-18 03:16:43 +00:00
Kewen Lin	c68ce89ae1	[Power9][NFC]update vabsd case for better dumping Appended options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to get the more descriptive output. Also removed useless function attributes. llvm-svn: 349329	2018-12-17 06:32:02 +00:00
Kewen Lin	3ee103085e	[Power9][NFC]Make pre-inc-disable case more robust With some patch adopted for Power9 vabsd* insns, some CHECKs can't get the expected results. But it's false alarm, we should update the case more robust. llvm-svn: 349325	2018-12-17 03:16:12 +00:00
Kewen Lin	3ac031bb8f	[Power9][NFC] add setb exploitation test case Add an original test case for setb before the exploitation actually takes effect, later we can check the difference. Differential Revision: https://reviews.llvm.org/D55696 llvm-svn: 349251	2018-12-15 04:39:37 +00:00
Chen Zheng	cdbd5bef6d	[NFC][PowerPC] add verify-machineinstrs check After rL349029 and rL348566, sj-ctr-loop.ll is ok for verify-machineinstrs check. llvm-svn: 349030	2018-12-13 12:55:42 +00:00
Chen Zheng	9c6fa536e0	[PowerPC] intrinsic llvm.eh.sjlj.setjmp should not have flag isBarrier. Differential Revision: https://reviews.llvm.org/D55499 llvm-svn: 349029	2018-12-13 12:25:20 +00:00
Clement Courbet	76f4ae1092	[CodeGen] Allow mempcy/memset to generate small overlapping stores. Summary: All targets either just return false here or properly model `Fast`, so I don't think there is any reason to prevent CodeGen from doing the right thing here. Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D55365 llvm-svn: 349016	2018-12-13 09:56:19 +00:00
Clement Courbet	8b6434bbb9	Revert r348843 "[CodeGen] Allow mempcy/memset to generate small overlapping stores." Breaks ARM/memcpy-inline.ll llvm-svn: 348844	2018-12-11 13:38:43 +00:00
Clement Courbet	93b3445770	[CodeGen] Allow mempcy/memset to generate small overlapping stores. Summary: All targets either just return false here or properly model `Fast`, so I don't think there is any reason to prevent CodeGen from doing the right thing here. Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D55365 llvm-svn: 348843	2018-12-11 13:15:56 +00:00
David Green	bd72be0b44	[Targets] Fixup incorrect targets in codemodel tests llvm-svn: 348796	2018-12-10 20:55:34 +00:00
David Green	ca29c271d2	[Targets] Add errors for tiny and kernel codemodel on targets that don't support them Adds fatal errors for any target that does not support the Tiny or Kernel codemodels by rejigging the getEffectiveCodeModel calls. Differential Revision: https://reviews.llvm.org/D50141 llvm-svn: 348585	2018-12-07 12:10:23 +00:00
Zi Xuan Wu	cf4d477b0b	[PowerPC] Fix assert from machine verify pass that missing undef register flag Fix assert about using an undefined physical register in machine instruction verify pass. The reason is that register flag undef is missing when doing transformation from If Conversion Pass. ``` Bad machine code: Using an undefined physical register - function: func_65 - basic block: %bb.0 entry (0x10024740738) - instruction: BCLR killed $cr5lt, implicit $lr8, implicit $rm, implicit undef $x3 - operand 0: killed $cr5lt LLVM ERROR: Found 1 machine code errors. ``` There are also other existing testcases with same issue. So I add -verify-machineinstrs option to open verifying. Differential Revision: https://reviews.llvm.org/D55408 llvm-svn: 348566	2018-12-07 05:25:16 +00:00
Sanjay Patel	c6441c8547	[DAGCombiner] use root SDLoc for all nodes created by logic fold If this is not a valid way to assign an SDLoc, then we get this wrong all over SDAG. I don't know enough about the SDAG to explain this. IIUC, theoretically, debug info is not supposed to affect codegen. But here it has clearly affected 3 different targets, and the x86 change is an actual improvement. llvm-svn: 348552	2018-12-07 00:01:57 +00:00
Sanjay Patel	bfc7ffa40f	[DAGCombiner] don't hoist logic op if operands have other uses, part 2 The PPC test with 2 extra uses seems clearly better by avoiding this transform. With 1 extra use, we also prevent an extra register move (although that might be an RA problem). The general rule should be to only make a change here if it is always profitable. The x86 diffs are all neutral. llvm-svn: 348518	2018-12-06 19:18:56 +00:00
Sanjay Patel	273b778997	[PowerPC] add tests for hoisting bitwise logic; NFC llvm-svn: 348516	2018-12-06 19:05:19 +00:00
Stefan Pintilie	46f840f286	[PowerPC] Make no-PIC default to match GCC - LLVM Change the default for PowerPC LE to -fno-PIC. Differential Revision: https://reviews.llvm.org/D53383 llvm-svn: 348298	2018-12-04 20:14:57 +00:00
Kang Zhang	51986417f9	[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction Summary: There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD. These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54738 llvm-svn: 348109	2018-12-03 03:32:57 +00:00
Li Jia He	bcae407a3c	[PowerPC] Fix a conversion is not considered when the ISD::BR_CC node making the instruction selection Summary: A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative so for signed comparisons vs. unsigned ones, the input operands just need to be swapped. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54825 llvm-svn: 347831	2018-11-29 03:04:39 +00:00
Li Jia He	339af52804	[PowerPC] [NFC] Add test cases to the ISD::BR_CC node in the instruction selection Add the following test case for the ISD::BR_CC node in the instruction selection define i64 @testi64slt(i64 %c1, i64 %c2, i64 %c3, i64 %c4, i64 %a1, i64 %a2) #0 { entry: %cmp1 = icmp eq i64 %c3, %c4 %cmp3tmp = icmp eq i64 %c1, %c2 %cmp3 = icmp slt i1 %cmp3tmp, %cmp1 br i1 %cmp3, label %iftrue, label %iffalse iftrue: ret i64 %a1 iffalse: ret i64 %a2 } The data type i64 can be replaced by i32, i64, float, double  And condition codes can be replaced by: SETEQ, SETEN, SELT, SETLE, SETGT, SETGE,SETULT, SETULE, SSETGT, and SETUGE Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54824 llvm-svn: 347828	2018-11-29 02:51:03 +00:00
Craig Topper	b955bf382c	[LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512. This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral. Differential Revision: https://reviews.llvm.org/D54906 llvm-svn: 347593	2018-11-26 21:12:39 +00:00
Kang Zhang	840e98f9f1	Revert "[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction" This reverts commits r347532. Forget add the option -mtriple powerpc64-unknown-linux-gnu. So other platform is error except for PowerPC. llvm-svn: 347534	2018-11-26 07:15:31 +00:00
Kang Zhang	e98d4f511c	[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction Summary: There are 4 instructions which have Inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD. These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D54738 llvm-svn: 347532	2018-11-26 06:03:25 +00:00
Sanjay Patel	3e80019275	[DAGCombiner] form 'not' ops ahead of shifts (PR39657) We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops. https://rise4fun.com/Alive/NC1 Name: shl Pre: (-1 << C2) == C1 %shl = shl i8 %x, C2 %r = xor i8 %shl, C1 => %not = xor i8 %x, -1 %r = shl i8 %not, C2 Name: shr Pre: (-1 u>> C2) == C1 %sh = lshr i8 %x, C2 %r = xor i8 %sh, C1 => %not = xor i8 %x, -1 %r = lshr i8 %not, C2 https://bugs.llvm.org/show_bug.cgi?id=39657 llvm-svn: 347478	2018-11-22 19:24:10 +00:00
Nemanja Ivanovic	5cf902ccd4	[PowerPC] Do not use vectors to codegen bswap with Altivec turned off We have efficient codegen on P9 for lowering bswap that involves moving the value into a vector reg and moving it back. However, the check under which we custom lowered it did not adequately reflect the actual requirements. It required only that the subtarget be an implementation of ISA 3.0 since all compliant implementations have to provide the vector instructions. However, the kernel builds have a valid use case for -mno-altivec -mcpu=pwr9 (i.e. don't emit vector code, don't have to save vector regs for context switch). So we should require the correct features for this lowering. Fixes https://bugs.llvm.org/show_bug.cgi?id=39334 llvm-svn: 347376	2018-11-21 02:53:50 +00:00
Jinsong Ji	9a0ed20072	[PowerPC] Add Itineraries for STWU/STWUX etc When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, because we can still get same latency due to default values. With machine scheduler, however, itineraries will have impact to scheduling. eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class. And most of the instruction class with itineraries will have NumMicroOps default to 1. This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, then causing different scheduling or suboptimal scheduling further. This patch is for STWU/STWUX (IIC_LdStStoreUpd ) for P8. Since there are already multiple IIC for store update, this patch also merge IIC_LdStSTDU/IIC_LdStStoreUpd to IIC_LdStSTU IIC_LdStSTDUX to IIC_LdStSTUX and we add a new testcase in https://reviews.llvm.org/D54699 to show the difference. Differential Revision: https://reviews.llvm.org/D54700 llvm-svn: 347311	2018-11-20 15:11:42 +00:00
Jinsong Ji	42c13c22bc	[PowerPC][NFC]Add testcase for STWU scheduling check This patch add a STWU testcase for scheduling check. Currently P7/P8 which use itineraries are missing IIC_LdStStoreUpd, We use CHECK-ITIN prefix to check P7/P8, then use default for P9 (and future). We will fix the missing itineraries of IIC_LdStStoreUpd in following patch, and update this testcase to show the scheduling difference only there. Differential Revision: https://reviews.llvm.org/D54699 llvm-svn: 347310	2018-11-20 14:55:43 +00:00
Nemanja Ivanovic	9b393909e2	[PowerPC] Don't combine to bswap store on 1-byte truncating store Turns out that there was no check for a store that truncates down to a single byte when combining a (store (bswap...)) into a byte-swapping store. This patch just adds that check. Fixes https://bugs.llvm.org/show_bug.cgi?id=39478. llvm-svn: 347288	2018-11-20 04:42:31 +00:00
Nemanja Ivanovic	ed6159bb71	[PowerPC][NFC] Add tests for vector fp <-> int conversions This NFC patch just adds test cases for conversions that currently require scalarization of vectors. An updcoming patch will change the legalization for these and it is more suitable on the review to show the diferences in code gen rather than just the new code gen. llvm-svn: 347090	2018-11-16 20:24:10 +00:00
Stefan Pintilie	9004444d81	Revert "[PowerPC] Make no-PIC default to match GCC - LLVM" This reverts commit r347069 llvm-svn: 347076	2018-11-16 19:24:23 +00:00
Stefan Pintilie	046eff502f	[PowerPC] Make no-PIC default to match GCC - LLVM Set -fno-PIC as the default option. Differential Revision: https://reviews.llvm.org/D53383 llvm-svn: 347069	2018-11-16 18:36:21 +00:00
Zi Xuan Wu	6a3c279d1c	[PowerPC] Enhance the selection(ISD::VSELECT) of vector type To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding, which is legalized at type-legalization phase. Use xxsel to match vselect if vsx is open, or use vsel. Differential Revision: https://reviews.llvm.org/D49531 llvm-svn: 346824	2018-11-14 02:34:45 +00:00
Zaara Syeda	7509880b54	[Power9] Add support for stxvw4x.be and stxvd2x.be intrinsics On Power9, we don't have patterns to select the following intrinsics: llvm.ppc.vsx.stxvw4x.be llvm.ppc.vsx.stxvd2x.be This patch adds support for these. Differential Revision: https://reviews.llvm.org/D53581 llvm-svn: 346148	2018-11-05 17:31:26 +00:00
Li Jia He	03170a904f	[PowerPC] Support constraint 'wi' in asm From the gcc manual, we can see that the specific limit of wi inline asm is “FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS”. The link is https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Machine-Constraints.html#Machine-Constraints. We should accept this constraint. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D53265 llvm-svn: 345810	2018-11-01 02:35:17 +00:00
Matthias Braun	a83403892a	MachineOperand/MIParser: Do not print debug-use flag, infer it The debug-use flag must be set exactly for uses on DBG_VALUEs. This is so obvious that it can be trivially inferred while parsing. This will reduce noise when printing while omitting an information that has little value to the user. The parser will keep recognizing the flag for compatibility with old `.mir` files. Differential Revision: https://reviews.llvm.org/D53903 llvm-svn: 345671	2018-10-30 23:28:27 +00:00
Matthias Braun	c045c557b0	Relax fast register allocator related test cases; NFC - Relex hard coded registers and stack frame sizes - Some test cleanups - Change phi-dbg.ll to match on mir output after phi elimination instead of going through the whole codegen pipeline. This is in preparation for https://reviews.llvm.org/D52010 I'm committing all the test changes upfront that work before and after independently. llvm-svn: 345532	2018-10-29 20:10:42 +00:00
Lei Huang	de20843f6f	[PowerPC] Improve BUILD_VECTOR of 4 i32s Currently, for this node: vector int test(int a, int b, int c, int d) { return (vector int) { a, b, c, d }; } we get this on Power9: mtvsrdd 34, 5, 3 mtvsrdd 35, 6, 4 vmrgow 2, 3, 2 and this on Power8: mtvsrwz 0, 3 mtvsrwz 1, 5 mtvsrwz 2, 4 mtvsrwz 3, 6 xxmrghd 34, 1, 0 xxmrghd 35, 3, 2 vmrgow 2, 3, 2 This can be improved to this on LE Power9: rldimi 3, 4, 32, 0 rldimi 5, 6, 32, 0 mtvsrdd 34, 5, 3 and this on LE Power8 rldimi 3, 4, 32, 0 rldimi 5, 6, 32, 0 mtvsrd 34, 3 mtvsrd 35, 5 xxpermdi 34, 35, 34, 0 This patch updates the TD pattern to generate the optimized sequence for both Power8 and Power9 on LE and BE. Differential Revision: https://reviews.llvm.org/D53494 llvm-svn: 345414	2018-10-26 18:09:36 +00:00
Li Jia He	f6fb752fe8	[PowerPC] Fix some missed optimization opportunities in combineSetCC For both operands are bool, short, int, long, long long, add the following optimization. 1. 0-x == y --> x+y ==0 2. 0-x != y --> x+y != 0 Review: nemanjai Differential Revision: https://reviews.llvm.org/D53360 llvm-svn: 345366	2018-10-26 06:48:53 +00:00
Li Jia He	9521467318	[PowerPC][NFC] Add tests for some missed optimization opportunities in combineSetCC For both operands are bool, short, int, long, long long, add the following optimization test case. 1. 0-x == y --> x+y ==0 2. 0-x != y --> x+y != 0 Review: nemanjai Differential Revision: https://reviews.llvm.org/D53358 llvm-svn: 345365	2018-10-26 05:02:10 +00:00
Nemanja Ivanovic	6a74bfba20	[PowerPC] Keep vector int to fp conversions in vector domain At present a v2i16 -> v2f64 convert is implemented by extracts to scalar, scalar converts, and merge back into a vector. Use vector converts instead, with the int data permuted into the proper position and extended if necessary. Patch by RolandF. Differential revision: https://reviews.llvm.org/D53346 llvm-svn: 345361	2018-10-26 03:19:13 +00:00
Stefan Pintilie	927e8bf316	[Power9] Add __float128 support in the backend for bitcast to a i128 Add support to allow bit-casting from f128 to i128 and then extracting 64 bits from the result. Differential Revision: https://reviews.llvm.org/D49507 llvm-svn: 345053	2018-10-23 17:11:36 +00:00
Nemanja Ivanovic	674581afbb	[PowerPC][NFC] Fix bugs in r+r to r+i conversion The D-Form VSX loads introduced in ISA 3.0 are not direct D-Form equivalent of the corresponding X-Forms since they only target the Altivec registers. Namely LXSSPX can load into any of the 64 VSX registers whereas LXSSP can only load into the upper 32 VSX registers. Similarly with the remaining affected instructions. There is currently no way that I can see to trigger the bug, but as we add other ways of exploiting these instructions, there may very well be instances that do. This is an NFC patch in practical terms since the changes it introduces can not be triggered without an MIR test. Differential revision: https://reviews.llvm.org/D53323 llvm-svn: 344894	2018-10-22 11:22:59 +00:00
Hiroshi Inoue	9552dd187a	[PowerPC] avoid masking already-zero bits in BitPermutationSelector The current BitPermutationSelector generates a code to build a value by tracking two types of bits: ConstZero and Variable. ConstZero means a bit we need to mask off and Variable is a bit we copy from an input value. This patch add third type of bits VariableKnownToBeZero caused by AssertZext node or zero-extending load node. VariableKnownToBeZero means a bit comes from an input value, but it is known to be already zero. So we do not need to mask them. VariableKnownToBeZero enhances flexibility to group bits, since we can avoid redundant masking for these bits. This patch also renames "HasZero" to "NeedMask" since now we may skip masking even when we have zeros (of type VariableKnownToBeZero). Differential Revision: https://reviews.llvm.org/D48025 llvm-svn: 344347	2018-10-12 14:02:20 +00:00
Nirav Dave	f1f2a2a31a	[DAG] Fix Big Endian in Load-Store forwarding Summary: Correct offset calculation in load-store forwarding for big-endian targets. Reviewers: rnk, RKSimon, waltl Subscribers: sdardis, nemanjai, hiraditya, jrtc27, atanasyan, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D53147 llvm-svn: 344272	2018-10-11 18:28:59 +00:00
Nirav Dave	07acc992dc	[DAGCombine] Improve Load-Store Forwarding Summary: Extend analysis forwarding loads from preceeding stores to work with extended loads and truncated stores to the same address so long as the load is fully subsumed by the store. Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll test are deleted as they've no longer seem to be relevant. Reviewers: RKSimon, rnk, kparzysz, javed.absar Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D49200 llvm-svn: 344142	2018-10-10 14:15:52 +00:00
Nemanja Ivanovic	ec527dacca	[PowerPC][NFC] Add a test case for extract and store patterns An upcoming patch will change the codegen for these patterns. This test case is added now so that the patch can show the differences in codegen. llvm-svn: 344112	2018-10-10 04:18:35 +00:00
QingShan Zhang	bc1586352e	[PowerPC] Fix the assert of ISD::SIGN_EXTEND_INREG when type is v2i16 and v2i8 For ISD::SIGN_EXTEND_INREG operation of v2i16 and v2i8 types will cause assert because they are registered as custom operation. So that the type legalization phase will enter the custom hook, which do not handle ISD::SIGN_EXTEND_INREG operation and fall throw into unreachable assert. Patch By: wuzish (Zixuan Wu) Differential Revision: https://reviews.llvm.org/D52449 llvm-svn: 344109	2018-10-10 02:33:48 +00:00
Nemanja Ivanovic	72d4866e57	[DAGCombiner] Expand combining of FP logical ops to sign-setting FP ops We already do the following combines: (bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X (bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X When the target has "bit preserving fp logic". This patch just extends it to also combine: (bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X) As some targets have fnabs and even those that don't can efficiently lower both the fabs and the fneg. Differential revision: https://reviews.llvm.org/D44548 llvm-svn: 344093	2018-10-09 23:20:11 +00:00
Nemanja Ivanovic	c62dfe512e	[PowerPC][NFC] Commit nabs test case in preparation for committing D44548 This just adds the test case so that the different code gen is clearly visible when the DAG Combine lands. llvm-svn: 344091	2018-10-09 23:02:53 +00:00
Nemanja Ivanovic	87873d04c3	[PowerPC] Implement hasBitPreservingFPLogic for types that can be supported This is the PPC-specific non-controversial part of https://reviews.llvm.org/D44548 that simply enables this combine for PPC since PPC has these instructions. This commit will allow the target-independent portion to be truly target independent. llvm-svn: 344077	2018-10-09 20:35:15 +00:00
Nemanja Ivanovic	c8fe772ac4	Fix buildbot failures with the newly added test case (triple was missing). llvm-svn: 344037	2018-10-09 11:17:47 +00:00
Nemanja Ivanovic	4c0b110e3e	[PowerPC] Remove self-copies in pre-emit peephole There are occasionally instances where AADB rewrites registers in such a way that a reg-reg copy becomes a self-copy. Such an instruction is obviously redundant and can be removed. This patch does precisely that. Note that this will not remove various nop's that we insert (which are themselves just self-copies). The reason those are left alone is that all of them have their own opcodes (that just encode to a self-copy). What prompted this patch is the fact that these self-copies sometimes end up using registers that make the instruction a priority-setting nop, thereby having a significant effect on performance. Differential revision: https://reviews.llvm.org/D52432 llvm-svn: 344036	2018-10-09 10:54:04 +00:00
Stefan Pintilie	5d32a86f44	[PowerPC] Folding XForm to DForm loads requires alignment for some DForm loads. Going from XForm Load to DSForm Load requires that the immediate be 4 byte aligned. If we are not aligned we must leave the load as LDX (XForm). This bug is causing a compile-time failure in the benchmark h264ref. Differential Revision: https://reviews.llvm.org/D51988 llvm-svn: 343525	2018-10-01 20:16:27 +00:00
Bjorn Pettersson	4af7f57bdf	[PHIElimination] Update the regression test for PR16508 Summary: When PR16508 was solved (in rL185363) a regression test was added as test/CodeGen/PowerPC/2013-07-01-PHIElimBug.ll. I discovered that the test case no longer reproduced the scenario from PR16508. This problem could have been amended by adding an extra RUN line with "-O1" (or possibly "-O0"), but instead I added a mir-reproducer test/CodeGen/PowerPC/2013-07-01-PHIElimBug.mir to get a reproducer that is less sensitive to changes in earlier passes (including O-level). While being at it I also corrected a code comment in PHIElimination::EliminatePHINodes that has been incorrect since the related bugfix from rL185363. Reviewers: MatzeB, hfinkel Reviewed By: MatzeB Subscribers: nemanjai, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52553 llvm-svn: 343416	2018-09-30 17:23:21 +00:00
Hiroshi Inoue	20982f0995	[PowerPC] optimize conditional branch on CRSET/CRUNSET This patch adds a check to optimize conditional branch (BC and BCn) based on a constant set by CRSET or CRUNSET. Other optimizers, such as block placement, may generate such code and hence I do this at the very end of the optimization in pre-emit peephole pass. A conditional branch based on a constant is eliminated or converted into unconditional branch. Also CRSET/CRUNSET is eliminated if the condition code register is not used by instruction other than the branch to be optimized. Differential Revision: https://reviews.llvm.org/D52345 llvm-svn: 343100	2018-09-26 12:32:45 +00:00
Stefan Pintilie	b5305771fb	[Power9] [LLVM] Add __float128 exponent GET and SET builtins Added __builtin_vsx_scalar_extract_expq __builtin_vsx_scalar_insert_exp_qp Builtins should behave the same way as in GCC. Differential Revision: https://reviews.llvm.org/D48185 llvm-svn: 342910	2018-09-24 18:14:13 +00:00
Zaara Syeda	edefda48d2	[PowerPC] Support operand modifier 'x' in inline asm gcc uses operand modifier 'x' in inline asm for VSX registers. Without this modifier, instructions which use VSX numbering for their operands are printed as VMX registers. This patch adds support for the operand modifier 'x'. Differential Revision: https://reviews.llvm.org/D52244 llvm-svn: 342882	2018-09-24 14:01:16 +00:00
QingShan Zhang	cae9425a3c	[PowerPC] Fix the assert of combineBVOfConsecutiveLoads when element num is 1 Building a vector out of multiple loads can be converted to a load of the vector type if the loads are consecutive. But the special condition is that the element number is 1, such as <1 x i128>. So just early exit to fix the assert. Patch By: wuzish (Zixuan Wu) Differential Revision: https://reviews.llvm.org/D52072 llvm-svn: 342611	2018-09-20 03:09:15 +00:00
Nemanja Ivanovic	87c31a6113	[PowerPC] Do not emit record-form rotates when record-form andi/andis suffices This is a follow-up to the previous patch that eliminated some of the rotates. With this addition, we will also emit the record-form andis. This patch increases the number of record-form rotates we eliminate by more than 70%. Differential revision: https://reviews.llvm.org/D44897 llvm-svn: 342478	2018-09-18 13:43:16 +00:00
Nemanja Ivanovic	6a39d32e66	[PowerPC] Optimize compares fed by ANDISo Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions. When optimizing compares, we handle the former in order to eliminate the compare instruction but not the latter. This patch just adds the latter to the set of instructions we optimize. The reason these instructions need to be handled separately is that they are not part of the RecFormRel map (since they don't have a non-record-form). The missing "and-immediate-shifted" is just an oversight in the initial implementation. Differential revision: https://reviews.llvm.org/D51353 llvm-svn: 342472	2018-09-18 13:21:58 +00:00
Alexander Kornienko	4d9d76d800	Remove trailing whitespace introduced in r342440. llvm-svn: 342463	2018-09-18 10:53:13 +00:00
QingShan Zhang	f1b0b47b2d	[PowerPC] Add Itineraries of IIC_IntMulHD for P7/P8 When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, because we can still get same latency due to default values. With machine scheduler, however, itineraries will have impact to scheduling. eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class. And most of the instruction class with itineraries will have NumMicroOps default to 1. This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, then causing different scheduling or suboptimal scheduling further. Patch By: jsji (Jinsong Ji) Differential Revision: https://reviews.llvm.org/D52040 llvm-svn: 342441	2018-09-18 02:05:18 +00:00
QingShan Zhang	df1a2d8d2b	[PowerPC][NFC] Add a mulld testcase for scheduling check. This patch add a mulld testcase for scheduling check. Patch By: jsji (Jinsong Ji) Differential Revision: https://reviews.llvm.org/D52039 llvm-svn: 342440	2018-09-18 01:59:22 +00:00
Strahinja Petrovic	488fd4e625	[PowerPC] Fix label address calculation for ppc64 This patch fixes calculating address of label for non-pic ppc64. Differential Revision: https://reviews.llvm.org/D50965 llvm-svn: 342368	2018-09-17 11:03:40 +00:00
Lion Yang	c68f78d5d8	[PowerPC] Fix the calling convention for i1 arguments on PPC32 Summary: Integer types smaller than i32 must be extended to i32 by default. The feature "crbits" introduced at r202451 handles i1 as a special case, but it did not extend properly. The caller was, therefore, passing i1 stack arguments by writing 0/1 to the first byte of the 4-byte stack object and callee was reading the first byte for the value. "crbits" is enabled if the optimization level is greater than 1, which is very common in "release builds". Such discrepancies with ABI specification also introduces potential incompatibility with programs or libraries built with other compilers e.g. GCC. Fixes PR38661 Reviewers: hfinkel, cuviper Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D51108 llvm-svn: 342288	2018-09-14 21:26:05 +00:00
QingShan Zhang	abbb894ff5	[PowerPC] Combine ADD to ADDZE On the ppc64le platform, if ir has the following form, define i64 @addze1(i64 %x, i64 %z) local_unnamed_addr #0 { entry: %cmp = icmp ne i64 %z, CONSTANT (-32767 <= CONSTANT <= 32768) %conv1 = zext i1 %cmp to i64 %add = add nsw i64 %conv1, %x ret i64 %add } we can optimize it to the form below. when C == 0 --> addze X, (addic Z, -1)) / add X, (zext(setne Z, C))-- \ when -32768 <= -C <= 32767 && C != 0 --> addze X, (addic (addi Z, -C), -1) Patch By: HLJ2009 (Li Jia He) Differential Revision: https://reviews.llvm.org/D51403 Reviewed By: Nemanjai llvm-svn: 341634	2018-09-07 07:56:05 +00:00
QingShan Zhang	c2b6c547dc	[PowerPC] Add Itineraries of IIC_IntRotateDI for P7/P8 When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, because we can still get same latency due to default values. With machine scheduler, however, itineraries will have impact to scheduling. eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class. And most of the instruction class with itineraries will have NumMicroOps default to 1. This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, then causing different scheduling or suboptimal scheduling further. Patch by jsji (Jinsong Ji) Differential Revision: https://reviews.llvm.org/D51506 llvm-svn: 341293	2018-09-03 03:14:29 +00:00
Kit Barton	7c80f98b69	[PPC] Remove Darwin support from POWER backend. This patch issues an error message if Darwin ABI is attempted with the PPC backend. It also cleans up existing test cases, either converting the test to use an alternative triple or removing the test if the coverage is no longer needed. Updated Tests ------------- The majority of test cases were updated to use a different triple that does not include the Darwin ABI. Many tests were also updated to use FileCheck, in place of grep. Deleted Tests ------------- llvm/test/tools/dsymutil/PowerPC/sibling.test was originally added to test specific functionality of dsymutil using an object file created with an old version of llvm-gcc for a Powerbook G4. After a discussion with @JDevlieghere he suggested removing the test. llvm/test/CodeGen/PowerPC/combine_loads_from_build_pair.ll was converted from a PPC test to a SystemZ test, as the behavior is also reproducible there. All other tests that were deleted were specific to the darwin/ppc ABI and no longer necessary. Phabricator Review: https://reviews.llvm.org/D50988 llvm-svn: 340795	2018-08-28 01:18:29 +00:00
Nemanja Ivanovic	5d06f17b8a	[PowerPC] Revert commit r339779 This commit has caused failures in some internal benchmarks. Temporarily reverting this patch until the issue can be diagnosed and fixed. llvm-svn: 340740	2018-08-27 13:20:42 +00:00
Nemanja Ivanovic	f2588a28a8	[PowerPC] Recommit r340016 after fixing the reported issue The internal benchmark failure reported by Google was due to a missing check for the result type for the sign-extend and shift DAG. This commit adds the check and re-commits the patch. llvm-svn: 340734	2018-08-27 11:20:27 +00:00
Stefan Pintilie	f384606799	[PowerPC] Emit xscpsgndp instead of xxlor when copying floating point scalar registers for P9 This patch will address using the xscpsgndp instruction to copy floating point scalar registers instead of the xxlor (specifically XXLORf) instruction that is currently used. Additionally, this patch of utilizing xscpsgndp will apply to P9, while pre-P9 will still use xxlor. Patch by amyk Differential Revision: https://reviews.llvm.org/D50004 llvm-svn: 340643	2018-08-24 20:00:24 +00:00
Stefan Pintilie	892fc6b7f2	[Exception Handling] Unwind tables are required for all functions that have an EH personality. This patch is for defect: https://bugs.llvm.org/show_bug.cgi?id=32611 Functions may require unwind tables even if they are marked with the attribute nounwind. Any function with an EH personality may require an unwind table. Differential Revision: https://reviews.llvm.org/D50987 llvm-svn: 340641	2018-08-24 19:38:29 +00:00
Stefan Pintilie	bddd7d8b7b	[PowerPC] Change Test Options [NFC] Patch by amyk llvm-svn: 340639	2018-08-24 19:24:20 +00:00
Stefan Pintilie	7cb44f2470	Revert "[Exception Handling] Unwind tables are required for all functions that have an EH personality." This reverts commit rL340614. Previous commit broke some llvm-cfi-verify tests. llvm-svn: 340625	2018-08-24 17:27:35 +00:00
Stefan Pintilie	36f31617d3	[Exception Handling] Unwind tables are required for all functions that have an EH personality. This patch is for defect: https://bugs.llvm.org/show_bug.cgi?id=32611 Functions may require unwind tables even if they are marked with the attribute nounwind. Any function with an EH personality may require an unwind table. Differential Revision: https://reviews.llvm.org/D50987 llvm-svn: 340614	2018-08-24 15:51:47 +00:00
Eric Christopher	3dc594c1e6	Temporarily Revert "[PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction" due to it causing a compiler crash on valid. This reverts commit r340016, testcase forthcoming. llvm-svn: 340315	2018-08-21 18:35:08 +00:00
QingShan Zhang	f8f9af7ba5	[PowerPC] Add a peephole post RA to transform the inst that fed by add If the arch is P8, we will select XFLOAD to load the floating point, and then, expand it to vsx and non-vsx X-form instruction post RA. This patch is trying to convert the X-form to D-form if it meets the requirement that one operand of the x-form inst is the special Zero register, and another operand fed by add inst. i.e. y = add imm, reg LFDX. 0, y --> LFD imm(reg) Reviewers: Nemanjai Differential Revision: https://reviews.llvm.org/D49007 llvm-svn: 340149	2018-08-20 02:52:55 +00:00
Stefan Pintilie	39869ccf51	[PowerPC] Generate lxsd instead of the ld->mtvsrd sequence for vector loads This patch addresses: - Implementation within PPCISelLowering.cpp to check if we should use direct load into vector instructions (such as lxsd/lfd ) when the scalar_to_vector function is used; which will allow us to catch as many cases of the scalar_to_vector uses as possible to translate the ld->mtvsrd sequence into lxsd. - Test cases to exhibit the behaviour of emitting lxsd/lfd. Patch by amyk Differential revision: https://reviews.llvm.org/D49698 llvm-svn: 340037	2018-08-17 15:15:26 +00:00
Nemanja Ivanovic	39751276b0	[PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli extend sign and shift immediate instruction. Patch by RolandF. Differential revision: https://reviews.llvm.org/D49879 llvm-svn: 340016	2018-08-17 12:35:44 +00:00
Eli Friedman	73e8a784e6	[SelectionDAG] Improve the legalisation lowering of UMULO. There is no way in the universe, that doing a full-width division in software will be faster than doing overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway. This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations. Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16 bit integers against an obviously correct widening multiplication. Baring any oversights introduced by porting the algorithm to DAG, confidence in correctness of this algorithm is extremely high. Following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of original, so anything under 1 is “better” and anything above 1 is worse. +-------+-----------+-----------+-------------+-------------+ \| Arch \| u64u64 t \| u64u64 s \| u128u128 t \| u128u128 s \| +-------+-----------+-----------+-------------+-------------+ \| X64 \| - \| - \| ~0.5 \| ~0.64 \| \| i686 \| ~0.5 \| ~0.6666 \| ~0.05 \| ~0.9 \| \| armv7 \| - \| ~0.75 \| - \| ~1.4 \| +-------+-----------+-----------+-------------+-------------+ Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64 (one Intel Haswell, other AMD Ryzen) based machines. Size numbers have been collected by looking at the size of function containing an overflowing multiply in a loop. All in all, it can be seen that both performance and size has improved except in the case of armv7 where code size has regressed for 128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms seem to benefit from this change a lot, taking only 5% of the time compared to original algorithm to calculate the same thing. The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible. Patch by Simonas Kazlauskas! Differential Revision: https://reviews.llvm.org/D50310 llvm-svn: 339922	2018-08-16 18:39:39 +00:00
Nemanja Ivanovic	5b9a4f8ee5	[PowerPC] Enhance the selection(ISD::VSELECT) of vector type To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding. Use xxsel to match vselect if vsx is open, or use vsel. In order to do not write many patterns in td file, promote (for vector it's bitcast) all other type into v4i32 and only pattern match vselect of v4i32 into vsel or xxsel. Patch by wuzish Differential revision: https://reviews.llvm.org/D49531 llvm-svn: 339779	2018-08-15 15:30:36 +00:00
Nemanja Ivanovic	8b4bd09e22	[PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types When trying to combine a DAG that builds a vector out of sign-extensions of vector extracts, the code assumes legal input types. Due to that, we have to disable this combine prior to legalization. In some cases, the DAG will look slightly different after legalization so account for that in the matching code. This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087 Differential Revision: https://reviews.llvm.org/D49080 llvm-svn: 339769	2018-08-15 12:58:13 +00:00
Sanjay Patel	15d1501aae	[SelectionDAG] try harder to convert funnel shift to rotate Similar to rL337966 - if the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. AArch only goes right, and PPC only goes left. x86 has both, so no diffs there. Differential Revision: https://reviews.llvm.org/D50091 llvm-svn: 339359	2018-08-09 17:26:22 +00:00
Zaara Syeda	b2595b988b	[PowerPC] Improve codegen for vector loads using scalar_to_vector This patch aims to improve the codegen for vector loads involving the scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X) to utilize: LXSD and LXSDX for i64 and f64 LXSIWAX for i32 (sign extension to i64) LXSIWZX for i32 and f64 Committing on behalf of Amy Kwan. Differential Revision: https://reviews.llvm.org/D48950 llvm-svn: 339260	2018-08-08 15:20:43 +00:00
Nemanja Ivanovic	e1a525ed06	[PowerPC] Do not round values prior to converting to integer Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector loses precision. This patch removes the code that adds these nodes to true f64 operands. It also adds patterns required to ensure the code is still vectorized rather than converting individual elements and inserting into a vector. Fixes https://bugs.llvm.org/show_bug.cgi?id=38342 Differential Revision: https://reviews.llvm.org/D50121 llvm-svn: 338658	2018-08-02 00:03:22 +00:00
Sanjay Patel	8aac22e06a	[SelectionDAG] fix bug in translating funnel shift with non-power-of-2 type The bug is visible in the constant-folded x86 tests. We can't use the negated shift amount when the type is not power-of-2: https://rise4fun.com/Alive/US1r ...so in that case, use the regular lowering that includes a select to guard against a shift-by-bitwidth. This path is improved by only calculating the modulo shift amount once now. Also, improve the rotate (with power-of-2 size) lowering to use a negate rather than subtract from bitwidth. This improves the codegen whether we have a rotate instruction or not (although we can still see that we're not matching to a legal rotate in all cases). llvm-svn: 338592	2018-08-01 17:17:08 +00:00
Sanjay Patel	9f807f44b1	[DAGCombiner] transform sub-of-shifted-signbit to add This is exchanging a sub-of-1 with add-of-minus-1: https://rise4fun.com/Alive/plKAH This is another step towards improving select-of-constants codegen (see D48970). x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral. I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but I think canonicalizing to 'add' is more likely to produce further transforms because we have more folds for 'add'. Differential Revision: https://reviews.llvm.org/D49924 llvm-svn: 338317	2018-07-30 22:21:37 +00:00
Sanjay Patel	06c7d5aef6	[AArch64, PowerPC, x86] add more signbit math tests; NFC The tests with a constant sub operand were added with rL338143, but the potential transform doesn't have that requirement, so adding more tests with variable operands. llvm-svn: 338150	2018-07-27 18:31:21 +00:00
Sanjay Patel	efac39eef6	[AArch64, PowerPC, x86] add more signbit math tests; NFC llvm-svn: 338143	2018-07-27 18:12:29 +00:00
Sanjay Patel	c7abb416dc	[DAGCombiner] fold 'not' with signbit math This is a follow-up suggested in D48970. Alive proofs: https://rise4fun.com/Alive/sII We can eliminate an instruction in the usual select-of-constants to bit hack transform by adjusting the add/sub with constant. This is always a win. There are more transforms that are likely wins, but they may need target hooks in case some targets do not benefit. This is another step towards making up for canonicalizing to select-of-constants in rL331486. llvm-svn: 338132	2018-07-27 16:42:55 +00:00
Sanjay Patel	60c04b961e	[PowerPC] add more tests for signbit math; NFC llvm-svn: 338130	2018-07-27 16:22:18 +00:00
Sanjay Patel	215dcbf4db	[SelectionDAG] try to convert funnel shift directly to rotate if legal If the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. This sidesteps the issue of custom lowering for rotates raised in PR38243: https://bugs.llvm.org/show_bug.cgi?id=38243 ...by only dealing with legal operations. llvm-svn: 337966	2018-07-25 21:38:30 +00:00
Sanjay Patel	f94c4c84e6	[AArch, PowerPC] add more tests for legal rotate ops; NFC llvm-svn: 337964	2018-07-25 21:25:50 +00:00
Stefan Pintilie	0c122d5a41	[Power9] Code Cleanup - Remove needsAggressiveScheduling() As we already return true from needsAggressiveScheduling() for the most recent hardware it would be cleaner to just return true for all PowerPC hardware. Differential Revision: https://reviews.llvm.org/D48663 llvm-svn: 337488	2018-07-19 19:34:18 +00:00
Justin Hibbits	d52990c71b	Introduce codegen for the Signal Processing Engine Summary: The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1, e500v2, and several e200 cores. This adds support targeting the e500v2, as this is more common than the e500v1, and is in SoCs still on the market. This patch is very intrusive because the SPE is binary incompatible with the traditional FPU. After discussing with others, the cleanest solution was to make both SPE and FPU features on top of a base PowerPC subset, so all FPU instructions are now wrapped with HasFPU predicates. Supported by this are: * Code generation following the SPE ABI at the LLVM IR level (calling conventions) * Single- and Double-precision math at the level supported by the APU. Still to do: * Vector operations * SPE intrinsics As this changes the Callee-saved register list order, one test, which tests the precise generated code, was updated to account for the new register order. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D44830 llvm-svn: 337347	2018-07-18 04:25:10 +00:00
Sanjay Patel	c71adc8040	[Intrinsics] define funnel shift IR intrinsics + DAG builder support As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" operation which may also be a single hardware instruction. Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working on that (much larger patch), but then I concluded that we don't need it. At least as a first step, we have all of the backend support necessary to match these ops...because it was required. And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are likely all that we'll ever need. There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes (along with improving the oversized shift documentation). Again, I don't think that's strictly necessary (as the test results here prove). That can be an efficiency improvement as a small follow-up patch. So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. Differential Revision: https://reviews.llvm.org/D49242 llvm-svn: 337221	2018-07-16 22:59:31 +00:00
Sanjay Patel	a41c886c55	[DAGCombiner] extend(ifpositive(X)) -> shift-right (not X) This is almost the same as an existing IR canonicalization in instcombine, so I'm assuming this is a good early generic DAG combine too. The motivation comes from reduced bit-hacking for select-of-constants in IR after rL331486. We want to restore that functionality in the DAG as noted in the commit comments for that change and the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html The PPC and AArch tests show that those targets are already doing something similar. x86 will be neutral in the minimal case and generally better when this pattern is extended with other ops as shown in the signbit-shift.ll tests. Note the asymmetry: we don't include the (extend (ifneg X)) transform because it already exists in SimplifySelectCC(), and that is verified in the later unchanged tests in the signbit-shift.ll files. Without the 'not' op, the general transform to use a shift is always a win because that's a single instruction. Alive proofs: https://rise4fun.com/Alive/ysli Name: if pos, get -1 %c = icmp sgt i16 %x, -1 %r = sext i1 %c to i16 => %n = xor i16 %x, -1 %r = ashr i16 %n, 15 Name: if pos, get 1 %c = icmp sgt i16 %x, -1 %r = zext i1 %c to i16 => %n = xor i16 %x, -1 %r = lshr i16 %n, 15 Differential Revision: https://reviews.llvm.org/D48970 llvm-svn: 337130	2018-07-15 16:27:07 +00:00
Nemanja Ivanovic	080c35050e	[PowerPC] Materialize more constants with CR-field set in late peephole Revision r322373 fixed a bug in how we materialize constants when the CR-field needs to be set. However the fix is overly conservative. It will only do the transform if AND-ing the input with the new constant produces the same new constant. This is of course correct, but not necessarily required. If there are no futher uses of the constant, the constant can be changed. If there are no uses of the GPR result, the final result of the materialization isn't important other than it needs to compare to zero correctly (lt, gt, eq). Differential revision: https://reviews.llvm.org/D42109 llvm-svn: 337008	2018-07-13 15:21:03 +00:00
Stefan Pintilie	94259ba13a	[PowerPC] [NFC] Update __float128 tests Add the two options -ppc-vsr-nums-as-vr and -ppc-asm-full-reg-names to the __float128 tests. Then modify the tests as required. llvm-svn: 336940	2018-07-12 20:18:57 +00:00
Joel E. Denny	9fa9c9368d	[FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47171 This commit drops that patch's changes to: llvm/test/CodeGen/NVPTX/f16x2-instructions.ll llvm/test/CodeGen/NVPTX/param-load-store.ll For some reason, the dos line endings there prevent me from commiting via the monorepo. A follow-up commit (not via the monorepo) will finish the patch. llvm-svn: 336843	2018-07-11 20:25:49 +00:00
Stefan Pintilie	b9d01aa29e	[Power9] Add remaining __flaot128 builtin support for FMA round to odd Implement this as it is done on GCC: __float128 a, b, c, d; a = __builtin_fmaf128_round_to_odd (b, c, d); // generates xsmaddqpo a = __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsmsubqpo a = - __builtin_fmaf128_round_to_odd (b, c, d); // generates xsnmaddqpo a = - __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsnmsubpqp Differential Revision: https://reviews.llvm.org/D48218 llvm-svn: 336754	2018-07-11 01:42:22 +00:00
Stefan Pintilie	133acb22bb	[Power9] Add __float128 builtins for Rounding Operations Added __float128 support for a number of rounding operations: trunc rint nearbyint round floor ceil Differential Revision: https://reviews.llvm.org/D48415 llvm-svn: 336601	2018-07-09 20:38:40 +00:00
Stefan Pintilie	58e3e0a827	[Power9] [LLVM] Add __float128 support for trunc to double round to odd Add support for this builtin: double builtin_truncf128_round_to_odd(float128) Differential Revision: https://reviews.llvm.org/D48483 llvm-svn: 336595	2018-07-09 20:09:22 +00:00
Stefan Pintilie	83a5fe146e	[Power9] Add __float128 builtins for Round To Odd GCC has builtins for these round to odd instructions: __float128 __builtin_sqrtf128_round_to_odd (__float128) __float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128) __float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128) Differential Revision: https://reviews.llvm.org/D47550 llvm-svn: 336578	2018-07-09 18:50:06 +00:00
Stefan Pintilie	3d76326d24	[Power9] Add __float128 support for compare operations Added handling for the select f128. Differential Revision: https://reviews.llvm.org/D48294 llvm-svn: 336548	2018-07-09 13:36:14 +00:00
Stefan Pintilie	b351f09c9e	[Power9] Add __float128 library call for frem Power 9 does not have a hardware instruction for frem but we can call fmodf128. Differential Revision: https://reviews.llvm.org/D48552 llvm-svn: 336406	2018-07-06 02:47:02 +00:00
Lei Huang	5612b90694	[Power9] Add lib calls for float128 operations with no equivalent PPC instructions Map the following instructions to the proper float128 lib calls: pow[i], exp[2], log[2\|10], sin, cos, fmin, fmax Differential Revision: https://reviews.llvm.org/D48544 llvm-svn: 336361	2018-07-05 15:21:37 +00:00
Sanjay Patel	bce899ff59	[AArch64, PowerPC, x86] add tests for signbit bit hacks; NFC llvm-svn: 336348	2018-07-05 13:16:46 +00:00
Lei Huang	66e22c21c3	[Power9] Optimize codgen for conversions of int to float128 Optimize code sequences for integer conversion to fp128 when the integer is a result of: * float->int * float->long * double->int * double->long Differential Revision: https://reviews.llvm.org/D48429 llvm-svn: 336316	2018-07-05 07:46:01 +00:00
Lei Huang	9d70afbb31	[Power9][NFC] add back-end tests for passing homogeneous fp128 aggregates by value Tests to verify that we are passing fp128 via VSX registers as per ABI. These are related to clang commit rL336308. Differential Revision: https://reviews.llvm.org/D48310 llvm-svn: 336314	2018-07-05 06:51:38 +00:00
Lei Huang	5bab646c26	[Power9] Add tests for passing float128 in VSX reg for non-homogenous aggregates Add missing testcase for rL336310 llvm-svn: 336313	2018-07-05 06:29:28 +00:00
Lei Huang	d17c39ccaa	[Power9]Legalize and emit code for quad-precision convert from single-precision Legalize and emit code for quad-precision floating point operation conversion of single-precision value to quad-precision. Differential Revision: https://reviews.llvm.org/D47569 llvm-svn: 336307	2018-07-05 04:18:37 +00:00
Lei Huang	a26f3be454	[Power9] Implement float128 parameter passing and return values This patch enable parameter passing and return by value for float128 types. Passing aggregate/union which contain float128 members will be submitted in subsequent patches. Differential Revision: https://reviews.llvm.org/D47552 llvm-svn: 336306	2018-07-05 04:10:15 +00:00
Lei Huang	6270ab6ce4	[Power9]Legalize and emit code for round & convert quad-precision values Legalize and emit code for round & convert float128 to double precision and single precision. Differential Revision: https://reviews.llvm.org/D46997 llvm-svn: 336299	2018-07-04 21:59:16 +00:00
Stefan Pintilie	cb4f0c5c07	[PowerPC] Replace the Post RA List Scheduler with the Machine Scheduler We want to run the Machine Scheduler instead of the List Scheduler after RA. Checked with a performance run on a Power 9 machine with SPEC 2006 and while some benchmarks improved and others degraded the geomean was slightly improved with the Machine Scheduler. Differential Revision: https://reviews.llvm.org/D45265 llvm-svn: 336295	2018-07-04 18:54:25 +00:00
Gabor Buella	da4a966e1c	NFC - Various typo fixes in tests llvm-svn: 336268	2018-07-04 13:28:39 +00:00
QingShan Zhang	3b2aa2b4b4	[PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's multiple for i64 pre-inc load/store For the below case, pre-inc prep think it's a good candidate to use pre-inc for the bucket, but 64bit integer load/store update (pre-inc) instruction on Power requires the displacement field should be DS-form (4's multiple). Since it can't satisfy the constraint, we have to do some fix ups later. As below, the original load/stores could be well-form, it makes things worse. unsigned long long result = 0; unsigned long long foo(char p, unsigned long long n) { for (unsigned long long i = 0; i < n; i++) { unsigned long long x1 = (unsigned long long )(p - 50000 + i); unsigned long long x2 = (unsigned long long )(p - 61024 + i); unsigned long long x3 = (unsigned long long )(p - 62048 + i); unsigned long long x4 = (unsigned long long )(p - 64096 + i); result = x1 * x2 * x3 * x4; } return result; } Patch by jedilyn(Kewen Lin). Differential Revision: https://reviews.llvm.org/D48813 --This line, and those below, will be ignored-- M lib/Target/PowerPC/PPCLoopPreIncPrep.cpp A test/CodeGen/PowerPC/preincprep-i64-check.ll llvm-svn: 336074	2018-07-02 05:46:09 +00:00
Sanjay Patel	d052de856d	[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not: #include <stdio.h> int main(int argc) { float x; x = -0.8 * argc; printf("%f\n", (float)((int)x)); return 0; } $ clang -O0 -mavx fp.c ; ./a.out 0.000000 $ clang -O1 -mavx fp.c ; ./a.out -0.000000 Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option. Differential Revision: https://reviews.llvm.org/D48085 llvm-svn: 335761	2018-06-27 18:16:40 +00:00
Sanjay Patel	962ee178fa	[DAGCombiner] eliminate setcc bool math when input is low-bit of some value This patch has the same motivating example as D48466: define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) { %c.0282 = and i32 %c.0282.in, 268435455 %a16 = lshr i64 32508, %x %a17 = and i64 %a16, 1 %tobool = icmp eq i64 %a17, 0 %. = select i1 %tobool, i32 1, i32 2 %.286 = select i1 %tobool, i32 27, i32 26 %shr97 = lshr i32 %c.0282, %. %shl98 = shl i32 %c.0282.in, %.286 %or99 = or i32 %shr97, %shl98 %shr100 = lshr i32 %d.0280, %. %shl101 = shl i32 %d.0280, %.286 %or102 = or i32 %shr100, %shl101 store i32 %or99, i32* %ptr0 store i32 %or102, i32* %ptr1 ret void } ...but I'm trying to kill the setcc bool math sooner rather than later. By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, we can create a universally good fold because we always eliminate the condition code intermediate value. Here are Alive proofs for these (currently instcombine folds the 'add' variants, but misses the 'sub' patterns): https://rise4fun.com/Alive/Gsyp Name: sub of zext cmp mask %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %z = zext i1 %c to i32 %r = sub i32 C1, %z => %optional_cast = zext i8 %a to i32 %r = add i32 %optional_cast, C1-1 Name: add of zext cmp mask %a = and i32 %x, 1 %c = icmp eq i32 %a, 0 %z = zext i1 %c to i8 %r = add i8 %z, C1 => %optional_cast = trunc i32 %a to i8 %r = sub i8 C1+1, %optional_cast All of the tests look like improvements or neutral to me. But it is possible that x86 test+set+bitop is better than what we now show here. I suspect we could do better by adding another fold for the 'sub' variants. We start with select-of-constant in IR in the larger motivating test, so that's why I included tests with selects. Proofs for those variants: https://rise4fun.com/Alive/Bx1 Name: true const is bigger Pre: C2 == (C1 + 1) %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %r = select i1 %c, i64 C2, i64 C1 => %z = zext i8 %a to i64 %r = sub i64 C2, %z Name: false const is bigger Pre: C2 == (C1 + 1) %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %r = select i1 %c, i64 C1, i64 C2 => %z = zext i8 %a to i64 %r = add i64 C1, %z Differential Revision: https://reviews.llvm.org/D48466 llvm-svn: 335433	2018-06-24 14:37:30 +00:00
Sanjay Patel	0fe8ea568b	[PowerPC] add more tests for bit hacking opportunities with setcc; NFC Missed cases where the input and output are the same size in rL335390. llvm-svn: 335395	2018-06-22 22:06:33 +00:00
Sanjay Patel	6e505e4388	[PowerPC] add tests for bit hacking opportunities with setcc; NFC We likely gave up on folding some select-of-constants patterns in IR with rL331486, and we need to recover those in the DAG. The tests without select are based on our current DAGCombiner optimizations for select-of-constants. llvm-svn: 335390	2018-06-22 21:16:29 +00:00
Mikael Holmen	42f7bc96dd	[DebugInfo] Make sure all DBG_VALUEs' reguse operands have IsDebug property Summary: In some cases, these operands lacked the IsDebug property, which is meant to signal that they should not affect codegen. This patch adds a check for this property in the MachineVerifier and adds it where it was missing. This includes refactorings to use MachineInstrBuilder construction functions instead of manually setting up the intrinsic everywhere. Patch by: JesperAntonsson Reviewers: aprantl, rnk, echristo, javed.absar Reviewed By: aprantl Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D48319 llvm-svn: 335214	2018-06-21 10:03:34 +00:00
Alina Sbirlea	dfd14adeb0	Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred. Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?) 2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken Uses of MergeBasicBlockIntoOnlyPred that need to be replaced: 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges. 2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll 3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement. Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D48202 llvm-svn: 335183	2018-06-20 22:01:04 +00:00
Stanislav Mekhanoshin	20279dc025	Allow binop C1, (select cc, CF, CT) -> select folding Previously this folding was done only if select is a first operand. However, for non-commutative operations constant may go before select. Differential Revision: https://reviews.llvm.org/D48223 llvm-svn: 335167	2018-06-20 20:24:20 +00:00
Strahinja Petrovic	bb2b00bb80	[PowerPC] Fix label address calculation for ppc32 This patch fixes calculating address of label on ppc32 (for -fPIC). Differential Revision: https://reviews.llvm.org/D46582 llvm-svn: 335043	2018-06-19 13:07:40 +00:00
QingShan Zhang	9f0fe9a3f8	If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction when we are loading a floating, and expand it post RA basing on the register pressure. However, we miss to do the add-imm peephole for these pseudo instruction. Differential Revision: https://reviews.llvm.org/D47568 Reviewed By: Nemanjai llvm-svn: 335024	2018-06-19 06:54:51 +00:00
Stanislav Mekhanoshin	9347c7b939	Tests for dag combine select (binop) -> select. NFC. Tests will be updated with https://reviews.llvm.org/D48223 llvm-svn: 334987	2018-06-18 21:49:07 +00:00
Michael Berg	8e570c3390	Utilize new SDNode flag functionality to expand current support for fma Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai Reviewed By: rampitec, nhaehnle Subscribers: tpr, nemanjai, wdng Differential Revision: https://reviews.llvm.org/D47918 llvm-svn: 334876	2018-06-16 00:03:06 +00:00
Michael Berg	77b5be7ec6	propagate fast math flags via IR on fma and sub expressions Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode. Reviewers: spatel, arsenm, hfinkel, javed.absar Reviewed By: spatel Subscribers: nemanjai, wdng Differential Revision: https://reviews.llvm.org/D47388 llvm-svn: 334242	2018-06-07 22:49:09 +00:00
Hiroshi Inoue	01ef4c2c64	[PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector BitPermutationSelector sets Repl32 flag for bit groups which can be (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies lower 32 bits into upper 32 bits on 64-bit PowerPC before rotation. However, enforcing 32-bit instruction sometimes results in redundant generated code. For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into only rldimi instruction if Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF). uint64_t func(uint64_t a, uint64_t b) { return (a & 0xFFFFFFFF) \| (b << 32) ; } To avoid such problem, this patch checks the potential benefit of Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from Repl32 flag on this group. Differential Revision: https://reviews.llvm.org/D47867 llvm-svn: 334195	2018-06-07 13:21:14 +00:00
Michael Berg	cc1c4b6912	guard fsqrt with fmf sub flags Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483. It contains only context for fsqrt. Reviewers: spatel, hfinkel, arsenm Reviewed By: spatel Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai Differential Revision: https://reviews.llvm.org/D47749 llvm-svn: 334113	2018-06-06 18:47:55 +00:00
Michael Berg	96925fe0df	guard fneg with fmf sub flags Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483. Reviewers: spatel, hfinkel Reviewed By: spatel Subscribers: nemanjai Differential Revision: https://reviews.llvm.org/D47389 llvm-svn: 334037	2018-06-05 18:49:47 +00:00
Michael Berg	8f6d6c817d	NFC: adding baseline fneg case for fmf llvm-svn: 334035	2018-06-05 18:12:25 +00:00
Hiroshi Inoue	955655f558	[PowerPC] reduce rotate in BitPermutationSelector BitPermutationSelector builds the output value by repeating rotate-and-mask instructions with input registers. Here, we may avoid one rotate instruction if we start building from an input register that does not require rotation. For example of the test case bitfieldinsert.ll, it first rotates left r4 by 8 bits and then inserts some bits from r5 without rotation. This can be executed by one rlwimi instruction, which rotates r4 by 8 bits and inserts its bits into r5. This patch adds a check for rotation amounts in the comparator used in sorting to process the input without rotation first. Differential Revision: https://reviews.llvm.org/D47765 llvm-svn: 334011	2018-06-05 11:58:01 +00:00
Lei Huang	716103f1cd	[PowerPC] Fix the incorrect iterator inside peephole Instruction selection can insert nodes into the underlying list after the root node so iterating will thereby miss it. We should NOT assume that, the root node is the last element in the DAG nodelist. Patch by: steven.zhang (Qing Shan Zhang) Differential Revision: https://reviews.llvm.org/D47437 llvm-svn: 333415	2018-05-29 13:38:56 +00:00
Lei Huang	651be44913	[Power9]Legalize and emit code for HW/Byte vector extract and convert to QP Implemente patterns to extract HWord and Byte vector elements and convert to quad-precision. Differential Revision: https://reviews.llvm.org/D46774 llvm-svn: 333377	2018-05-28 16:43:29 +00:00
Lei Huang	f4ec67822f	[PowerPC] Remove the match pattern in the definition of LXSDX/STXSDX The match pattern in the definition of LXSDX is xoaddr, so the Pseudo instruction XFLOADf64 never gets selected. XFLOADf64 expands to LXSDX/LFDX post RA based on the register pressure. To avoid ambiguity, we need to remove the select pattern for LXSDX, same as what was done for LXSD. STXSDX also have the same issue. Patch by Qing Shan Zhang (steven.zhang). Differential Revision: https://reviews.llvm.org/D47178 llvm-svn: 333150	2018-05-24 03:20:28 +00:00
Lei Huang	8b0da65bfb	[Power9]Legalize and emit code for W vector extract and convert to QP Implemente patterns to extract [Un]signed Word vector element and convert to quad-precision. Differential Revision: https://reviews.llvm.org/D46536 llvm-svn: 333115	2018-05-23 19:31:54 +00:00
Lei Huang	8990168a45	[Power9]Legalize and emit code for DW vector extract and convert to QP Implemente patterns to extract [Un]signed DWord vector element and convert to quad-precision. Differential Revision: https://reviews.llvm.org/D46333 llvm-svn: 333112	2018-05-23 18:36:51 +00:00
Sanjay Patel	354842abc5	[PowerPC] preserve test intent by removing undef We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332549	2018-05-16 22:48:48 +00:00
Sanjay Patel	8652c53d29	[DAG] propagate FMF for all FPMathOperators This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. It gets us most of the FMF functionality that we want without adding any state bits to the flags. It also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch. It should provide a superset of the functionality from D46563 - the extra tests show propagation and codegen diffs for fcmp, vecreduce, and FP libcalls. The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, so that shouldn't make any difference. Differential Revision: https://reviews.llvm.org/D46854 llvm-svn: 332358	2018-05-15 14:16:24 +00:00
Sanjay Patel	4c8a67a229	[PowerPC] add more tests for FMF propagation; NFC llvm-svn: 332295	2018-05-14 21:17:49 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Lei Huang	e41e3d3237	[Power9]Legalize and emit code for truncate and convert QP to HW and Byte Legalize and emit code for truncate and convert float128 to (un)signed short and (un)signed char. Differential Revision: https://reviews.llvm.org/D46194 llvm-svn: 331797	2018-05-08 18:52:06 +00:00
Lei Huang	6364288dba	[Power9]Legalize and emit code for truncate and convert Quad-Precision to Word Legalize and emit code for: * xscvqpswz : VSX Scalar truncate & Convert Quad-Precision to Signed Word * xscvqpuwz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Word Differential Revision: https://reviews.llvm.org/D45635 llvm-svn: 331790	2018-05-08 18:34:00 +00:00
Lei Huang	c517e95bc6	[Power9]Legalize and emit code for truncate and convert QP to DW Legalize and emit code for: * xscvqpsdz : VSX Scalar truncate & Convert Quad-Precision to Signed Dword * xscvqpudz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Dword Differential Revision: https://reviews.llvm.org/D45553 llvm-svn: 331787	2018-05-08 18:23:31 +00:00
Lei Huang	c29229a644	[PowerPC] Unify handling for conversion of FP_TO_INT feeding a store Existing DAG combine only handles conversions for FP_TO_SINT: "{f32, f64} x { i32, i16 }" This patch simplifies the code to handle: "{ FP_TO_SINT, FP_TO_UINT } x { f64, f32 } x { i64, i32, i16, i8 }" Differential Revision: https://reviews.llvm.org/D46102 llvm-svn: 331778	2018-05-08 17:36:40 +00:00
Michael Berg	7acc81b744	Fast Math Flag mapping into SDNode Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage. Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar Reviewed By: spatel Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng Differential Revision: https://reviews.llvm.org/D45710 llvm-svn: 331547	2018-05-04 18:48:20 +00:00
Sanjay Patel	52151885e4	[PowerPC] add more FMF debug output; NFC We can't see all of the problems currently unless we look at debug output when the global 'unsafe' is on. It's a mess. This is another attempt to make sure that D45710 is not making changes unintentionally. llvm-svn: 331476	2018-05-03 18:49:35 +00:00
Sanjay Patel	e7532d2940	[PowerPC] add tests for FMF propagation; NFC I'm choosing PPC out of convenience because it does all of the transforms of interest in these tests by default. There are multiple FMF problems shown in the current checks. D45710 is proposing to fix part of that. llvm-svn: 331471	2018-05-03 17:41:37 +00:00
Nemanja Ivanovic	01e2e79abf	[PowerPC] Implement isMaskAndCmp0FoldingBeneficial Sinking the and closer to a compare against zero is beneficial on PPC as it allows us to emit record-form instructions. In the future, we may expand this to a larger set of operations that feed compares against zero since PPC has lots of record-form instructions. Differential revision: https://reviews.llvm.org/D46060 llvm-svn: 331416	2018-05-02 23:55:23 +00:00
Nemanja Ivanovic	2139e99e47	[PowerPC] No CTR loop if the candidate exiting block is in a different loop The CTR loops pass will insert the decrementing branch instruction in an exiting block for the loop being transformed. However if that block is part of another loop as well (whether a nested loop or with irreducible CFG), it is not valid to use that exiting block. In fact, if the loop hass irreducible CFG, we don't bother analyzing it and we just bail on the transformation. In practice, this doesn't lead to a noticeable reduction in the number of loops transformed by this pass. Fixes https://bugs.llvm.org/show_bug.cgi?id=37229 Differential Revision: https://reviews.llvm.org/D46162 llvm-svn: 331410	2018-05-02 22:56:04 +00:00
Francis Visoiu Mistrih	57fcd3454a	[MIR] Add support for debug metadata for fixed stack objects Debug var, expr and loc were only supported for non-fixed stack objects. This patch adds the following fields to the "fixedStack:" entries, and renames the ones from "stack:" to: * debug-info-variable * debug-info-expression * debug-info-location Differential Revision: https://reviews.llvm.org/D46032 llvm-svn: 330859	2018-04-25 18:58:06 +00:00
Hiroshi Inoue	33486787cb	[PowerPC] fix incorrect vectorization of abs() on POWER9 Vectorized loops with abs() returns incorrect results on POWER9. This patch fixes it. For example the following code returns negative result if input values are negative though it sums up the absolute value of the inputs. int vpx_satd_c(const int16_t *coeff, int length) { int satd = 0; for (int i = 0; i < length; ++i) satd += abs(coeff[i]); return satd; } This problem causes test failures for libvpx. For vector absolute and vector absolute difference on POWER9, LLVM generates VABSDUW (Vector Absolute Difference Unsigned Word) instruction or variants. Since these instructions are for unsigned integers, we need adjustment for signed integers. For abs(sub(a, b)), we generate VABSDUW(a+0x80000000, b+0x80000000). Otherwise, abs(sub(-1, 0)) returns 0xFFFFFFFF(=-1) instead of 1. For abs(a), we generate VABSDUW(a+0x80000000, 0x80000000). Differential Revision: https://reviews.llvm.org/D45522 llvm-svn: 330497	2018-04-21 09:32:17 +00:00
Sanjay Patel	3d453ad711	[DAGCombine] (float)((int) f) --> ftrunc (PR36617) This was originally committed at rL328921 and reverted at rL329920 to investigate failures in Chrome. This time I've added to the ReleaseNotes to warn users of the potential of exposing UB and let me repeat that here for more exposure: Optimization of floating-point casts is improved. This may cause surprising results for code that is relying on undefined behavior. Code sanitizers can be used to detect affected patterns such as this: int main() { float x = 4294967296.0f; x = (float)((int)x); printf("junk in the ftrunc: %f\n", x); return 0; } $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of representable values of type 'int' junk in the ftrunc: 0.000000 Original commit message: fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are undefined. Differential Revision: https://reviews.llvm.org/D44909 llvm-svn: 330437	2018-04-20 15:07:55 +00:00
Lei Huang	829cd8e263	[NFC] test case clean up 1. remove redundant tests 2. update XForm_tests to generated expected code gen llvm-svn: 330290	2018-04-18 20:22:26 +00:00
Lei Huang	192c6ccf6d	[Power9]Legalize and emit code for converting Unsigned HWord/Char to Quad-Precision Legalize and emit code for converting unsigned HWord/Char to QP: xscvsdqp xscvudqp Only covering patterns for unsigned forms cause we don't have part-word sign-extending integer loads into VSX registers. Differential Revision: https://reviews.llvm.org/D45494 llvm-svn: 330278	2018-04-18 17:41:46 +00:00
Lei Huang	198e678576	[Power9]Legalize and emit code for converting (Un)Signed Word to Quad-Precision Legalize and emit code for converting (Un)Signed Word to quad-precision via: xscvsdqp xscvudqp Differential Revision: https://reviews.llvm.org/D45389 llvm-svn: 330273	2018-04-18 16:34:22 +00:00
Nemanja Ivanovic	a9a6dd826a	[PowerPC] Mark the BDNZ intrinsic as NoDuplicate Duplicating this intrinsic is not generally valid because it has the side-effect of decrementing the CTR. Any passes that duplicate it would need to be taught to keep the regions formed completely disjoint. This patch should be NFC for typical uses as CTRLoops runs after the remaining loop passes. It only affects situations where the loop passes are scheduled on the IR after the codegen passes (as is the case with some JIT pipelines). Fixes https://bugs.llvm.org/show_bug.cgi?id=37050 llvm-svn: 330186	2018-04-17 13:07:01 +00:00
Sanjay Patel	6c3af659a2	[DAGCombiner, PowerPC] allow X - (fpext(-Y) --> X + fpext(Y) with multiple uses This is a transform that I limited in instcombine in rL329821 because it was creating more instructions in IR when the cast has multiple uses. But if the cast is free, then we can do the transform regardless of other uses because it improves the potential throughput of the calculation by removing a dependency on the fneg. Differential Revision: https://reviews.llvm.org/D45598 llvm-svn: 330098	2018-04-15 16:43:48 +00:00
Sanjay Patel	9adb386a8e	[PowerPC] add fsub-fneg test; NFC This is a test for a transform that was suggested in the post-commit mailing list thread for rL329821. The target in question is not in trunk, so PPC gets to stand in for it because it's the only in-tree target that sets 'isFPExtFree()' to 'true'. llvm-svn: 329963	2018-04-12 22:14:23 +00:00

... 4 5 6 7 8 ...

2237 Commits