llvm-project

Commit Graph

Author	SHA1	Message	Date
Jessica Paquette	607774c960	Recommit "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT" After r355865, we should be able to safely select G_EXTRACT_VECTOR_ELT without running into any problematic intrinsics. Also add a fix for lane copies, which don't support index 0. llvm-svn: 355871	2019-03-11 22:18:01 +00:00
Alex Bradbury	4d20cc21c7	[RISCV] Do a sign-extension in a compare-and-swap of 32 bit in RV64A AtomicCmpSwapWithSuccess is legalised into an AtomicCmpSwap plus a comparison. This requires an extension of the value which, by default, is a zero-extension. When we later lower AtomicCmpSwap into a PseudoCmpXchg32 and then expanded in RISCVExpandPseudoInsts.cpp, the lr.w instruction does a sign-extension. This mismatch of extensions causes the comparison to fail when the compared value is negative. This change overrides TargetLowering::getExtendForAtomicOps for RISC-V so it does a sign-extension instead. Differential Revision: https://reviews.llvm.org/D58829 Patch by Ferran Pallarès Roca. llvm-svn: 355869	2019-03-11 21:41:22 +00:00
Jessica Paquette	42d16501e6	[GlobalISel][AArch64] Always fall back on aarch64.neon.addp.* Overloaded intrinsics aren't necessarily safe for instruction selection. One such intrinsic is aarch64.neon.addp.*. This is a temporary workaround to ensure that we always fall back on that intrinsic. Eventually this will be replaced with a proper solution. https://bugs.llvm.org/show_bug.cgi?id=40968 Differential Revision: https://reviews.llvm.org/D59062 llvm-svn: 355865	2019-03-11 20:51:17 +00:00
Nikita Popov	aa7cfa75f9	[SDAG][AArch64] Legalize VECREDUCE Fixes https://bugs.llvm.org/show_bug.cgi?id=36796. Implement basic legalizations (PromoteIntRes, PromoteIntOp, ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes. There are more legalizations missing (esp float legalizations), but there's no way to test them right now, so I'm not adding them. This also includes a few more changes to make this work somewhat reasonably: * Add support for expanding VECREDUCE in SDAG. Usually experimental.vector.reduce is expanded prior to codegen, but if the target does have native vector reduce, it may of course still be necessary to expand due to legalization issues. This uses a shuffle reduction if possible, followed by a naive scalar reduction. * Allow the result type of integer VECREDUCE to be larger than the vector element type. For example we need to be able to reduce a v8i8 into an (nominally) i32 result type on AArch64. * Use the vector operand type rather than the scalar result type to determine the action, so we can control exactly which vector types are supported. Also change the legalize vector op code to handle operations that only have vector operands, but no vector results, as is the case for VECREDUCE. * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE), explicitly specify for which vector types the reductions are supported. This does not handle anything related to VECREDUCE_STRICT_*. Differential Revision: https://reviews.llvm.org/D58015 llvm-svn: 355860	2019-03-11 20:22:13 +00:00
Jonas Paulsson	8b8dc50e79	[RegAlloc] Avoid compile time regression with multiple copy hints. As a fix for https://bugs.llvm.org/show_bug.cgi?id=40986 ("excessive compile time building opencollada"), this patch makes sure that no phys reg is hinted more than once from getRegAllocationHints(). This handles the case were many virtual registers are assigned to the same physreg. The previous compile time fix (r343686) in weightCalcHelper() only made sure that physical/virtual registers are passed no more than once to addRegAllocationHint(). Review: Dimitry Andric, Quentin Colombet https://reviews.llvm.org/D59201 llvm-svn: 355854	2019-03-11 19:00:37 +00:00
Simon Pilgrim	06ae025345	[X86] Extend widening comparison test. Ensure we test both v2i16 unary and binary comparisons. llvm-svn: 355849	2019-03-11 18:08:20 +00:00
Petar Jovanovic	28e13eb098	[MIPS][microMIPS] Add a pattern to match TruncIntFP A pattern needed to match TruncIntFP was missing. This was causing multiple tests from llvm test suite to fail during compilation for micromips. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D58722 llvm-svn: 355825	2019-03-11 14:13:31 +00:00
Petar Avramovic	5229f47f9f	[MIPS GlobalISel] NarrowScalar G_UMULH NarrowScalar G_UMULH in LegalizerHelper using multiplyRegisters helper function. NarrowScalar G_UMULH for MIPS32. Differential Revision: https://reviews.llvm.org/D58825 llvm-svn: 355815	2019-03-11 10:08:44 +00:00
Petar Avramovic	0b17e59b5c	[MIPS GlobalISel] NarrowScalar G_MUL Narrow Scalar G_MUL for MIPS32. Revisit NarrowScalar implementation in LegalizerHelper. Introduce new helper function multiplyRegisters. It performs generic multiplication of values held in multiple registers. Generated instructions use only types NarrowTy and i1. Destination can be same or two times size of the source. Differential Revision: https://reviews.llvm.org/D58824 llvm-svn: 355814	2019-03-11 10:00:17 +00:00
Craig Topper	00afa193f1	[X86] Enable sse2_cvtsd2ss intrinsic to use an EVEX encoded instruction. llvm-svn: 355810	2019-03-11 06:01:04 +00:00
Amaury Sechet	a5820cbd20	Add test case for add to sub post legalization. NFC llvm-svn: 355797	2019-03-11 01:25:48 +00:00
Sanjay Patel	26e06e859e	[x86] add x86-specific opcodes to extractelement scalarization list llvm-svn: 355792	2019-03-10 18:56:21 +00:00
Craig Topper	93e15dfacc	[X86] Make lowering of intrinsics with rounding mode stricter so that only valid rounding modes are lowered. Update tests accordingly Many of our tests were not using valid rounding mode immediates. Clang verifies this in the frontend when it creates the intrinsics from builtins, but the backend would still lower invalid immediates. With this change we will now leave them as intrinsics if the immediate is invalid. This will cause an isel selection failure. llvm-svn: 355789	2019-03-10 17:20:45 +00:00
Nikita Popov	bfec0d610c	[AArch64] Add tests for saddsat/ssubsat; NFC Signed versions of the existing unsigned tests. llvm-svn: 355787	2019-03-10 12:21:36 +00:00
Nikita Popov	506c1aba4d	[ARM] Use non-constant operand in umulo-32.ll; NFC Currently the store+load is folded and both operands of the umulo end up being constants. To avoid this getting folded away entirely, make sure at least one operand is non-constant. Also remove some allocas which don't seem relevant to the test. llvm-svn: 355776	2019-03-09 13:43:21 +00:00
Nikita Popov	74dde7e5a1	[ARM] Generate test checks for umulo-32.ll; NFC The second test case is going to be changed by D59041, so generate full baseline checks. llvm-svn: 355775	2019-03-09 13:21:15 +00:00
Alex Bradbury	fea4957177	[RISCV] Support -target-abi at the MC layer and for codegen This patch adds proper handling of -target-abi, as accepted by llvm-mc and llc. Lowering (codegen) for the hard-float ABIs will follow in a subsequent patch. However, this patch does add MC layer support for the hard float and RVE ABIs (emission of the appropriate ELF flags https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#-file-header). ABI parsing must be shared between codegen and the MC layer, so we add computeTargetABI to RISCVUtils. A warning will be printed if an invalid or unrecognized ABI is given. Differential Revision: https://reviews.llvm.org/D59023 llvm-svn: 355771	2019-03-09 09:28:06 +00:00
Thomas Lively	972d7d514b	[WebAssembly] Use named operands to identify loads and stores Summary: Uses the named operands tablegen feature to look up the indices of offset, address, and p2align operands for all load and store instructions. This replaces brittle, incorrect logic for identifying loads and store when eliminating frame indices, which previously crashed on bulk-memory ops. It also cleans up the SetP2Alignment pass. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59007 llvm-svn: 355770	2019-03-09 04:31:37 +00:00
Sanjay Patel	40bcc3de7d	[x86] add tests for extract of FP select; NFC llvm-svn: 355768	2019-03-09 02:11:05 +00:00
Craig Topper	69f8c1653d	[ScalarizeMaskedMemIntrin] Use IRBuilder functions that take uint32_t/uint64_t for getelementptr, extractelement, and insertelement. This saves needing to call getInt32 ourselves. Making the code a little shorter. The test changes are because insert/extract use getInt64 internally. Shouldn't be a functional issue. This cleanup because I plan to write similar code for expandload/compressstore. llvm-svn: 355767	2019-03-09 02:08:41 +00:00
Amara Emerson	7a05d1c1f1	[AArch64][GlobalISel] Fix i1 arguments not being zero-extended as required by ABI. Fixes PR41001. llvm-svn: 355745	2019-03-08 22:17:00 +00:00
Sanjay Patel	f84083b4db	[x86] scalarize extract element 0 of FP cmp An extension of D58282 noted in PR39665: https://bugs.llvm.org/show_bug.cgi?id=39665 This doesn't answer the request to use movmsk, but that's an independent problem. We need this and probably still need scalarization of FP selects because we can't do that as a target-independent transform (although it seems likely that targets besides x86 should have this transform). llvm-svn: 355741	2019-03-08 21:54:41 +00:00
Mitch Phillips	790edbc16e	[HWASan] Save + print registers when tag mismatch occurs in AArch64. Summary: This change change the instrumentation to allow users to view the registers at the point at which tag mismatch occured. Most of the heavy lifting is done in the runtime library, where we save the registers to the stack and emit unwind information. This allows us to reduce the overhead, as very little additional work needs to be done in each __hwasan_check instance. In this implementation, the fast path of __hwasan_check is unmodified. There are an additional 4 instructions (16B) emitted in the slow path in every __hwasan_check instance. This may increase binary size somewhat, but as most of the work is done in the runtime library, it's manageable. The failure trace now contains a list of registers at the point of which the failure occured, in a format similar to that of Android's tombstones. It currently has the following format: Registers where the failure occurred (pc 0x0055555561b4): x0 0000000000000014 x1 0000007ffffff6c0 x2 1100007ffffff6d0 x3 12000056ffffe025 x4 0000007fff800000 x5 0000000000000014 x6 0000007fff800000 x7 0000000000000001 x8 12000056ffffe020 x9 0200007700000000 x10 0200007700000000 x11 0000000000000000 x12 0000007fffffdde0 x13 0000000000000000 x14 02b65b01f7a97490 x15 0000000000000000 x16 0000007fb77376b8 x17 0000000000000012 x18 0000007fb7ed6000 x19 0000005555556078 x20 0000007ffffff768 x21 0000007ffffff778 x22 0000000000000001 x23 0000000000000000 x24 0000000000000000 x25 0000000000000000 x26 0000000000000000 x27 0000000000000000 x28 0000000000000000 x29 0000007ffffff6f0 x30 00000055555561b4 ... and prints after the dump of memory tags around the buggy address. Every register is saved exactly as it was at the point where the tag mismatch occurs, with the exception of x16/x17. These registers are used in the tag mismatch calculation as scratch registers during __hwasan_check, and cannot be saved without affecting the fast path. As these registers are designated as scratch registers for linking, there should be no important information in them that could aid in debugging. Reviewers: pcc, eugenis Reviewed By: pcc, eugenis Subscribers: srhines, kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, hiraditya, jdoerfert, llvm-commits, #sanitizers Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D58857 llvm-svn: 355738	2019-03-08 21:22:35 +00:00
Matt Arsenault	e8c03a2511	AMDGPU: Move d16 load matching to preprocess step When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced. I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function. llvm-svn: 355731	2019-03-08 20:58:11 +00:00
Matt Arsenault	26e76ef0e2	DAG: Don't try to cluster loads with tied inputs This avoids breaking possible value dependencies when sorting loads by offset. AMDGPU has some load instructions that write into the high or low bits of the destination register, and have a tied input for the other input bits. These can easily have the same base pointer, but be a swizzle so the high address load needs to come first. This was inserting glue forcing the opposite ordering, producing a cycle the InstrEmitter would assert on. It may be potentially expensive to look for the dependency between the other loads, so just skip any where this could happen. Fixes bug 40936 by reverting r351379, which added a hacky attempt to fix this by adding chains in this case, which I think was just working around broken glue before the InstrEmitter. The core of the patch is re-implementing the fix for that problem. llvm-svn: 355728	2019-03-08 20:46:15 +00:00
Sanjay Patel	43f098e719	[x86] add tests for extracted vector FP cmp; NFC llvm-svn: 355727	2019-03-08 20:45:27 +00:00
Matt Arsenault	74c9c305e0	AMDGPU: Add more tests for d16 loads Also fix a few cases that weren't testing what they were supposed to. llvm-svn: 355724	2019-03-08 20:30:51 +00:00
Matt Arsenault	07f904befb	AMDGPU: Correct DS implementation of areLoadsFromSameBasePtr This was checking the wrong operands for the base register and the offsets. The indexes are shifted by the number of output registers from the machine instruction definition, and the chain is moved to the end. llvm-svn: 355722	2019-03-08 20:30:50 +00:00
Amaury Sechet	782ac933b5	[DAGCombiner] fold (add (add (xor a, -1), b), 1) -> (sub b, a) Summary: This pattern is sometime created after legalization. Reviewers: efriedma, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58874 llvm-svn: 355716	2019-03-08 19:39:32 +00:00
Sanjay Patel	b22f438df3	[x86] prevent infinite looping from inverse shuffle transforms llvm-svn: 355713	2019-03-08 19:20:28 +00:00
Simon Pilgrim	53652feab7	[X86] Add test case for PR22473 llvm-svn: 355712	2019-03-08 19:16:26 +00:00
Simon Pilgrim	00ab0339ed	Fix typo in constant vector llvm-svn: 355699	2019-03-08 15:17:26 +00:00
Clement Courbet	8e16d73346	[SelectionDAG] Allow the user to specify a memeq function. Summary: Right now, when we encounter a string equality check, e.g. `if (memcmp(a, b, s) == 0)`, we try to expand to a comparison if `s` is a small compile-time constant, and fall back on calling `memcmp()` else. This is sub-optimal because memcmp has to compute much more than equality. This patch replaces `memcmp(a, b, s) == 0` by `bcmp(a, b, s) == 0` on platforms that support `bcmp`. `bcmp` can be made much more efficient than `memcmp` because equality compare is trivially parallel while lexicographic ordering has a chain dependency. Subscribers: fedor.sergeev, jyknight, ckennelly, gchatelet, llvm-commits Differential Revision: https://reviews.llvm.org/D56593 llvm-svn: 355672	2019-03-08 09:07:45 +00:00
Carl Ritson	1a98dc1840	[AMDGPU] V_CVT_F32_UBYTE{0,1,2,3} are full rate instructions Summary: Fix a bug in the scheduling model where V_CVT_F32_UBYTE{0,1,2,3} are incorrectly marked as quarter rate instructions. Reviewers: arsenm, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59091 llvm-svn: 355671	2019-03-08 09:03:11 +00:00
Craig Topper	4505c99e72	[X86] Improve the type checking in isLegalMaskedLoad and isLegalMaskedGather. We were just checking pointer size and type primitive size. But this caused unintended things like vectors of half being accepted by masked load/store. For FP we now explicitly check for only double and float. For pointers we now let any pointer through. Trusting that only 32 and 64 would be used to generate assembly. We only check bitwidth after checking that the type is an integer. llvm-svn: 355667	2019-03-08 07:33:43 +00:00
Sanjay Patel	5ed14ef1e4	[x86] add extract FP tests for target-specific nodes; NFC llvm-svn: 355655	2019-03-07 23:55:54 +00:00
Konstantin Zhuravlyov	47f0bf8f1f	AMDHSA: Code object v3 updates - Copy kernel symbol attributes into kernel descriptor attributes - Make sure kernel symbol's visibility is not "higher" than protected Differential Revision: https://reviews.llvm.org/D59057 llvm-svn: 355630	2019-03-07 19:58:29 +00:00
Vlad Tsyrklevich	2e1479e2f2	Delete x86_64 ShadowCallStack support Summary: ShadowCallStack on x86_64 suffered from the same racy security issues as Return Flow Guard and had performance overhead as high as 13% depending on the benchmark. x86_64 ShadowCallStack was always an experimental feature and never shipped a runtime required to support it, as such there are no expected downstream users. Reviewers: pcc Reviewed By: pcc Subscribers: mgorny, javed.absar, hiraditya, jdoerfert, cfe-commits, #sanitizers, llvm-commits Tags: #clang, #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D59034 llvm-svn: 355624	2019-03-07 18:56:36 +00:00
David Green	ffc922ec35	[LSR] Attempt to increase the accuracy of LSR's setup cost In some loops, we end up generating loop induction variables that look like: {(-1 * (zext i16 (%i0 * %i1) to i32))<nsw>,+,1} As opposed to the simpler: {(zext i16 (%i0 * %i1) to i32),+,-1} i.e we count up from -limit to 0, not the simpler counting down from limit to 0. This is because the scores, as LSR calculates them, are the same and the second is filtered in place of the first. We end up with a redundant SUB from 0 in the code. This patch tries to make the calculation of the setup cost a little more thoroughly, recursing into the scev members to better approximate the setup required. The cost function for comparing LSR costs is: return std::tie(C1.NumRegs, C1.AddRecCost, C1.NumIVMuls, C1.NumBaseAdds, C1.ScaleCost, C1.ImmCost, C1.SetupCost) < std::tie(C2.NumRegs, C2.AddRecCost, C2.NumIVMuls, C2.NumBaseAdds, C2.ScaleCost, C2.ImmCost, C2.SetupCost); So this will only alter results if none of the other variables turn out to be different. Differential Revision: https://reviews.llvm.org/D58770 llvm-svn: 355597	2019-03-07 13:44:40 +00:00
Petar Avramovic	3d3120dc9a	[MIPS GlobalISel] Fix mul operands Unsigned mul high for MIPS32 is selected into two PseudoInstructions: PseudoMULTu and PseudoMFHI that use accumulator register class ACC64 for some of its operands. Registers in this class have appropriate hi and lo register as subregisters: $lo0 and $hi0 are subregisters of $ac0 etc. mul instruction implicit-defs $lo0 and $hi0 according to MipsInstrInfo.td. In functions where mul and PseudoMULTu are present fastRegisterAllocator will "run out of registers during register allocation" because 'calcSpillCost' for $ac0 will return spillImpossible because subregisters $lo0 and $hi0 of $ac0 are reserved by mul instruction above. A solution is to mark implicit-defs of $lo0 and $hi0 as dead in mul instruction. Differential Revision: https://reviews.llvm.org/D58715 llvm-svn: 355594	2019-03-07 13:28:29 +00:00
Craig Topper	3acc4236b8	[X86] Enable combineFMinNumFMaxNum for 512 bit vectors when AVX512 is enabled. Simplified by just checking if the vector type is legal rather than listing all combinations of types and features. Fixes PR40984. llvm-svn: 355582	2019-03-07 06:30:19 +00:00
Craig Topper	a0dd6e9a08	[X86] Add 512-bit fminnum/maxnum test cases for PR40984. Also add v8f32 minnum/maxnum tests. NFC llvm-svn: 355581	2019-03-07 05:56:52 +00:00
Aakanksha Patil	c56d2afc63	AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV) A previous patch for "uniform-work-group-size" attribute was found to break some RADV and possibly radeon SI tests and had to be retracted. This patch fixes that. Differential Revision: http://reviews.llvm.org/D58993 llvm-svn: 355574	2019-03-07 00:54:04 +00:00
Simon Atanasyan	83b88441ad	[mips] Replace assertion by error message while lowering `RETURNADDR` and `FRAMEADDR` MIPS target supports lowering `RETURNADDR` and `FRAMEADDR` for a current frame only. It's better to show an error message then crash on assertion if `__builtin_return_address` is invoked with non-zero argument. llvm-svn: 355558	2019-03-06 22:40:28 +00:00
Abderrazek Zaafrani	5ced596198	[AArch64] Improve FP16 instruction selection for vector round and vector conver from half instructions https://reviews.llvm.org/D58855 llvm-svn: 355545	2019-03-06 20:30:06 +00:00
Nikita Popov	e1012e1efb	[X86] Add vector mulo with power of two operand tests; NFC llvm-svn: 355544	2019-03-06 20:25:49 +00:00
Paul Robinson	05efe0fdc4	[PS4] Emit a trap after a stack-protector fail call. llvm-svn: 355542	2019-03-06 19:57:43 +00:00
Sanjay Patel	f941631875	[AArch64] add tests for uaddsat/usubsat; NFC The tests are copied from the sibling x86 test files. llvm-svn: 355535	2019-03-06 19:02:01 +00:00
Simon Pilgrim	417f8c5be4	[PowerPC] Use real pointers instead of undef The reduced test removed the pointer arguments, but to better survive D58017 and D58070 we need them back. llvm-svn: 355532	2019-03-06 18:49:39 +00:00
Guozhi Wei	11308bdb43	[PPC] Adjust the computed branch offset for the possible shorter distance In file PPCBranchSelector.cpp we tend to over estimate code size due to large alignment and inline assembly. Usually it causes larger computed branch offset, it is not big problem. But sometimes it may also causes smaller computed branch offset than actual branch offset. If the offset is close to the limit of encoding, it may cause problem at run time. Following is a simplified example. actual estimated address address ... bne Far 100 10c .p2align 4 Near: 110 110 ... Far: 8108 8108 Actual offset: 0x8108 - 0x100 = 0x8008 Computed offset: 0x8108 - 0x10c = 0x7ffc The computed offset is at most ((1 << alignment) - 4) bytes smaller than actual offset. So we add this number to the offset for safety. Differential Revision: https://reviews.llvm.org/D57718 llvm-svn: 355529	2019-03-06 18:22:22 +00:00
Krzysztof Parzyszek	9c005bbdd4	[Hexagon] Avoid creating 5-instruction packets with vgather pseudos Change the resource usage of the vgather pseudos from SLOT0+LD to SLOT0+SLOT1. llvm-svn: 355524	2019-03-06 17:43:50 +00:00
Ryan Taylor	67f36903ae	[AMDGPU] Add support for 64 bit buffer atomic artihmetic instructions Summary: This adds support for 64 bit buffer atomic arithmetic instructions but does not include cmpswap as that depends on a fix to the way the register pairs are handled Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58918 llvm-svn: 355520	2019-03-06 17:02:06 +00:00
Simon Pilgrim	cdf95f8f07	[DAGCombiner] Enable UADDO/USUBO vector combine support Differential Revision: https://reviews.llvm.org/D58965 llvm-svn: 355517	2019-03-06 16:11:03 +00:00
Alexander Kornienko	3d467a890e	Revert "[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default" This reverts commit `2a0f2c5ef3` (r355490). The commit causes an assertion failure when compiling LLVM code: $ cat repro.cpp class QQQ { public: bool x() const; bool y() const; unsigned getSizeInBits() const { if (y() \|\| x()) return getScalarSizeInBits(); return getScalarSizeInBits() * 2; } unsigned getScalarSizeInBits() const; }; int f(const QQQ &Ty) { switch (Ty.getSizeInBits()) { case 1: case 8: return 0; case 16: return 1; case 32: return 2; case 64: return 3; default: __builtin_unreachable(); } } $ clang -O2 -o repro.o repro.cpp assert.h assertion failed at llvm/include/llvm/ADT/ilist_iterator.h:139 in llvm::ilist_iterator::reference llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void>, true, false>::operator() const [OptionsT = llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void>, IsReverse = true, IsConst = false]: !NodePtr->isKnownSentinel() Check failure stack trace: * @ 0x558aab4afc10 __assert_fail @ 0x558aa885479b llvm::ilist_iterator<>::operator() @ 0x558aa8854715 llvm::MachineInstrBundleIterator<>::operator() @ 0x558aa92c33c3 llvm::X86InstrInfo::optimizeCompareInstr() @ 0x558aa9a9c251 (anonymous namespace)::PeepholeOptimizer::optimizeCmpInstr() @ 0x558aa9a9b371 (anonymous namespace)::PeepholeOptimizer::runOnMachineFunction() @ 0x558aa99a4fc8 llvm::MachineFunctionPass::runOnFunction() @ 0x558aab019fc4 llvm::FPPassManager::runOnFunction() @ 0x558aab01a3a5 llvm::FPPassManager::runOnModule() @ 0x558aab01aa9b (anonymous namespace)::MPPassManager::runOnModule() @ 0x558aab01a635 llvm::legacy::PassManagerImpl::run() @ 0x558aab01afe1 llvm::legacy::PassManager::run() @ 0x558aa5914769 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly() @ 0x558aa5910f44 clang::EmitBackendOutput() @ 0x558aa5906135 clang::BackendConsumer::HandleTranslationUnit() @ 0x558aa6d165ad clang::ParseAST() @ 0x558aa6a94e22 clang::ASTFrontendAction::ExecuteAction() @ 0x558aa590255d clang::CodeGenAction::ExecuteAction() @ 0x558aa6a94840 clang::FrontendAction::Execute() @ 0x558aa6a38cca clang::CompilerInstance::ExecuteAction() @ 0x558aa4e2294b clang::ExecuteCompilerInvocation() @ 0x558aa4df6200 cc1_main() @ 0x558aa4e1b37f ExecuteCC1Tool() @ 0x558aa4e1a725 main @ 0x7ff20d56abbd __libc_start_main @ 0x558aa4df51c9 _start llvm-svn: 355515	2019-03-06 15:23:50 +00:00
Strahinja Petrovic	94fccc93de	[PowerPC] Add secure plt support for TLS symbols This patch supports secure plt mode for TLS symbols. Differential Revision: https://reviews.llvm.org/D45520 llvm-svn: 355513	2019-03-06 15:00:10 +00:00
Simon Pilgrim	1bdc2d1874	[DAGCombiner] Add SADDO/SSUBO combine support Basic constant handling folds, for both scalars and vectors Differential Revision: https://reviews.llvm.org/D58967 llvm-svn: 355506	2019-03-06 14:22:21 +00:00
Roman Lebedev	4764310505	[X86][NFC] Autogenerate check lines in cmovcmov.ll test Investigating 8-bit cmov promotion, this test comes up. llvm-svn: 355496	2019-03-06 11:47:43 +00:00
Simon Pilgrim	642f53d292	[DAGCombiner] Enable SMULO/UMULO vector combine support (PR40442) Differential Revision: https://reviews.llvm.org/D58968 llvm-svn: 355495	2019-03-06 11:04:21 +00:00
Simon Pilgrim	468bb2e601	[X86][SSE] VSELECT(XOR(Cond,-1), LHS, RHS) --> VSELECT(Cond, RHS, LHS) As noticed on D58965 DAGCombiner::visitSELECT has something similar, so we should be able to move this to DAGCombiner and support VSELECT as well at some point. Differential Revision: https://reviews.llvm.org/D58974 llvm-svn: 355494	2019-03-06 10:54:43 +00:00
Ayonam Ray	2a0f2c5ef3	[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Differential Revision: https://reviews.llvm.org/D52002 Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev llvm-svn: 355490	2019-03-06 10:01:02 +00:00
Ayonam Ray	af92b7a3b8	Reversing the commit of revision 355483 since it is giving a regression on a newly added test. llvm-svn: 355487	2019-03-06 07:51:28 +00:00
Craig Topper	c0e01d29a4	[X86] Enable the add with 128 -> sub with -128 encoding trick with X86ISD::ADD when the carry flag isn't used. This allows us to use an 8-bit sign extended immediate instead of a 16 or 32 bit immediate. Also do similar for 0x80000000 with 64-bit adds to avoid having to use a movabsq. llvm-svn: 355485	2019-03-06 07:36:38 +00:00
Craig Topper	97a1c4c340	[X86] Suppress load folding for add/sub with 128 immediate. 128 won't fit in a sign extended 8-bit immediate, but we can negate it to -128 and use the other operation. This results in a shorter encoding since the move would have used 16 or 32 bits for the immediate. llvm-svn: 355484	2019-03-06 07:36:36 +00:00
Ayonam Ray	6025fa8e30	[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Differential Revision: https://reviews.llvm.org/D52002 Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev llvm-svn: 355483	2019-03-06 07:27:45 +00:00
Roman Lebedev	98d412ff13	[X86][NFC] Add proper test for promotion of i8 cmov's of trunc's There was no proper test for that code in X86TargetLowering::LowerSELECT(). Noticed accidentally while trying to modify the last branch in that function. llvm-svn: 355452	2019-03-05 22:43:53 +00:00
Heejin Ahn	ef9d6aea45	[WebAssembly] Disable MachineBlockPlacement pass Summary: This pass hurts code size for wasm and sometimes generates irreducible control flow. Context: https://github.com/emscripten-core/emscripten/pull/8233 Reviewers: kripken, dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58953 llvm-svn: 355437	2019-03-05 20:35:34 +00:00
Roman Lebedev	c38831e11d	[NFC][CodeGen][X86][AArch64] Add tests for C++ std::midpoint() pattern (PR40965) Tests only for integers, not floating point or pointers. The scalar 8-bit case uses branch instead of CMOV, because there is no no 8-bit CMOV. Vector tests are for consistency, since it can be vectorized. https://bugs.llvm.org/show_bug.cgi?id=40965 llvm-svn: 355436	2019-03-05 20:18:47 +00:00
Matt Arsenault	870397739e	AMDGPU: Preserve undef flag when expanding SI_IF Fixes undefined value verifier error. llvm-svn: 355426	2019-03-05 18:38:00 +00:00
Craig Topper	4a9dd7c39b	[X86] Enable 8-bit SHL to convert to LEA Differential Revision: https://reviews.llvm.org/D58870 llvm-svn: 355425	2019-03-05 18:37:41 +00:00
Craig Topper	216bf7f03b	[X86] Allow 8-bit INC/DEC to be converted to LEA. We already do this for 16/32/64 as well as 8-bit add with register/immediate. Might as well do it for 8-bit INC/DEC too. Differential Revision: https://reviews.llvm.org/D58869 llvm-svn: 355424	2019-03-05 18:37:37 +00:00
Craig Topper	572e94ca02	[X86] Enable 8-bit OR with disjoint bits to convert to LEA We already support 8-bits adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64. Differential Revision: https://reviews.llvm.org/D58863 llvm-svn: 355423	2019-03-05 18:37:33 +00:00
Simon Pilgrim	40441aa86a	[X86][SSE] Regenerate vector zero tests llvm-svn: 355412	2019-03-05 16:52:14 +00:00
Jessica Paquette	00d5847b5c	Revert "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT" This broke test-suite::aarch64_neon_intrinsics.test Reverting while I look into it. Example failure: http://lab.llvm.org:8011/builders/clang-cmake-aarch64-quick/builds/17740 llvm-svn: 355408	2019-03-05 15:47:00 +00:00
Simon Pilgrim	f011e53a78	[X86] Add SMULO/UMULO combine tests Include scalar and vector test variants covering the folds in DAGCombiner (vector isn't currently supported - PR40442) llvm-svn: 355407	2019-03-05 15:36:45 +00:00
Simon Pilgrim	65676571e1	Fix typo in constant vector llvm-svn: 355405	2019-03-05 15:06:01 +00:00
Simon Pilgrim	a3d06ccd5e	[X86] Add SADDO/UADDO and SSUBO/USUBO combine tests Include scalar and vector test variants covering the folds in DAGCombiner (vector isn't currently supported - PR40442) llvm-svn: 355404	2019-03-05 14:52:42 +00:00
Simon Pilgrim	4d93b9c75c	[X86] Add test cases for D58874 Add scalar and vector test cases for missing (add (add (xor a, -1), b), 1) -> (sub b, a) fold llvm-svn: 355400	2019-03-05 13:52:09 +00:00
Carl Ritson	9e3f7d8ad0	[AMDGPU] Fix DPP operand order in atomic optimizer Summary: Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions. Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30 Reviewers: sheredom, tpr Reviewed By: sheredom Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58900 llvm-svn: 355394	2019-03-05 12:21:44 +00:00
Oliver Stannard	4a9086b537	[ARM] Fix select_cc lowering for fp16 When lowering a select_cc node where the true and false values are of type f16, we can't use a general conditional move because the FP16 instructions do not support conditional execution. Instead, we must ensure that the condition code is one of the four supported by the VSEL instruction. Differential revision: https://reviews.llvm.org/D58813 llvm-svn: 355385	2019-03-05 10:42:34 +00:00
David Stuttard	81eec58a0d	[AMDGPU] Omit KILL instructions from hazard recognizer Summary: In some cases the KILL was causing a hazard to be introduced as these were scheduled into hazard slots, but don't result in an instruction. KILL shouldn't be considered for hazard recognition. Change-Id: Ib6d2a2160f8c94cd0ce611ab198c7e4f46aeffcf Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58898 llvm-svn: 355384	2019-03-05 10:25:16 +00:00
Chen Zheng	9cfe7e81f1	[PowerPC] fix killed/dead flag after convert x-form to d-form tranformation. Differential Revision: https://reviews.llvm.org/D58428 llvm-svn: 355378	2019-03-05 04:56:54 +00:00
Yonghong Song	d82247cb80	[BPF] Do not generate BTF sections unnecessarily If There is no types/non-empty strings, do not generate .BTF section. If there is no func_info/line_info, do not generate .BTF.ext section. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D58936 llvm-svn: 355360	2019-03-05 01:01:21 +00:00
Florian Hahn	fd2d89f98b	Fix invalid target triples in tests. (NFC) llvm-svn: 355349	2019-03-04 23:37:41 +00:00
Jessica Paquette	caf62b1d47	[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT This adds instruction selection support for G_EXTRACT_VECTOR_ELT for cases where the index is defined by a G_CONSTANT. It also factos out the lane copy opcode selection part into its own function, `getLaneCopyOpcode`. This is used by both `selectUnmergeValues` and `selectExtractElt`. Differential Revision: https://reviews.llvm.org/D58469 llvm-svn: 355344	2019-03-04 22:35:32 +00:00
Jessica Paquette	0632e12f89	[GlobalISel][AArch64] Legalize vector G_SELECT Just scalarize it, and add a test showing it works. Differential Revision: https://reviews.llvm.org/D58747 llvm-svn: 355339	2019-03-04 21:12:46 +00:00
Amara Emerson	8acb0d9c82	Re-commit r355104: "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1." The code to materialize a mask from a constant pool load tried to use a 128 bit LDR to load a 64 bit constant pool entry, which was 8 byte aligned. This resulted in a link failure in the NEON tests in the test suite since the LDR address was unaligned. This change fixes that to instead emit a 64 bit LDR if the entry is 64 bit, before converting back to a 128 bit register for the TBL. llvm-svn: 355326	2019-03-04 19:16:00 +00:00
Craig Topper	509a8a3cf1	[DAGCombiner][X86][SystemZ][AArch64] Combine some cases of (bitcast (build_vector constants)) between legalize types and legalize dag. This patch enables combining integer bitcasts of integer build vectors when the new scalar type is legal. I've avoided floating point because the implementation bitcasts float to int along the way and we would need to check the intermediate types for legality Differential Revision: https://reviews.llvm.org/D58884 llvm-svn: 355324	2019-03-04 19:12:16 +00:00
Simon Pilgrim	eeb1144d27	[X86] Regenerate illegal type load test with non-undef load address. This would be affected by an upcoming patch without undoing some of the bugpoint reduction. llvm-svn: 355316	2019-03-04 14:49:02 +00:00
Jeremy Morse	09d8ea5282	[X86] Avoid codegen changes when DBG_VALUE appears between lowered selects X86TargetLowering::EmitLoweredSelect presently detects sequences of CMOV pseudo instructions without accounting for debug intrinsics. This leads to different codegen with and without option -g, if a DBG_VALUE instruction lands in the middle of several lowered selects. Work around this by skipping over debug instructions when looking for CMOV sequences, and sinking those debug insts into the EmitLoweredSelect sunk block. This might slightly shift where variables appear in the instruction sequence, but won't re-order assignments. Differential Revision: https://reviews.llvm.org/D58672 llvm-svn: 355307	2019-03-04 10:56:02 +00:00
Oliver Stannard	181afc7f3b	[ARM] Fix selection of VLDR.16 instruction with imm offset The isScaledConstantInRange function takes upper and lower bounds which are checked after dividing by the scale, so the bounds checks for half, single and double precision should all be the same. Previously, we had wrong bounds checks for half precision, so selected an immediate the instructions can't actually represent. Differential revision: https://reviews.llvm.org/D58822 llvm-svn: 355305	2019-03-04 09:17:38 +00:00
Heejin Ahn	195a62e9ae	[WebAssembly] Delete ThrowUnwindDest map from WasmEHFuncInfo Summary: Before when we implemented the first EH proposal, 'catch <tag>' instruction may not catch an exception so there were multiple EH pads an exception can unwind to. That means a BB could have multiple EH pad successors. Now after we switched to the new proposal, every 'catch' instruction catches an exception, and there is only one catchpad per catchswitch, so we at most have one EH pad successor, making `ThrowUnwindDest` map in `WasmEHInfo` unnecessary. Keeping `ThrowUnwindDest` map in `WasmEHInfo` has its own problems, because other optimization passes can split a BB that contains possibly throwing calls (previously invokes), and we have to update the map every time that happens, which is not easy for common CodeGen passes. This also correctly updates successor info in LateEHPrepare when we add a rethrow instruction. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58486 llvm-svn: 355296	2019-03-03 22:35:56 +00:00
Craig Topper	e9e4a0f5b4	[X86] Regenerate test to get the full FP operands printed. NFC Missed when I updated the printer to print implicit %st operand on binops. llvm-svn: 355295	2019-03-03 20:28:52 +00:00
Simon Pilgrim	d8e91a54c0	[X86] getShuffleScalarElt - peek through insert/extract subvector nodes. llvm-svn: 355288	2019-03-03 14:11:05 +00:00
Craig Topper	ce68659772	[X86] Prefer VPBLENDD for v2i64/v4i64 blends with AVX2. We were using VPBLENDW for v2i64 and VBLENDPD for v4i64. VPBLENDD has better throughput than VPBLENDW on some CPUs so it makes sense to use it when possible. VBLENDPD will probably become VBLENDD during execution domain fixing, but we might as well use integer in isel while we can. This should work around some issues with the domain fixing pass prefering PBLENDW when we start with PBLENDW. There may still be some v8i16 cases that could use PBLENDD. llvm-svn: 355281	2019-03-03 00:18:07 +00:00
Amaury Sechet	31291a403c	Add test case for add to sub transformation. NFC llvm-svn: 355269	2019-03-02 14:28:59 +00:00
Xing GUO	33649349c5	[Codegen] fix typos in test case llvm-svn: 355264	2019-03-02 08:03:59 +00:00
Thomas Lively	43876ae7bc	[WebAssembly] Expand operations not supported by SIMD Summary: This prevents crashes in instruction selection when these operations are used. The tests check that the scalar version of the instruction is used where applicable, although some expansions do not use the scalar version. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58859 llvm-svn: 355261	2019-03-02 03:32:25 +00:00
Amaury Sechet	f24abf6511	[X86] Improve use of SHLD/SHRD Summary: This extends the variety of pattern that can generate a SHLD instead of using two shifts. This fixes a regression that would be introduced by D57367 or D33587 Reviewers: RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57389 llvm-svn: 355260	2019-03-02 02:44:16 +00:00
Amaury Sechet	1cc0f6061f	Add test case for truncate funnel shifts. NFC llvm-svn: 355258	2019-03-02 02:24:36 +00:00
Vlad Tsyrklevich	8925138007	Revert "[MIPS GlobalISel] Fix mul operands" This reverts commit r355178, it is causing ASan failures on the sanitizer bots. llvm-svn: 355219	2019-03-01 18:58:22 +00:00
Oliver Stannard	82fbbc21fd	[ARM] Fix FP16 stack loads/stores for Thumb2 with frame pointer The new addressing mode added for the v8.2A FP16 instructions uses bit 8 of the immediate to encode the sign of the offset, like the other FP loads/stores, so need to be treated the same way. Differential revision: https://reviews.llvm.org/D58816 llvm-svn: 355201	2019-03-01 14:20:28 +00:00
Oliver Stannard	e019e6223b	[ARM] Consider undefined-on-NaN conditions in checkVSELConstraints This function was not checking for the condition code variants which are undefined if either input is NaN, so we were missing selection of the VSEL instruction in some cases when using -fno-honor-nans or -ffast-math. Differential revision: https://reviews.llvm.org/D58812 llvm-svn: 355199	2019-03-01 13:58:25 +00:00
Simon Pilgrim	1a059e6619	[X86] Regenerate legalize test files Noticed while getting update_mir_test_checks.py to work on python3 llvm-svn: 355198	2019-03-01 13:13:40 +00:00
Simon Pilgrim	69c670e4e3	[Thumb] Add some integer abs testcases for different typesizes. Committed on behalf of @ikulagin (Ivan Kulagin) Differential Revision: https://reviews.llvm.org/D52138 llvm-svn: 355197	2019-03-01 12:08:50 +00:00
Diana Picus	54829ec5d0	[ARM GlobalISel] Support G_CTLZ for Thumb2 Same as ARM mode but with different opcode. llvm-svn: 355191	2019-03-01 10:12:28 +00:00
Diana Picus	afb3398da0	[ARM GlobalISel] Check target flags in test. NFCI There was a time when we couldn't dump target-specific flags such as arm-sbrel etc, so the tests didn't check for them. We can now be more specific in our tests. llvm-svn: 355189	2019-03-01 10:01:22 +00:00
Stanislav Mekhanoshin	bb98841399	[AMDGPU] Mark ds instructions as meybeAtomic These were not recognized as potential atomics by memory legalizer. The test was working not because legalizer did a right thing, but because it has skipped all these instructions. When I have fixed DS desciption test started to fail because region address has changed from 4 to 2 a while ago. Differential Revision: https://reviews.llvm.org/D58802 llvm-svn: 355179	2019-03-01 07:59:17 +00:00
Petar Avramovic	9bf43b5c26	[MIPS GlobalISel] Fix mul operands Unsigned mul high for MIPS32 is selected into two PseudoInstructions: PseudoMULTu and PseudoMFHI that use accumulator register class ACC64 for some of its operands. Registers in this class have appropriate hi and lo register as subregisters: $lo0 and $hi0 are subregisters of $ac0 etc. mul instruction implicit-defs $lo0 and $hi0 according to MipsInstrInfo.td. In functions where mul and PseudoMULTu are present fastRegisterAllocator will "run out of registers during register allocation" because 'calcSpillCost' for $ac0 will return spillImpossible because subregisters $lo0 and $hi0 of $ac0 are reserved by mul instruction above. A solution is to mark implicit-defs of $lo0 and $hi0 as dead in mul instruction. Differential Revision: https://reviews.llvm.org/D58715 llvm-svn: 355178	2019-03-01 07:35:57 +00:00
Petar Avramovic	a48285a190	[MIPS GlobalISel] Select G_UMULH Legalize G_UMULO and select G_UMULH for MIPS32. Differential Revision: https://reviews.llvm.org/D58714 llvm-svn: 355177	2019-03-01 07:25:44 +00:00
Tom Stellard	33634d1b25	AMDGPU/GlobalISel: Implement select for G_INSERT Re-commit r344310. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53116 llvm-svn: 355159	2019-03-01 00:50:26 +00:00
Thomas Lively	ae79f42a2f	[WebAssembly] Fix crash when @llvm.global_dtors is external Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58799 llvm-svn: 355157	2019-03-01 00:12:13 +00:00
Tom Stellard	41f32196a0	AMDGPU/GlobalISel: Implement select for G_EXTRACT Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49714 llvm-svn: 355156	2019-02-28 23:37:48 +00:00
Eli Friedman	d19a7060c6	[AArch64] [Windows] Don't skip constructing UnwindHelp. In certain cases, the first non-frame-setup instruction in a function is a branch. For example, it could be a cbz on an argument. Make sure we correctly allocate the UnwindHelp, and find an appropriate register to use to initialize it. Fixes https://bugs.llvm.org/show_bug.cgi?id=40184 Differential Revision: https://reviews.llvm.org/D58752 llvm-svn: 355136	2019-02-28 20:38:45 +00:00
Abderrazek Zaafrani	abfd10807c	[AArch64] Improve FP16 vector convert from short instructions. https://reviews.llvm.org/D58563 llvm-svn: 355134	2019-02-28 20:21:46 +00:00
Sanjay Patel	7fc6ef7dd7	[x86] scalarize extract element 0 of FP math This is another step towards ensuring that we produce the optimal code for reductions, but there are other potential benefits as seen in the tests diffs: 1. Memory loads may get scalarized resulting in more efficient code. 2. Memory stores may get scalarized resulting in more efficient code. 3. Complex ops like fdiv/sqrt get scalarized which may be faster instructions depending on uarch. 4. Even simple ops like addss/subss/mulss/roundss may result in faster operation/less frequency throttling when scalarized depending on uarch. The TODO comment suggests 1 or more follow-ups for opcodes that can currently result in regressions. Differential Revision: https://reviews.llvm.org/D58282 llvm-svn: 355130	2019-02-28 19:47:04 +00:00
Jiong Wang	3da8bcd0a0	bpf: enable sub-register code-gen for XADD Support sub-register code-gen for XADD is like supporting any other Load and Store patterns. No new instruction is introduced. lock (u32 )(r1 + 0) += w2 has exactly the same underlying insn as: lock (u32 )(r1 + 0) += r2 BPF_W width modifier has guaranteed they behave the same at runtime. This patch merely teaches BPF back-end that BPF_W width modifier could work GPR32 register class and that's all needed for sub-register code-gen support for XADD. test/CodeGen/BPF/xadd.ll updated to include sub-register code-gen tests. A new testcase test/CodeGen/BPF/xadd_legal.ll is added to make sure the legal case could pass on all code-gen modes. It could also test dead Def check on GPR32. If there is no proper handling like what has been done inside BPFMIChecking.cpp:hasLivingDefs, then this testcase will fail. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 355126	2019-02-28 19:21:28 +00:00
Craig Topper	8b1703fc1d	[X86] Add test case that was supposed to go with r355116. llvm-svn: 355117	2019-02-28 18:50:16 +00:00
Amara Emerson	8d70e6425c	Revert "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1." Seems to break some neon intrinsics tests. llvm-svn: 355115	2019-02-28 18:47:29 +00:00
Thomas Lively	f3b4f99007	[WebAssembly] Remove uses of ThreadModel Summary: In the clang UI, replaces -mthread-model posix with -matomics as the source of truth on threading. In the backend, replaces -thread-model=posix with the atomics target feature, which is now collected on the WebAssemblyTargetMachine along with all other used features. These collected features will also be used to emit the target features section in the future. The default configuration for the backend is thread-model=posix and no atomics, which was previously an invalid configuration. This change makes the default valid because the thread model is ignored. A side effect of this change is that objects are never emitted with passive segments. It will instead be up to the linker to decide whether sections should be active or passive based on whether atomics are used in the final link. Reviewers: aheejin, sbc100, dschuff Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, steven_wu, dexonsmith, rupprecht, jfb, jdoerfert, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D58742 llvm-svn: 355112	2019-02-28 18:39:08 +00:00
Amara Emerson	85c3afd7f6	[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1. This extends the existing support for shufflevector to handle cases like <2 x float>, which we can implement by concating the vectors and using a TBL1. Differential Revision: https://reviews.llvm.org/D58684 llvm-svn: 355104	2019-02-28 16:43:11 +00:00
Stefan Pintilie	bd5429ef38	[PowerPC] Move the stack pointer update instruction later in the prologue and earlier in the epilogue. Move the stdu instruction in the prologue and epilogue. This should provide a small performance boost in functions that are able to do this. I've kept this change rather conservative at the moment and functions with frame pointers or base pointers will not try to move the stack pointer update. Differential Revision: https://reviews.llvm.org/D42590 llvm-svn: 355085	2019-02-28 12:23:28 +00:00
Dmitri Gribenko	3d0576bbaf	Fixed a typo in the test s/CEHCK/CHECK/ Summary: Turns out the test was not correct, I had to adjust the test to work. I also added CHECK-LABELs for better error messages from FileCheck while I'm here. Reviewers: jsji Subscribers: nemanjai, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58614 llvm-svn: 355079	2019-02-28 10:56:39 +00:00
Simon Pilgrim	87aeff8bbb	[X86][AVX] Fold vf64 concat_vectors(movddup(x),movddup(x)) -> broadcast(x) llvm-svn: 355078	2019-02-28 10:53:58 +00:00
Diana Picus	3b7beafc77	[ARM GlobalISel] Support global variables for Thumb2 Add the same level of support as for ARM mode (i.e. still no TLS support). In most cases, it is sufficient to replace the opcodes with the t2-equivalent, but there are some idiosyncrasies that I decided to preserve because I don't understand the full implications: * For ARM we use LDRi12 to load from constant pools, but for Thumb we use t2LDRpci (I'm not sure if the ideal would be to use t2LDRi12 for Thumb as well, or to use LDRcp for ARM). * For Thumb we don't have an equivalent for MOV\|LDRLIT_ga_pcrel_ldr, so we have to generate MOV\|LDRLIT_ga_pcrel plus a load from GOT. The tests are in separate files because they're hard enough to read even without doubling the number of checks. llvm-svn: 355077	2019-02-28 10:42:47 +00:00
Matt Arsenault	bf1bf706c8	AMDGPU/GlobalISel: Add regbankselect test for phis Add baseline for future fixes. These mostly show how this is broken and producing illegal situations. llvm-svn: 355057	2019-02-28 00:52:36 +00:00
Matt Arsenault	5d567dc137	AMDGPU: Enable function calls by default Fixes some crashes on illegal call situations which are unfortunately still valid IR. llvm-svn: 355051	2019-02-28 00:40:32 +00:00
Abderrazek Zaafrani	2fc498a652	[AArch64] Generate FP16 vector compare instructions. https://reviews.llvm.org/D58561 llvm-svn: 355050	2019-02-28 00:31:38 +00:00
Matt Arsenault	aa03bcd23c	AMDGPU: Fix crashes in invalid call cases We have to at least tolerate calls to kernels, possibly with a mismatched calling convention on the callsite. llvm-svn: 355049	2019-02-28 00:28:44 +00:00
Matt Arsenault	d3093c2f1f	GlobalISel: Implement fewerElementsVector for phi llvm-svn: 355048	2019-02-28 00:16:32 +00:00
Matt Arsenault	72bcf15dbf	GlobalISel: Implement moreElementsVector for phi llvm-svn: 355047	2019-02-28 00:01:05 +00:00
Joerg Sonnenberger	6a198366a0	Default to Secure PLT on PPC for NetBSD and OpenBSD. This matches the default settings of clang. llvm-svn: 355038	2019-02-27 21:53:14 +00:00
Philip Reames	288a95fc8c	Seperate volatility and atomicity/ordering in SelectionDAG At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc..). This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success. To make sure this patch itself is NFC, it is build on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll).. Previously landed D57593 Fix a bug in the definition of isUnordered on MachineMemOperand D57596 [CodeGen] Be conservative about atomic accesses as for volatile D57802 Be conservative about unordered accesses for the moment rL353959: [Tests] First batch of cornercase tests for unordered atomics. rL353966: [Tests] RMW folding tests w/unordered atomic operations. rL353972: [Tests] More unordered atomic lowering tests. rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface rL354740: [Hexagon, SystemZ] Be super conservative about atomics rL354800: [Lanai] Be super conservative about atomics rL354845: [ARM] Be super conservative about atomics Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.) Differential Revision: https://reviews.llvm.org/D57601 llvm-svn: 355025	2019-02-27 20:20:08 +00:00
Dmitry Preobrazhensky	ef92035827	[AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of instructions s_set_gpr_idx_on and s_set_gpr_idx_mode See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D58288 llvm-svn: 354969	2019-02-27 13:12:12 +00:00
Simon Pilgrim	71bb6850cf	[X86][AVX] Only combine loads to broadcasts for legal types Thanks to @echristo for spotting this. llvm-svn: 354961	2019-02-27 11:17:25 +00:00
Yonghong Song	cc290a9e91	[BPF] Don't fail for static variables Currently, the LLVM will print an error like Unsupported relocation: try to compile with -O2 or above, or check your static variable usage if user defines more than one static variables in a single ELF section (e.g., .bss or .data). There is ongoing effort to support static and global variables in libbpf and kernel. This patch removed the assertion so user programs with static variables won't fail compilation. The static variable in-section offset is written to the "imm" field of the corresponding to-be-relocated bpf instruction. Below is an example to show how the application (e.g., libbpf) can relate variable to relocations. -bash-4.4$ cat g1.c static volatile long a = 2; static volatile int b = 3; int test() { return a + b; } -bash-4.4$ clang -target bpf -O2 -c g1.c -bash-4.4$ llvm-readelf -r g1.o Relocation section '.rel.text' at offset 0x158 contains 2 entries: Offset Info Type Symbol's Value Symbol's Name 0000000000000000 0000000400000001 R_BPF_64_64 0000000000000000 .data 0000000000000018 0000000400000001 R_BPF_64_64 0000000000000000 .data -bash-4.4$ llvm-readelf -s g1.o Symbol table '.symtab' contains 6 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS g1.c 2: 0000000000000000 8 OBJECT LOCAL DEFAULT 4 a 3: 0000000000000008 4 OBJECT LOCAL DEFAULT 4 b 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 5: 0000000000000000 64 FUNC GLOBAL DEFAULT 2 test -bash-4.4$ llvm-objdump -d g1.o g1.o: file format ELF64-BPF Disassembly of section .text: 0000000000000000 test: 0: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 2: 79 11 00 00 00 00 00 00 r1 = (u64 )(r1 + 0) 3: 18 02 00 00 08 00 00 00 00 00 00 00 00 00 00 00 r2 = 8 ll 5: 61 20 00 00 00 00 00 00 r0 = (u32 )(r2 + 0) 6: 0f 10 00 00 00 00 00 00 r0 += r1 7: 95 00 00 00 00 00 00 00 exit -bash-4.4$ . from symbol table, static variable "a" is in section #4, offset 0. . from symbol table, static variable "b" is in section #4, offset 8. . the first relocation is against symbol #4: 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 and in-section offset 0 (see llvm-objdump result) . the second relocation is against symbol #4: 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 and in-section offset 8 (see llvm-objdump result) . therefore, the first relocation is for variable "a", and the second relocation is for variable "b". Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 354954	2019-02-27 05:36:15 +00:00
Heejin Ahn	82da1ffc16	[WebAssembly] Fix ScopeTops info in CFGStackify for EH pads Summary: When creating `ScopeTops` info for `try` ~ `catch` ~ `end_try`, we should create not only `end_try` -> `try` mapping but also `catch` -> `try` mapping as well. If this is not created, `block` and `end_block` markers later added may span across an existing `catch`, resulting in the incorrect code like: ``` try block --\| (X) catch \| end_block --\| end_try ``` Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58605 llvm-svn: 354945	2019-02-27 01:35:14 +00:00
Heejin Ahn	cf699b4534	[WebAssembly] Remove unnecessary instructions after TRY marker placement Summary: This removes unnecessary instructions after TRY marker placement. There are two cases: - `end`/`end_block` can be removed if they overlap with `try`/`end_try` and they have the same return types. - `br` right before `catch` that branches to after `end_try` can be deleted. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58591 llvm-svn: 354939	2019-02-27 00:50:53 +00:00
Jonas Paulsson	129826cd9f	[SystemZ] Pass regalloc hints to help Load-and-Test transformations. Since there is no "Load-and-Test-High" instruction, the 32 bit load of a register to be compared with 0 can only be implemented with LT if the virtual GRX32 register ends up in a low part (GR32 register). This patch detects these cases and passes the GR32 registers (low parts) as (soft) hints in getRegAllocationHints(). Review: Ulrich Weigand. llvm-svn: 354935	2019-02-27 00:18:28 +00:00
Stanislav Mekhanoshin	da1628eb67	[AMDGPU] Fixed hang during DAG combine SITargetLowering::reassociateScalarOps() does not touch constants so that DAGCombiner::ReassociateOps() does not revert the combine. However a global address is not a ConstantSDNode. Switched to the method used by DAGCombiner::ReassociateOps() itself to detect constants. Differential Revision: https://reviews.llvm.org/D58695 llvm-svn: 354926	2019-02-26 20:56:25 +00:00
Reid Kleckner	8fda7e15e6	[X86] Fix bug in vectorcall calling convention Original implementation can't correctly handle __m256 and __m512 types passed by reference through stack. This patch fixes it. Patch by Wei Xiao! Differential Revision: https://reviews.llvm.org/D57643 llvm-svn: 354921	2019-02-26 19:48:16 +00:00
Petar Avramovic	bd39569913	[MIPS GlobalISel] Select G_UADDO Lower G_UADDO. Legalize G_UADDO for MIPS32 Differential Revision: https://reviews.llvm.org/D58671 llvm-svn: 354900	2019-02-26 17:22:42 +00:00
Ganesh Gopalasubramanian	e172d7008d	[X86] AMD znver2 enablement This patch enables the following 1) AMD family 17h "znver2" tune flag (-march, -mcpu). 2) ISAs that are enabled for "znver2" architecture. 3) For the time being, it uses the znver1 scheduler model. 4) Tests are updated. 5) Scheduler descriptions are yet to be put in place. Reviewers: craig.topper Differential Revision: https://reviews.llvm.org/D58343 llvm-svn: 354897	2019-02-26 16:55:10 +00:00
Jonas Paulsson	c110b5b69f	[SystemZ] Wait with selection of legal vector/FP constants until Select(). This patch aims to make sure that any such constant that can be generated with a vector instruction (for example VGBM) is recognized as such during legalization and kept as a target independent node through post-legalize DAGCombining. Two new functions named isVectorConstantLegal() and loadVectorConstant() replace old ways of handling vector/FP constants. A new struct named SystemZVectorConstantInfo is used to cache the results of isVectorConstantLegal() and pass them onto loadVectorConstant(). Support for fp128 constants in the presence of FeatureVectorEnhancements1 (z14) has been added. Review: Ulrich Weigand https://reviews.llvm.org/D58270 llvm-svn: 354896	2019-02-26 16:47:59 +00:00
Nirav Dave	582d46328c	[DAG] Fix constant store folding to handle non-byte sizes. Avoid crashes from zero-byte values due to sub-byte store sizes. Reviewers: uabelho, courbet, rnk Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58626 llvm-svn: 354884	2019-02-26 15:02:32 +00:00
Simon Atanasyan	8cb497027d	[mips] Emit `.module softfloat` directive This change fixes crash on an assertion in case of using `soft float` ABI for mips32r6 target. llvm-svn: 354882	2019-02-26 14:45:17 +00:00
Simon Pilgrim	d4a406e499	[AArch64] Add arithmetic zext bswap tests. As requested on D58017. llvm-svn: 354872	2019-02-26 13:22:35 +00:00
Simon Pilgrim	e42be1eae2	[AArch64] Add 'free' zext bswap tests. As requested on D58017. llvm-svn: 354869	2019-02-26 12:04:37 +00:00
Luke Cheeseman	9e285bef2b	[ARM] Add Cortex-M35P - Add LLVM backend support for Cortex-M35P - Documentation can be found at https://developer.arm.com/products/processors/cortex-m/cortex-m35p Differentail Revision: https://reviews.llvm.org/D57763 llvm-svn: 354868	2019-02-26 12:02:12 +00:00
Simon Pilgrim	810fa04ac7	[LegalizeDAG] Expand SADDO/SSUBO using SADDSAT/SSUBSAT (PR37763) If SADDSAT/SSUBSAT are legal, then we can expand SADDO/SSUBO by performing a ADD/SUB and a SADDO/SSUBO and then compare the results. I looked at doing this for UADDO/USUBO as well but as we don't have to do as many range comparisons I didn't see any/much benefit. Differential Revision: https://reviews.llvm.org/D58637 llvm-svn: 354866	2019-02-26 11:27:53 +00:00
Simon Pilgrim	2ccc120d19	[AMDGPU] Regenerate bswap/bitreverse tests. Make codegen changes more obvious in D58017 llvm-svn: 354863	2019-02-26 11:01:08 +00:00
Dan Gohman	c71132c0be	[WebAssembly] Properly align fp128 arguments in outgoing varargs arguments For outgoing varargs arguments, it's necessary to check the OrigAlign field of the corresponding OutputArg entry to determine argument alignment, rather than just computing an alignment from the argument value type. This is because types like fp128 are split into multiple argument values, with narrower types that don't reflect the ABI alignment of the full fp128. This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase. Differential Revision: https://reviews.llvm.org/D58656 llvm-svn: 354846	2019-02-26 05:20:19 +00:00
Heejin Ahn	d2a56ac661	[WebAssembly] Fix a bug deleting instruction in a ranged for loop Summary: We shouldn't delete elements while iterating a ranged for loop. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58519 llvm-svn: 354844	2019-02-26 04:08:49 +00:00
Heejin Ahn	7829763e49	[WebAssembly] Improve readability of EH tests Summary: - Indent check lines to easily figure out try-catch-end structure - Add the original C++ code the tests were genereated from - Add a few more lines to make the structure more readable - Rename a couple function / structures - Add label and branch annotations to cfg-stackify-eh.ll - Temporarily delete check lines for `test1` in `cfg-stackify-eh.ll` because it will be updated in a later CL soon and there's no point of making it look better here Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58562 llvm-svn: 354842	2019-02-26 03:29:59 +00:00
Reid Kleckner	2f055f026a	[X86] Fix bug in x86_intrcc with arg copy elision Summary: Use a custom calling convention handler for interrupts instead of fixing up the locations in LowerMemArgument. This way, the offsets are correct when constructed and we don't need to account for them in as many places. Depends on D56883 Replaces D56275 Reviewers: craig.topper, phil-opp Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56944 llvm-svn: 354837	2019-02-26 02:11:25 +00:00
Stanislav Mekhanoshin	ab25f1e65b	[AMDGPU] Added target to mir test. NFC. Test was used without -mcpu, although tested instructions not available on all ASICs. llvm-svn: 354830	2019-02-25 22:59:55 +00:00
Matt Arsenault	752579736e	RegBankSelect: Handle slightly more complex value mappings Try to use concat_vectors. Also remove unnecessary assert on pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers for AMDGPU. llvm-svn: 354828	2019-02-25 22:24:13 +00:00
Matt Arsenault	f4bfe4cd17	AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes llvm-svn: 354825	2019-02-25 21:32:48 +00:00
Matt Arsenault	82b103998b	AMDGPU/GlobalISel: Clamp max implicit_def elements llvm-svn: 354818	2019-02-25 20:46:06 +00:00
Craig Topper	316c58e8f1	[X86] Improve detection of unneeded shift amount masking to also handle the case that the LHS has known zeroes in it If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case. Differential Revision: https://reviews.llvm.org/D58475 llvm-svn: 354811	2019-02-25 19:42:47 +00:00
Nikita Popov	fcbd7f6495	[Mips] Fix missing masking in fast-isel of br (PR40325) Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending (and x, 1) the condition before branching on it. To avoid regressing trivial cases, I'm combining emission of cmp+br sequences for the single-use + same block case (similar to what we do in x86). icmpbr1.ll still regresses due to the cross-bb usage of the condition. Differential Revision: https://reviews.llvm.org/D58576 llvm-svn: 354808	2019-02-25 18:54:17 +00:00
Simon Pilgrim	28441ac75f	[DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold Differential Revision: https://reviews.llvm.org/D58585 llvm-svn: 354793	2019-02-25 16:02:01 +00:00
David Green	b504f104b2	[ARM] Add some more missing T1 opcodes for the peephole optimisier This adds a few extra Thumb1 opcodes to improve the peephole opimisers ability to remove redundant cmp instructions. tADC and tSBC require a small fixup to prevent MOVS being moved past the instruction, giving the wrong flags. Differential Revision: https://reviews.llvm.org/D58281 llvm-svn: 354791	2019-02-25 15:50:54 +00:00
Luke Cheeseman	59f77e7891	[AArch64] Add support for Cortex-A76 and Cortex-A76AE - Add LLVM backend support for Cortex-A76 and Cortex-A76AE - Documentation can be found at https://developer.arm.com/products/processors/cortex-a/cortex-a76 llvm-svn: 354788	2019-02-25 15:08:27 +00:00
Dmitri Gribenko	a3a3964f98	Fixed typos in tests: s/CHEKC/CHECK/ Reviewers: ilya-biryukov Subscribers: nemanjai, javed.absar, jsji, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D58611 llvm-svn: 354785	2019-02-25 13:41:59 +00:00
Dmitri Gribenko	751c5fbf6a	Fixed typos in tests: s/CEHCK/CHECK/ Reviewers: ilya-biryukov Subscribers: sanjoy, sdardis, javed.absar, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58608 llvm-svn: 354781	2019-02-25 13:12:33 +00:00
Simon Pilgrim	c61f1e8e6c	[X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483) Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results. Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB. Differential Revision: https://reviews.llvm.org/D58597 llvm-svn: 354771	2019-02-25 11:19:37 +00:00
Simon Tatham	b70fc0c5fd	[ARM] Make fullfp16 instructions not conditionalisable. More or less all the instructions defined in the v8.2a full-fp16 extension are defined as UNPREDICTABLE if you put them in an IT block (Thumb) or use with any condition other than AL (ARM). LLVM didn't know that, and was happy to conditionalise them. In order to force these instructions to count as not predicable, I had to make a small Tablegen change. The code generation back end mostly decides if an instruction was predicable by looking for something it can identify as a predicate operand; there's an isPredicable bit flag that overrides that check in the positive direction, but nothing that overrides it in the negative direction. (I considered the alternative approach of actually removing the predicate operand from those instructions, but thought that it would be more painful overall for instructions differing only in data type to have different shapes of operand list. This way, the only code that has to notice the difference is the if-converter.) So I've added an isUnpredicable bit alongside isPredicable, and set that bit on the right subset of FP16 instructions, and also on the VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be unpredicable for all data types. I've included a couple of representative regression tests, both of which previously caused an fp16 instruction to be conditionalised in ARM state and (with -arm-no-restrict-it) to be put in an IT block in Thumb. Reviewers: SjoerdMeijer, t.p.northover, efriedma Reviewed By: efriedma Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57823 llvm-svn: 354768	2019-02-25 10:39:53 +00:00
Kang Zhang	4faa4090c9	[PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts Summary: Fast selection of llvm fptoi & fptrunc instructions is not handled well about VSX instruction support. We'd use VSX float convert integer instruction instead of non-vsx float convert integer instruction if the operand register class is VSSRC or VSFRC because i32 and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is openeded. For float trunc instruction, we do this silimar work like float convert integer instruction to try to use VSX instruction. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D58430 llvm-svn: 354762	2019-02-25 02:46:16 +00:00
Simon Pilgrim	f43c48cb52	[X86] Add PR40483 test cases Demonstrate failure to merge ISD::ADD(x,y)/X86ISD::ADD(x,y) + ISD::SUB(x,y)/X86ISD::SUB(x,y) equivalent ops llvm-svn: 354758	2019-02-24 21:13:29 +00:00
Simon Pilgrim	cfaf663a35	[X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637) Its proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step. llvm-svn: 354757	2019-02-24 19:57:52 +00:00
Craig Topper	3fe4bd464c	[X86] Fix tls variable lowering issue with large code model Summary: The problem here is the lowering for tls variable. Below is the DAG for the code. SelectionDAG has 11 nodes: t0: ch = EntryToken t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64 t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10] t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64 t12: i64 = add t8, t11 t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64 t6: ch = CopyToReg t0, Register:i32 %0, t4 And when mcmodel is large, below instruction can NOT be folded. t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10] t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64 So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0" When llvm start to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails. The patch is to fold the load and X86ISD::WrapperRIP. Fixes PR26906 Patch by LuoYuanke Reviewers: craig.topper, rnk, annita.zhang, wxiao3 Reviewed By: rnk Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58336 llvm-svn: 354756	2019-02-24 19:33:37 +00:00
Craig Topper	5532a98737	[X86][SSE] Use pblendw for v4i32/v2i64 during isel. Summary: Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was a integer. So use pblendw instead. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58574 llvm-svn: 354755	2019-02-24 19:23:41 +00:00
Craig Topper	be3348573e	[LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents Summary: When promoting the over flow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it. We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put a the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used. Reviewers: spatel, RKSimon, nikic Reviewed By: nikic Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58567 llvm-svn: 354753	2019-02-24 19:23:36 +00:00
Sanjay Patel	cb04ba032f	[CGP] add special-cases to form unsigned add with overflow (PR40486) There's likely a missed IR canonicalization for at least 1 of these patterns. Otherwise, we wouldn't have needed the pattern-matching enhancement in D57516. Note that -- unlike usubo added with D57789 -- the TLI hook for this transform defaults to 'on'. So if there's any perf fallout from this, targets should look at how they're lowering the uaddo node in SDAG and/or override that hook. The x86 diffs suggest that there's some missing pattern-matching for forming inc/dec. This should fix the remaining known problems in: https://bugs.llvm.org/show_bug.cgi?id=40486 https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354746	2019-02-24 15:31:27 +00:00
Craig Topper	dc185522fb	[TwoAddressInstructionPass] After commuting an instruction and before trying to look for more commutable operands, resample the number of operands. The new instruciton might have less operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist. X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here. Really this whole checking for more commutable operands is a little fragile. It assumes that the new instructions operands are the same order and positions as the original except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change. llvm-svn: 354738	2019-02-23 21:41:44 +00:00
Craig Topper	be9eeb5526	Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" And its follow ups r354511, r354640. A follow patch will fix the issue that caused it to be reverted. llvm-svn: 354737	2019-02-23 21:41:42 +00:00
Craig Topper	ccc860cb81	Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract" r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit." These were reverted in r354713 as their context depended on other patches that were reverted for a bug. llvm-svn: 354734	2019-02-23 19:51:32 +00:00
Nikita Popov	e661f946a7	[WebAssembly] Fix select of and (PR40805) Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by patterns added in D53676. I'm removing the patterns entirely here, as they are not correct in the general case. If necessary something more specific can be added in the future. Differential Revision: https://reviews.llvm.org/D58575 llvm-svn: 354733	2019-02-23 18:59:01 +00:00
Simon Pilgrim	e08f177ea2	[X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x) For AVX1, limit this to i32/f32/i64/f64 loading cases only. llvm-svn: 354730	2019-02-23 18:34:05 +00:00
Simon Pilgrim	31793733a0	[X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 its just a single VPERMPD), followed by a BLENDPD. llvm-svn: 354729	2019-02-23 17:10:47 +00:00
Simon Dardis	86a589e38d	[MIPS] Fix a incorrect test. (NFC) This test is incorrect as it should be using the microMIPSR6 instruction to return, not the microMIPS version. llvm-svn: 354726	2019-02-23 15:56:32 +00:00
Craig Topper	75afc0105c	[X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel. Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t. This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost. The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results. llvm-svn: 354724	2019-02-23 08:34:10 +00:00
Reid Kleckner	e3876637cf	Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" r354363 caused https://crbug.com/934963#c1, which has a plain C reduced test case. I also had to revert some dependent changes: - r354648 - r354647 - r354640 - r354511 llvm-svn: 354713	2019-02-23 01:19:42 +00:00
Craig Topper	a9697f24cf	[X86] Enable custom splitting of v8i64/v16i32 sext/zext for avx/avx2 when input type will be promoted by the type legalize to 128-bits. If the the input type will be promoted to 128 bits its better to put a sign_extend_inreg/and in the 128 bit register before the split occurs. Otherwise we end up doing it on each half in the wider register. Some of the overflow arithmetic tests are regressions, but I think we can make some improvement using getSetccResultType in DAG combine and/or type legalization. llvm-svn: 354709	2019-02-23 00:35:02 +00:00
Craig Topper	b95ca56361	[X86] Add a few test cases for a v8i64 sext/zext from an illegal type that needs to be promoted to 128 bits. If v8i64 isn't a legal type but v4i64 is, these will be split and then each half will get their input promoted and become an any_extend_vector_inreg/punpckhwd + any_extend + and/sign_extend_inreg. If we instead recognize the input will be promoted we can emit the and/sign_extend_inreg first in a 128 bit register. Then we can sign_extend/zero_extend one half and pshufd+sign_extend/zero_extend the other half. llvm-svn: 354708	2019-02-23 00:34:58 +00:00
Sam Clegg	275d15ecf3	[WebAssembly] Update CodeGen test expectations after rL354697. NFC llvm-svn: 354705	2019-02-23 00:07:39 +00:00
Sanjay Patel	973143ab79	[CGP] add tests for uaddo increment/decrement; NFC llvm-svn: 354699	2019-02-22 23:19:34 +00:00
Guozhi Wei	4c8e480358	[MBP] Factor out function hasViableTopFallthrough and enhancement This patch factor out the function hasViableTopFallthrough from rotateLoop. It is also enhanced. Original code checks only if there is a block can be placed before current loop top. This patch also checks if the loop top is the most possible successor of its predecessor. The attached test case shows its effect. Differential Revision: https://reviews.llvm.org/D58393 llvm-svn: 354682	2019-02-22 18:04:37 +00:00
Nirav Dave	46f939c118	Disable big-endian constant store merges from rL354676. llvm-svn: 354677	2019-02-22 16:20:34 +00:00
Nirav Dave	44037d7a63	[DAGCombine] Fold overlapping constant stores Fold a smaller constant store into larger constant stores immediately preceeding it. Reviewers: rnk, courbet Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58468 llvm-svn: 354676	2019-02-22 16:00:19 +00:00
Sanjay Patel	a9e289174a	[x86] allow narrowing of vector UINT_TO_FP As discussed in: D56864 D58197 Always use the narrow (128-bit) instruction when possible. We already had the signed int version of this transform. llvm-svn: 354675	2019-02-22 15:47:45 +00:00
Petar Jovanovic	6083106b12	[mips][micromips] fix filling delay slots for PseudoIndirectBranch_MM Filling a delay slot in 32bit jump instructions with a 16bit instruction can cause issues. According to the documentation such an operation is unpredictable. This patch adds opcode Mips::PseudoIndirectBranch_MM alongside Mips::PseudoIndirectBranch and other instructions that are expanded to jr instruction and do not allow a 16bit instruction in their delay slots. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D58507 llvm-svn: 354672	2019-02-22 14:53:58 +00:00
Roman Tereshin	99a6672bba	[LowerSwitch][AMDGPU] Do not handle impossible values This patch adds LazyValueInfo to LowerSwitch to compute the range of the value being switched over and reduce the size of the tree LowerSwitch builds to lower a switch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D58096 llvm-svn: 354670	2019-02-22 14:33:46 +00:00
David Green	acb628b2af	[ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove redundant CMP instructions. Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri instruction can be used with frame indices. In thumb1 we use tADDframe. Differential Revision: https://reviews.llvm.org/D57833 llvm-svn: 354667	2019-02-22 12:23:31 +00:00
Diana Picus	35e1c6663c	[ARM GlobalISel] Support floating point for Thumb2 This is exactly the same as arm mode, so for the instruction selector tests we just extract them to a new file and run with the same checks for both arm and thumb mode. For the legalizer we need to update the tests for soft float a bit, but only because BL and tBL are slightly different. We could be pedantic and check that we get a well-formed BL for arm mode and a tBL for thumb, but for the purposes of the legalizer test it's sufficient to just skip over the predicate operands in the checks. Also note that we have the pedantic checks in the divmod test, so we're covered. llvm-svn: 354665	2019-02-22 09:54:54 +00:00
Craig Topper	fa6187d230	[LegalizeVectorOps] Improve the placement of ANDs in the ExpandLoad path for non-byte-sized loads. When we need to merge two adjacent loads the AND mask for the low piece was still sized for the full src element size. But we didn't have that many bits. The upper bits are already zero due to the SRL. So we can skip the AND if we're going to combine with the high bits. We do need an AND to clear out any bits from the high part. We were anding the high part before combining with the low part, but it looks like ANDing after the OR gets better results. So we can just emit the final AND after the optional concatentation is done. That will handling skipping before the OR and get rid of extra high bits after the OR. llvm-svn: 354655	2019-02-22 07:03:25 +00:00
Craig Topper	0ca023b3b7	[X86] Add test cases to cover the path in VectorLegalizer::ExpandLoad for non-byte sized loads where bits from two loads need to be concatenated. If the scalar type doesn't divide evenly into the WideVT then the code will need to take some bits from adjacent scalar loads and combine them. But most of our testing is for i1 element type which always divides evenly. llvm-svn: 354653	2019-02-22 06:18:32 +00:00
Craig Topper	3a391fc0e8	[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit. Type legalization is causing two nodes to be created here, but we can use a single node to extend from v8i16 to v2i64. llvm-svn: 354648	2019-02-22 01:49:53 +00:00
Craig Topper	be22f329a9	[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract. Otherwise we end up creating extract_vector_elts that then each need to have their input promoted. This can lead to truncates needing to be emitted for each of those. But we already emitted any_extends when we legalized the extract_subvector. So now we have pairs of any_extend+trunc that partially cancel. But depending on how DAGCombiner visits them we can get weird results. By promoting the input at the same time we can create only a single any_extend or truncate. There's one regression in the vector-narrow-binop.ll case, but that looks easy to fix with a follow up patch. llvm-svn: 354647	2019-02-22 01:49:50 +00:00
Craig Topper	427404c769	[X86] Fix some copy/paste mistakes that caused a VR128 to be used as the address of a load in an isel pattern This was introduced in r354511. Fixes PR40811. llvm-svn: 354640	2019-02-22 00:04:35 +00:00

... 2 3 4 5 6 ...

28126 Commits