llvm-project

Commit Graph

Author	SHA1	Message	Date
Jonas Paulsson	129826cd9f	[SystemZ] Pass regalloc hints to help Load-and-Test transformations. Since there is no "Load-and-Test-High" instruction, the 32 bit load of a register to be compared with 0 can only be implemented with LT if the virtual GRX32 register ends up in a low part (GR32 register). This patch detects these cases and passes the GR32 registers (low parts) as (soft) hints in getRegAllocationHints(). Review: Ulrich Weigand. llvm-svn: 354935	2019-02-27 00:18:28 +00:00
Stanislav Mekhanoshin	da1628eb67	[AMDGPU] Fixed hang during DAG combine SITargetLowering::reassociateScalarOps() does not touch constants so that DAGCombiner::ReassociateOps() does not revert the combine. However a global address is not a ConstantSDNode. Switched to the method used by DAGCombiner::ReassociateOps() itself to detect constants. Differential Revision: https://reviews.llvm.org/D58695 llvm-svn: 354926	2019-02-26 20:56:25 +00:00
Reid Kleckner	8fda7e15e6	[X86] Fix bug in vectorcall calling convention Original implementation can't correctly handle __m256 and __m512 types passed by reference through stack. This patch fixes it. Patch by Wei Xiao! Differential Revision: https://reviews.llvm.org/D57643 llvm-svn: 354921	2019-02-26 19:48:16 +00:00
Petar Avramovic	bd39569913	[MIPS GlobalISel] Select G_UADDO Lower G_UADDO. Legalize G_UADDO for MIPS32 Differential Revision: https://reviews.llvm.org/D58671 llvm-svn: 354900	2019-02-26 17:22:42 +00:00
Ganesh Gopalasubramanian	e172d7008d	[X86] AMD znver2 enablement This patch enables the following 1) AMD family 17h "znver2" tune flag (-march, -mcpu). 2) ISAs that are enabled for "znver2" architecture. 3) For the time being, it uses the znver1 scheduler model. 4) Tests are updated. 5) Scheduler descriptions are yet to be put in place. Reviewers: craig.topper Differential Revision: https://reviews.llvm.org/D58343 llvm-svn: 354897	2019-02-26 16:55:10 +00:00
Jonas Paulsson	c110b5b69f	[SystemZ] Wait with selection of legal vector/FP constants until Select(). This patch aims to make sure that any such constant that can be generated with a vector instruction (for example VGBM) is recognized as such during legalization and kept as a target independent node through post-legalize DAGCombining. Two new functions named isVectorConstantLegal() and loadVectorConstant() replace old ways of handling vector/FP constants. A new struct named SystemZVectorConstantInfo is used to cache the results of isVectorConstantLegal() and pass them onto loadVectorConstant(). Support for fp128 constants in the presence of FeatureVectorEnhancements1 (z14) has been added. Review: Ulrich Weigand https://reviews.llvm.org/D58270 llvm-svn: 354896	2019-02-26 16:47:59 +00:00
Nirav Dave	582d46328c	[DAG] Fix constant store folding to handle non-byte sizes. Avoid crashes from zero-byte values due to sub-byte store sizes. Reviewers: uabelho, courbet, rnk Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58626 llvm-svn: 354884	2019-02-26 15:02:32 +00:00
Simon Atanasyan	8cb497027d	[mips] Emit `.module softfloat` directive This change fixes crash on an assertion in case of using `soft float` ABI for mips32r6 target. llvm-svn: 354882	2019-02-26 14:45:17 +00:00
Simon Pilgrim	d4a406e499	[AArch64] Add arithmetic zext bswap tests. As requested on D58017. llvm-svn: 354872	2019-02-26 13:22:35 +00:00
Simon Pilgrim	e42be1eae2	[AArch64] Add 'free' zext bswap tests. As requested on D58017. llvm-svn: 354869	2019-02-26 12:04:37 +00:00
Luke Cheeseman	9e285bef2b	[ARM] Add Cortex-M35P - Add LLVM backend support for Cortex-M35P - Documentation can be found at https://developer.arm.com/products/processors/cortex-m/cortex-m35p Differentail Revision: https://reviews.llvm.org/D57763 llvm-svn: 354868	2019-02-26 12:02:12 +00:00
Simon Pilgrim	810fa04ac7	[LegalizeDAG] Expand SADDO/SSUBO using SADDSAT/SSUBSAT (PR37763) If SADDSAT/SSUBSAT are legal, then we can expand SADDO/SSUBO by performing a ADD/SUB and a SADDO/SSUBO and then compare the results. I looked at doing this for UADDO/USUBO as well but as we don't have to do as many range comparisons I didn't see any/much benefit. Differential Revision: https://reviews.llvm.org/D58637 llvm-svn: 354866	2019-02-26 11:27:53 +00:00
Simon Pilgrim	2ccc120d19	[AMDGPU] Regenerate bswap/bitreverse tests. Make codegen changes more obvious in D58017 llvm-svn: 354863	2019-02-26 11:01:08 +00:00
Dan Gohman	c71132c0be	[WebAssembly] Properly align fp128 arguments in outgoing varargs arguments For outgoing varargs arguments, it's necessary to check the OrigAlign field of the corresponding OutputArg entry to determine argument alignment, rather than just computing an alignment from the argument value type. This is because types like fp128 are split into multiple argument values, with narrower types that don't reflect the ABI alignment of the full fp128. This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase. Differential Revision: https://reviews.llvm.org/D58656 llvm-svn: 354846	2019-02-26 05:20:19 +00:00
Heejin Ahn	d2a56ac661	[WebAssembly] Fix a bug deleting instruction in a ranged for loop Summary: We shouldn't delete elements while iterating a ranged for loop. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58519 llvm-svn: 354844	2019-02-26 04:08:49 +00:00
Heejin Ahn	7829763e49	[WebAssembly] Improve readability of EH tests Summary: - Indent check lines to easily figure out try-catch-end structure - Add the original C++ code the tests were genereated from - Add a few more lines to make the structure more readable - Rename a couple function / structures - Add label and branch annotations to cfg-stackify-eh.ll - Temporarily delete check lines for `test1` in `cfg-stackify-eh.ll` because it will be updated in a later CL soon and there's no point of making it look better here Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58562 llvm-svn: 354842	2019-02-26 03:29:59 +00:00
Reid Kleckner	2f055f026a	[X86] Fix bug in x86_intrcc with arg copy elision Summary: Use a custom calling convention handler for interrupts instead of fixing up the locations in LowerMemArgument. This way, the offsets are correct when constructed and we don't need to account for them in as many places. Depends on D56883 Replaces D56275 Reviewers: craig.topper, phil-opp Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56944 llvm-svn: 354837	2019-02-26 02:11:25 +00:00
Stanislav Mekhanoshin	ab25f1e65b	[AMDGPU] Added target to mir test. NFC. Test was used without -mcpu, although tested instructions not available on all ASICs. llvm-svn: 354830	2019-02-25 22:59:55 +00:00
Matt Arsenault	752579736e	RegBankSelect: Handle slightly more complex value mappings Try to use concat_vectors. Also remove unnecessary assert on pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers for AMDGPU. llvm-svn: 354828	2019-02-25 22:24:13 +00:00
Matt Arsenault	f4bfe4cd17	AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes llvm-svn: 354825	2019-02-25 21:32:48 +00:00
Matt Arsenault	82b103998b	AMDGPU/GlobalISel: Clamp max implicit_def elements llvm-svn: 354818	2019-02-25 20:46:06 +00:00
Craig Topper	316c58e8f1	[X86] Improve detection of unneeded shift amount masking to also handle the case that the LHS has known zeroes in it If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case. Differential Revision: https://reviews.llvm.org/D58475 llvm-svn: 354811	2019-02-25 19:42:47 +00:00
Nikita Popov	fcbd7f6495	[Mips] Fix missing masking in fast-isel of br (PR40325) Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending (and x, 1) the condition before branching on it. To avoid regressing trivial cases, I'm combining emission of cmp+br sequences for the single-use + same block case (similar to what we do in x86). icmpbr1.ll still regresses due to the cross-bb usage of the condition. Differential Revision: https://reviews.llvm.org/D58576 llvm-svn: 354808	2019-02-25 18:54:17 +00:00
Simon Pilgrim	28441ac75f	[DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold Differential Revision: https://reviews.llvm.org/D58585 llvm-svn: 354793	2019-02-25 16:02:01 +00:00
David Green	b504f104b2	[ARM] Add some more missing T1 opcodes for the peephole optimisier This adds a few extra Thumb1 opcodes to improve the peephole opimisers ability to remove redundant cmp instructions. tADC and tSBC require a small fixup to prevent MOVS being moved past the instruction, giving the wrong flags. Differential Revision: https://reviews.llvm.org/D58281 llvm-svn: 354791	2019-02-25 15:50:54 +00:00
Luke Cheeseman	59f77e7891	[AArch64] Add support for Cortex-A76 and Cortex-A76AE - Add LLVM backend support for Cortex-A76 and Cortex-A76AE - Documentation can be found at https://developer.arm.com/products/processors/cortex-a/cortex-a76 llvm-svn: 354788	2019-02-25 15:08:27 +00:00
Dmitri Gribenko	a3a3964f98	Fixed typos in tests: s/CHEKC/CHECK/ Reviewers: ilya-biryukov Subscribers: nemanjai, javed.absar, jsji, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D58611 llvm-svn: 354785	2019-02-25 13:41:59 +00:00
Dmitri Gribenko	751c5fbf6a	Fixed typos in tests: s/CEHCK/CHECK/ Reviewers: ilya-biryukov Subscribers: sanjoy, sdardis, javed.absar, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58608 llvm-svn: 354781	2019-02-25 13:12:33 +00:00
Simon Pilgrim	c61f1e8e6c	[X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483) Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results. Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB. Differential Revision: https://reviews.llvm.org/D58597 llvm-svn: 354771	2019-02-25 11:19:37 +00:00
Simon Tatham	b70fc0c5fd	[ARM] Make fullfp16 instructions not conditionalisable. More or less all the instructions defined in the v8.2a full-fp16 extension are defined as UNPREDICTABLE if you put them in an IT block (Thumb) or use with any condition other than AL (ARM). LLVM didn't know that, and was happy to conditionalise them. In order to force these instructions to count as not predicable, I had to make a small Tablegen change. The code generation back end mostly decides if an instruction was predicable by looking for something it can identify as a predicate operand; there's an isPredicable bit flag that overrides that check in the positive direction, but nothing that overrides it in the negative direction. (I considered the alternative approach of actually removing the predicate operand from those instructions, but thought that it would be more painful overall for instructions differing only in data type to have different shapes of operand list. This way, the only code that has to notice the difference is the if-converter.) So I've added an isUnpredicable bit alongside isPredicable, and set that bit on the right subset of FP16 instructions, and also on the VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be unpredicable for all data types. I've included a couple of representative regression tests, both of which previously caused an fp16 instruction to be conditionalised in ARM state and (with -arm-no-restrict-it) to be put in an IT block in Thumb. Reviewers: SjoerdMeijer, t.p.northover, efriedma Reviewed By: efriedma Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57823 llvm-svn: 354768	2019-02-25 10:39:53 +00:00
Kang Zhang	4faa4090c9	[PowerPC] [PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts Summary: Fast selection of llvm fptoi & fptrunc instructions is not handled well about VSX instruction support. We'd use VSX float convert integer instruction instead of non-vsx float convert integer instruction if the operand register class is VSSRC or VSFRC because i32 and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is openeded. For float trunc instruction, we do this silimar work like float convert integer instruction to try to use VSX instruction. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D58430 llvm-svn: 354762	2019-02-25 02:46:16 +00:00
Simon Pilgrim	f43c48cb52	[X86] Add PR40483 test cases Demonstrate failure to merge ISD::ADD(x,y)/X86ISD::ADD(x,y) + ISD::SUB(x,y)/X86ISD::SUB(x,y) equivalent ops llvm-svn: 354758	2019-02-24 21:13:29 +00:00
Simon Pilgrim	cfaf663a35	[X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637) Its proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step. llvm-svn: 354757	2019-02-24 19:57:52 +00:00
Craig Topper	3fe4bd464c	[X86] Fix tls variable lowering issue with large code model Summary: The problem here is the lowering for tls variable. Below is the DAG for the code. SelectionDAG has 11 nodes: t0: ch = EntryToken t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64 t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10] t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64 t12: i64 = add t8, t11 t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64 t6: ch = CopyToReg t0, Register:i32 %0, t4 And when mcmodel is large, below instruction can NOT be folded. t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10] t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64 So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0" When llvm start to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails. The patch is to fold the load and X86ISD::WrapperRIP. Fixes PR26906 Patch by LuoYuanke Reviewers: craig.topper, rnk, annita.zhang, wxiao3 Reviewed By: rnk Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58336 llvm-svn: 354756	2019-02-24 19:33:37 +00:00
Craig Topper	5532a98737	[X86][SSE] Use pblendw for v4i32/v2i64 during isel. Summary: Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was a integer. So use pblendw instead. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58574 llvm-svn: 354755	2019-02-24 19:23:41 +00:00
Craig Topper	be3348573e	[LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents Summary: When promoting the over flow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it. We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put a the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used. Reviewers: spatel, RKSimon, nikic Reviewed By: nikic Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58567 llvm-svn: 354753	2019-02-24 19:23:36 +00:00
Sanjay Patel	cb04ba032f	[CGP] add special-cases to form unsigned add with overflow (PR40486) There's likely a missed IR canonicalization for at least 1 of these patterns. Otherwise, we wouldn't have needed the pattern-matching enhancement in D57516. Note that -- unlike usubo added with D57789 -- the TLI hook for this transform defaults to 'on'. So if there's any perf fallout from this, targets should look at how they're lowering the uaddo node in SDAG and/or override that hook. The x86 diffs suggest that there's some missing pattern-matching for forming inc/dec. This should fix the remaining known problems in: https://bugs.llvm.org/show_bug.cgi?id=40486 https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354746	2019-02-24 15:31:27 +00:00
Craig Topper	dc185522fb	[TwoAddressInstructionPass] After commuting an instruction and before trying to look for more commutable operands, resample the number of operands. The new instruciton might have less operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist. X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here. Really this whole checking for more commutable operands is a little fragile. It assumes that the new instructions operands are the same order and positions as the original except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change. llvm-svn: 354738	2019-02-23 21:41:44 +00:00
Craig Topper	be9eeb5526	Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" And its follow ups r354511, r354640. A follow patch will fix the issue that caused it to be reverted. llvm-svn: 354737	2019-02-23 21:41:42 +00:00
Craig Topper	ccc860cb81	Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract" r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit." These were reverted in r354713 as their context depended on other patches that were reverted for a bug. llvm-svn: 354734	2019-02-23 19:51:32 +00:00
Nikita Popov	e661f946a7	[WebAssembly] Fix select of and (PR40805) Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by patterns added in D53676. I'm removing the patterns entirely here, as they are not correct in the general case. If necessary something more specific can be added in the future. Differential Revision: https://reviews.llvm.org/D58575 llvm-svn: 354733	2019-02-23 18:59:01 +00:00
Simon Pilgrim	e08f177ea2	[X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x) For AVX1, limit this to i32/f32/i64/f64 loading cases only. llvm-svn: 354730	2019-02-23 18:34:05 +00:00
Simon Pilgrim	31793733a0	[X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 its just a single VPERMPD), followed by a BLENDPD. llvm-svn: 354729	2019-02-23 17:10:47 +00:00
Simon Dardis	86a589e38d	[MIPS] Fix a incorrect test. (NFC) This test is incorrect as it should be using the microMIPSR6 instruction to return, not the microMIPS version. llvm-svn: 354726	2019-02-23 15:56:32 +00:00
Craig Topper	75afc0105c	[X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel. Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t. This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost. The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results. llvm-svn: 354724	2019-02-23 08:34:10 +00:00
Reid Kleckner	e3876637cf	Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" r354363 caused https://crbug.com/934963#c1, which has a plain C reduced test case. I also had to revert some dependent changes: - r354648 - r354647 - r354640 - r354511 llvm-svn: 354713	2019-02-23 01:19:42 +00:00
Craig Topper	a9697f24cf	[X86] Enable custom splitting of v8i64/v16i32 sext/zext for avx/avx2 when input type will be promoted by the type legalize to 128-bits. If the the input type will be promoted to 128 bits its better to put a sign_extend_inreg/and in the 128 bit register before the split occurs. Otherwise we end up doing it on each half in the wider register. Some of the overflow arithmetic tests are regressions, but I think we can make some improvement using getSetccResultType in DAG combine and/or type legalization. llvm-svn: 354709	2019-02-23 00:35:02 +00:00
Craig Topper	b95ca56361	[X86] Add a few test cases for a v8i64 sext/zext from an illegal type that needs to be promoted to 128 bits. If v8i64 isn't a legal type but v4i64 is, these will be split and then each half will get their input promoted and become an any_extend_vector_inreg/punpckhwd + any_extend + and/sign_extend_inreg. If we instead recognize the input will be promoted we can emit the and/sign_extend_inreg first in a 128 bit register. Then we can sign_extend/zero_extend one half and pshufd+sign_extend/zero_extend the other half. llvm-svn: 354708	2019-02-23 00:34:58 +00:00
Sam Clegg	275d15ecf3	[WebAssembly] Update CodeGen test expectations after rL354697. NFC llvm-svn: 354705	2019-02-23 00:07:39 +00:00
Sanjay Patel	973143ab79	[CGP] add tests for uaddo increment/decrement; NFC llvm-svn: 354699	2019-02-22 23:19:34 +00:00
Guozhi Wei	4c8e480358	[MBP] Factor out function hasViableTopFallthrough and enhancement This patch factor out the function hasViableTopFallthrough from rotateLoop. It is also enhanced. Original code checks only if there is a block can be placed before current loop top. This patch also checks if the loop top is the most possible successor of its predecessor. The attached test case shows its effect. Differential Revision: https://reviews.llvm.org/D58393 llvm-svn: 354682	2019-02-22 18:04:37 +00:00
Nirav Dave	46f939c118	Disable big-endian constant store merges from rL354676. llvm-svn: 354677	2019-02-22 16:20:34 +00:00
Nirav Dave	44037d7a63	[DAGCombine] Fold overlapping constant stores Fold a smaller constant store into larger constant stores immediately preceeding it. Reviewers: rnk, courbet Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58468 llvm-svn: 354676	2019-02-22 16:00:19 +00:00
Sanjay Patel	a9e289174a	[x86] allow narrowing of vector UINT_TO_FP As discussed in: D56864 D58197 Always use the narrow (128-bit) instruction when possible. We already had the signed int version of this transform. llvm-svn: 354675	2019-02-22 15:47:45 +00:00
Petar Jovanovic	6083106b12	[mips][micromips] fix filling delay slots for PseudoIndirectBranch_MM Filling a delay slot in 32bit jump instructions with a 16bit instruction can cause issues. According to the documentation such an operation is unpredictable. This patch adds opcode Mips::PseudoIndirectBranch_MM alongside Mips::PseudoIndirectBranch and other instructions that are expanded to jr instruction and do not allow a 16bit instruction in their delay slots. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D58507 llvm-svn: 354672	2019-02-22 14:53:58 +00:00
Roman Tereshin	99a6672bba	[LowerSwitch][AMDGPU] Do not handle impossible values This patch adds LazyValueInfo to LowerSwitch to compute the range of the value being switched over and reduce the size of the tree LowerSwitch builds to lower a switch. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D58096 llvm-svn: 354670	2019-02-22 14:33:46 +00:00
David Green	acb628b2af	[ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove redundant CMP instructions. Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri instruction can be used with frame indices. In thumb1 we use tADDframe. Differential Revision: https://reviews.llvm.org/D57833 llvm-svn: 354667	2019-02-22 12:23:31 +00:00
Diana Picus	35e1c6663c	[ARM GlobalISel] Support floating point for Thumb2 This is exactly the same as arm mode, so for the instruction selector tests we just extract them to a new file and run with the same checks for both arm and thumb mode. For the legalizer we need to update the tests for soft float a bit, but only because BL and tBL are slightly different. We could be pedantic and check that we get a well-formed BL for arm mode and a tBL for thumb, but for the purposes of the legalizer test it's sufficient to just skip over the predicate operands in the checks. Also note that we have the pedantic checks in the divmod test, so we're covered. llvm-svn: 354665	2019-02-22 09:54:54 +00:00
Craig Topper	fa6187d230	[LegalizeVectorOps] Improve the placement of ANDs in the ExpandLoad path for non-byte-sized loads. When we need to merge two adjacent loads the AND mask for the low piece was still sized for the full src element size. But we didn't have that many bits. The upper bits are already zero due to the SRL. So we can skip the AND if we're going to combine with the high bits. We do need an AND to clear out any bits from the high part. We were anding the high part before combining with the low part, but it looks like ANDing after the OR gets better results. So we can just emit the final AND after the optional concatentation is done. That will handling skipping before the OR and get rid of extra high bits after the OR. llvm-svn: 354655	2019-02-22 07:03:25 +00:00
Craig Topper	0ca023b3b7	[X86] Add test cases to cover the path in VectorLegalizer::ExpandLoad for non-byte sized loads where bits from two loads need to be concatenated. If the scalar type doesn't divide evenly into the WideVT then the code will need to take some bits from adjacent scalar loads and combine them. But most of our testing is for i1 element type which always divides evenly. llvm-svn: 354653	2019-02-22 06:18:32 +00:00
Craig Topper	3a391fc0e8	[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit. Type legalization is causing two nodes to be created here, but we can use a single node to extend from v8i16 to v2i64. llvm-svn: 354648	2019-02-22 01:49:53 +00:00
Craig Topper	be22f329a9	[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract. Otherwise we end up creating extract_vector_elts that then each need to have their input promoted. This can lead to truncates needing to be emitted for each of those. But we already emitted any_extends when we legalized the extract_subvector. So now we have pairs of any_extend+trunc that partially cancel. But depending on how DAGCombiner visits them we can get weird results. By promoting the input at the same time we can create only a single any_extend or truncate. There's one regression in the vector-narrow-binop.ll case, but that looks easy to fix with a follow up patch. llvm-svn: 354647	2019-02-22 01:49:50 +00:00
Craig Topper	427404c769	[X86] Fix some copy/paste mistakes that caused a VR128 to be used as the address of a load in an isel pattern This was introduced in r354511. Fixes PR40811. llvm-svn: 354640	2019-02-22 00:04:35 +00:00
Matt Arsenault	aa6fb4c45e	AMDGPU: Remove debugger related subtarget features As far as I know these aren't needed anymore. llvm-svn: 354634	2019-02-21 23:27:46 +00:00
Craig Topper	2b34fdc67f	[X86] Remove hasSideEffects=1 from the X87 pseudos with folded load. This was done in r321424 to prevent scheduling from reordering things. But now that we model FPCW as a dependency, I don't think the same scheduling we were trying to prevent can occur. llvm-svn: 354628	2019-02-21 22:00:15 +00:00
Sanjay Patel	234a5e8ea4	[x86] vectorize more cast ops in lowering to avoid register file transfers This is a follow-up to D56864. If we're extracting from a non-zero index before casting to FP, then shuffle the vector and optionally narrow the vector before doing the cast: cast (extelt V, C) --> extelt (cast (extract_subv (shuffle V, [C...]))), 0 This might be enough to close PR39974: https://bugs.llvm.org/show_bug.cgi?id=39974 Differential Revision: https://reviews.llvm.org/D58197 llvm-svn: 354619	2019-02-21 20:40:39 +00:00
Amara Emerson	1abe05c0dd	Re-land "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR"" Thanks to Richard Trieu for pointing out that the failures were due to a use-after-free of an ArrayRef. llvm-svn: 354616	2019-02-21 20:20:16 +00:00
Mandeep Singh Grang	096fae32b3	[llvm] Fix typo: 's/ ot / to /' [NFC] llvm-svn: 354614	2019-02-21 20:04:20 +00:00
Krzysztof Parzyszek	f6e875bacf	[Hexagon] Use misaligned load instead of trap0(#0 ) for __builtin_trap The trap instruction is intercepted by various runtime environments, and instead of a crash it creates confusion. This reapplies r354606 with a fix. llvm-svn: 354611	2019-02-21 19:42:39 +00:00
Krzysztof Parzyszek	948c9f93c4	Revert r354606, it breaks asan tests llvm-svn: 354609	2019-02-21 19:33:58 +00:00
Krzysztof Parzyszek	5f47fac3a2	[Hexagon] Use misaligned load instead of trap0(#0 ) for __builtin_trap The trap instruction is intercepted by various runtime environments, and instead of a crash it creates confusion. llvm-svn: 354606	2019-02-21 18:39:22 +00:00
Nirav Dave	43a7600a7f	[PPC] Add store merging testcase. llvm-svn: 354595	2019-02-21 16:34:48 +00:00
Sanjay Patel	ba5ee817e9	[DAGCombiner] prevent infinite looping by truncating 'and' (PR40793) This fold can occur during legalization, so it can fight with promotion to the larger type. It apparently takes a special sequence and subtarget to avoid more basic simplifications that would hide the problem. But there's a bigger question raised here: why does distributeTruncateThroughAnd() even exist? It duplicates functionality from a more minimal pattern that we already have. But getting rid of this function requires some preliminary steps. https://bugs.llvm.org/show_bug.cgi?id=40793 llvm-svn: 354594	2019-02-21 16:01:48 +00:00
Matt Arsenault	2e0ee47712	AMDGPU/GlobalISel: Make phis legal llvm-svn: 354592	2019-02-21 15:48:13 +00:00
Sanjay Patel	d288687676	[x86] regenerate checks; NFC llvm-svn: 354589	2019-02-21 15:30:28 +00:00
Nirav Dave	dce91c1edb	[X86] Fix copy-paste error in @ccz flag. @ccz operand should be equivalent to @cce. llvm-svn: 354588	2019-02-21 15:28:31 +00:00
Matt Arsenault	b10fa8df3f	AMDGPU/GlobalISel: Fix bit count ops for non-power-of-2 types llvm-svn: 354587	2019-02-21 15:22:20 +00:00
Diana Picus	dcaa939ab7	[ARM GlobalISel] Support G_FRAME_INDEX for Thumb2 Same as arm mode. llvm-svn: 354579	2019-02-21 13:00:02 +00:00
David Green	7a183a86be	Revert 354564: [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs I believe it's causing bootstrap failures for A32 code. I'll take a look at what's wrong. llvm-svn: 354569	2019-02-21 11:03:13 +00:00
David Green	89efe24eba	[ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove redundant CMP instructions. Differential Revision: https://reviews.llvm.org/D57833 llvm-svn: 354564	2019-02-21 10:30:09 +00:00
Sam Parker	6ed47bee27	[ARM] Negative constants mishandled in ARM CGP During type promotion, sometimes we convert negative an add with a negative constant into a sub with a positive constant. The loop that performs this transformation has two issues: - it iterates over a set, causing non-determinism. - it breaks, instead of continuing, when it finds the first non-negative operand. Differential Revision: https://reviews.llvm.org/D58452 llvm-svn: 354557	2019-02-21 09:33:18 +00:00
Sam Clegg	1634516e35	[WebAssembly] Default to something reasonable in WebAssemblyAddMissingPrototypes Previously if we couldn't derive a prototype for a "no-prototype" function from C we would leave it as is: void foo(...) With this change we instead give is an empty signature and remove the "no-prototype" attribute. This fixes the current wasm waterfall test failure. Tags: #llvm Differential Revision: https://reviews.llvm.org/D58488 llvm-svn: 354544	2019-02-21 03:27:00 +00:00
Stanislav Mekhanoshin	42e229e130	[AMDGPU] fix commuted case of sub combine Differential Revision: https://reviews.llvm.org/D58481 llvm-svn: 354543	2019-02-21 02:58:00 +00:00
Xin Tong	2d84c00dfa	Add skipFunction to PostRA machine sinking pass. Summary: Add skipFunction to PostRA machine sinking pass. Reviewers: junbuml Subscribers: arsenm, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57847 llvm-svn: 354541	2019-02-21 02:11:06 +00:00
Amara Emerson	71f2a5e60f	Revert "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR" This reverts r354521 because it broke the bots, but passes on Darwin somehow. llvm-svn: 354532	2019-02-21 00:31:13 +00:00
Amara Emerson	883000d888	[GlobalISel] Add -O0 to some tests to see if it fixes them. I can't reproduce the failures locally, and greendragon also passes, but some other bots fail for reasons I don't understand. The only difference I can see between these tests is it's missing an -O0 If this doesn't work I'll revert and continue investigating. llvm-svn: 354529	2019-02-20 23:22:15 +00:00
Sam Clegg	6028c969ac	[WebAssembly] Don't error on conflicting uses of prototype-less functions When we can't determine with certainty the signature of a function import we pick the fist signature we find rather than error'ing out. The resulting program might not do what is expected since we might pick the wrong signature. However since undefined behavior in C to use the same function with different signatures this seems better than refusing to compile such programs. Fixes PR40472 Differential Revision: https://reviews.llvm.org/D58304 llvm-svn: 354523	2019-02-20 22:40:57 +00:00
Amara Emerson	a946d057b4	[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR This change makes some basic type combinations for G_SHUFFLE_VECTOR legal, and implements them with a very pessimistic TBL2 instruction in the selector. For TBL2, support is also needed to generate constant pool entries and load from them in order to materialize the mask register. Currently supports <2 x s64> and <4 x s32> result types. Differential Revision: https://reviews.llvm.org/D58466 llvm-svn: 354521	2019-02-20 22:11:39 +00:00
Craig Topper	55cc7eb5cb	[X86] Add test cases to show missed opportunities to remove AND mask from BTC/BTS/BTR instructions when LHS of AND has known zeros. We can currently remove the mask if the immediate has all ones in the LSBs, but if the LHS of the AND is known zero, then the immediate might have had bits removed. A similar issue also occurs with shifts and rotates. I'm preparing a common fix for all of them. llvm-svn: 354520	2019-02-20 21:35:05 +00:00
Sanjay Patel	198cc305e9	[CGP] match a special-case of unsigned subtract overflow This is the 'sub0' (negate) pattern from PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354519	2019-02-20 21:23:04 +00:00
Nirav Dave	48cf37b55c	[DAGCombine] Generalize Dead Store to overlapping stores. Summary: Remove stores that are immediately overwritten by larger stores. Reviewers: courbet, rnk Reviewed By: rnk Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58467 llvm-svn: 354518	2019-02-20 21:07:50 +00:00
Tom Stellard	79b5c3842b	AMDGPU/GlobalISel: Move SMRD selection logic to TableGen Reviewers: arsenm Reviewed By: arsenm Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52922 llvm-svn: 354516	2019-02-20 21:02:37 +00:00
Craig Topper	8d9c224a8c	[SelectionDAG] Teach GetDemandedBits to look at the known zeros of the LHS when handling ISD::AND If the LHS has known zeros, then the RHS immediate mask might have been simplified to remove those bits. This patch adds a call to computeKnownBits to get the known zeroes to handle that possibility. I left an early out to skip the call if all of the demanded bits are set in the mask. Differential Revision: https://reviews.llvm.org/D58464 llvm-svn: 354514	2019-02-20 20:52:26 +00:00
Nikita Popov	c3b496de7a	[SDAG] Support vector UMULO/SMULO Second part of https://bugs.llvm.org/show_bug.cgi?id=40442. This adds an extra UnrollVectorOverflowOp() method to SDAG, because the general UnrollOverflowOp() method can't deal with multiple results. Additionally we need to expand UMULO/SMULO during vector op legalization, as it may result in unrolling, which may need additional type legalization. Differential Revision: https://reviews.llvm.org/D57997 llvm-svn: 354513	2019-02-20 20:41:44 +00:00
Craig Topper	31823fba2e	[X86] Add more load folding patterns for blend instructions as a follow up to r354363. This avoids depending on the peephole pass to do load folding. Also adds some load folding for some insert_subvector patterns that use blend. All of this was found by temporarily adding TB_NO_FORWARD to the blend immediate entries in the load folding tables. I've added -disable-peephole to some of the affected tests from that experiment to ensure we're testing isel patterns. llvm-svn: 354511	2019-02-20 20:18:20 +00:00
Nirav Dave	c0fdd046c3	Fix testcase. llvm-svn: 354508	2019-02-20 19:26:47 +00:00
Nirav Dave	1ce977f946	Add test case. llvm-svn: 354504	2019-02-20 19:07:55 +00:00
Craig Topper	b1b2fa35ed	[X86] Add test case to show missed opportunity to remove an explicit AND on the bit position from BT when it has known zeros. NFC If the bit position has known zeros in it, then the AND immediate will likely be optimized to remove bits. This can prevent GetDemandedBits from recognizing that the AND is unnecessary. llvm-svn: 354501	2019-02-20 19:02:01 +00:00
Craig Topper	f4923db5a3	Revert r354498 "[X86] Add test case to show missed opportunity to remove an explicit AND on the bit position from BT when it has known zeros." I accidentally committed more than just the test. llvm-svn: 354499	2019-02-20 18:47:26 +00:00
Craig Topper	f8498a615b	[X86] Add test case to show missed opportunity to remove an explicit AND on the bit position from BT when it has known zeros. If the bit position has known zeros in it, then the AND immediate will likely be optimized to remove bits. This can prevent GetDemandedBits from recognizing that the AND is unnecessary. llvm-svn: 354498	2019-02-20 18:45:38 +00:00

1 2 3 4 5 ...

27889 Commits