llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	012fda59a5	[AARCH64][X86] Remove _nonsplat from test names As discussed on D50222 llvm-svn: 343934	2018-10-07 11:24:04 +00:00
Simon Pilgrim	9fa1c66421	[X86] getFauxShuffleMask - Handle undef + sentinel values in subvector insertion llvm-svn: 343926	2018-10-06 22:13:44 +00:00
Simon Pilgrim	0dcf1cea03	[X86][SSE] Add SSE41 vector int2fp tests llvm-svn: 343925	2018-10-06 20:24:27 +00:00
Simon Pilgrim	62d199f4e5	[X86] combinePMULDQ - add op back to worklist if SimplifyDemandedBits succeeds on either operand Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself llvm-svn: 343922	2018-10-06 14:51:14 +00:00
Simon Pilgrim	944c530563	[X86] Regenerate LSR loop iteration test llvm-svn: 343921	2018-10-06 14:26:38 +00:00
Sanjay Patel	891be5af90	[x86] add test for masked store with extra shift op; NFC llvm-svn: 343920	2018-10-06 14:11:05 +00:00
Simon Pilgrim	0cc0a24b55	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - simplify PSHUFB masks Attempt to simplify PSHUFB masks (even non-constant ones) - we should probably be able to simplify other variable shuffles as well as the need arises. llvm-svn: 343919	2018-10-06 13:49:31 +00:00
Simon Pilgrim	9c9c97bcf4	[SelectionDAG] Add SimplifyDemandedBits to SimplifyDemandedVectorElts simplification This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector. There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify. As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.). Fixes PR39178 Differential Revision: https://reviews.llvm.org/D52935 llvm-svn: 343913	2018-10-06 10:20:04 +00:00
Sanjay Patel	f84ece68ca	[x86] make blend tests resistant to demanded elements improvements; NFC Similar to rL343858 - we don't want these tests to lose value with D52912. llvm-svn: 343882	2018-10-05 20:26:54 +00:00
Craig Topper	0ed892da70	[X86] Don't promote i16 compares to i32 if the immediate will fit in 8 bits. The comments in this code say we were trying to avoid 16-bit immediates, but if the immediate fits in 8-bits this isn't an issue. This avoids creating a zero extend that probably won't go away. The movmskb related changes are interesting. The movmskb instruction writes a 32-bit result, but fills the upper bits with 0. So the zero_extend we were previously emitting was free, but we turned a -1 immediate that would fit in 8-bits into a 32-bit immediate so it was still bad. llvm-svn: 343871	2018-10-05 18:13:36 +00:00
Sanjay Patel	f6a160a102	[SelectionDAG] allow undefs when matching splat constants And use that to transform fsub with zero constant operands. The integer part isn't used yet, but it is proposed for use in D44548, so adding both enhancements here makes that patch simpler. llvm-svn: 343865	2018-10-05 17:42:19 +00:00
Sanjay Patel	8858fa8552	[x86] add test for (X - 0.0) vector with undef elts; NFC llvm-svn: 343863	2018-10-05 17:36:51 +00:00
Simon Pilgrim	90947214f3	[X86][SSE] Try to make MOVLPS/MOVHPS(+PD) instructions SimplifyDemandedElts proof Fix for D52912 which was simplifying MOVLPS/MOVHPS(+PD) instructions as the tests were only touching one of the vector halves llvm-svn: 343858	2018-10-05 15:50:18 +00:00
Sanjay Patel	00216bca66	[x86] regenerate full checks; NFC llvm-svn: 343855	2018-10-05 14:56:14 +00:00
Sanjay Patel	b7d85655f7	[x86] add test for fneg matching failure; NFC llvm-svn: 343854	2018-10-05 14:49:20 +00:00
Simon Pilgrim	6c5ab48fe7	[X86][AVX] getFauxShuffleMask - add support for INSERT_SUBVECTOR subvector shuffles Decode subvector shuffles from INSERT_SUBVECTOR(SRC0, SHUFFLE(EXTRACT_SUBVECTOR(SRC1)) This was found necessary while investigating PR39161 llvm-svn: 343853	2018-10-05 14:41:00 +00:00
Sanjay Patel	2cf1561f1a	[x86] add test for SSE sqrtss register dep (PR22206) llvm-svn: 343803	2018-10-04 17:59:30 +00:00
Simon Pilgrim	8ba4061d39	[X86][AVX] Add PR39161 test case for v4f64 zzww shuffle llvm-svn: 343786	2018-10-04 15:06:09 +00:00
Simon Pilgrim	aabd99c27a	[X86] PUSH/POP 'mem-mem' instructions are not RMW - these are 2 different addresses This patch adds a 'WriteCopy' [WriteLoad, WriteStore] schedule sequence instead to better model the behaviour Found by @andreadb during llvm-mca testing on btver2 which was crashing on "zero uop" WriteRMW only instructions llvm-svn: 343708	2018-10-03 19:02:38 +00:00
Simon Pilgrim	0b451a2983	[X86][Btver2] Fix MMX PSHUFB schedule Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343701	2018-10-03 18:18:50 +00:00
Nirav Dave	925b64be64	[X86] Correctly use SSE registers if no-x87 is selected. Fix use of SSE1 registers for f32 ops in no-x87 mode. Notably, allow use of SSE instructions for f32 operations in 64-bit mode (but not 32-bit which is disallowed by calling convention). Also avoid translating memset/memcopy/memmove into SSE registers without X87 for 32-bit mode. This fixes PR38738. Reviewers: nickdesaulniers, craig.topper Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D52555 llvm-svn: 343689	2018-10-03 14:13:30 +00:00
Roman Lebedev	ea2046bea9	[NFC][CodeGen][X86] fma.ll, lwp-intrinsics.ll: actually spell --check-prefixes correctly :/ llvm-svn: 343588	2018-10-02 13:34:50 +00:00
Roman Lebedev	5412be4b7a	[NFC][CodeGen][X86] lwp-intrinsics.ll: fix check prefixes llvm-svn: 343585	2018-10-02 13:11:08 +00:00
Roman Lebedev	8b253f0b54	[NFC][CodeGen][X86] fma.ll: fix check prefixes for -mcpu=bdver2 llvm-svn: 343584	2018-10-02 13:10:55 +00:00
Simon Pilgrim	ad23f270db	[X86] Standardize floating point assembly comments Consistently try to use APFloat::toString for floating point constant comments to get rid of differences between Constant / ConstantDataSequential values - it should help stop some of the linux-windows buildbot failures matching NaN/INF etc. as well. Differential Revision: https://reviews.llvm.org/D52702 llvm-svn: 343562	2018-10-02 09:08:51 +00:00
Craig Topper	42cd8cd862	Recommit r343499 "[X86] Enable load folding in the test shrinking code" Original message: This patch adds load folding support to the test shrinking code. This was noticed missing in the review for D52669 llvm-svn: 343540	2018-10-01 21:35:28 +00:00
Craig Topper	f06a57fc89	Recommit r343498 "[X86] Improve test instruction shrinking when the sign flag is used and the output of the and is truncated." This includes a fix to prevent i16 compares with i32/i64 ands from being shrunk if bit 15 of the and is set and the sign bit is used. Original commit message: Currently we skip looking through truncates if the sign flag is used. But that's overly restrictive. It's safe to look through the truncate as long as we ensure one of the 3 things when we shrink. Either the MSB of the mask at the shrunken size isn't set. If the mask bit is set then either the shrunk size needs to be equal to the compare size or the sign flag needs to be unused. There are still missed opportunities to shrink a load and fold it in here. This will be fixed in a future patch. llvm-svn: 343539	2018-10-01 21:35:26 +00:00
Craig Topper	1346b5b7cf	[X86] Add more test shrinking with truncate and sign bit usage tests. NFC llvm-svn: 343519	2018-10-01 18:52:19 +00:00
Craig Topper	e072934d28	Revert r343499 and r343498. X86 test improvements There's a subtle bug in the handling of truncate from i32/i64 to i32 without minsize. I'll be adding more test cases and trying to find a fix. llvm-svn: 343516	2018-10-01 18:40:44 +00:00
Craig Topper	aa84e1bba2	[X86] Enable load folding in the test shrinking code This patch adds load folding support to the test shrinking code. This was noticed missing in the review for D52669 Differential Revision: https://reviews.llvm.org/D52699 llvm-svn: 343499	2018-10-01 17:10:50 +00:00
Craig Topper	2b587ad071	[X86] Improve test instruction shrinking when the sign flag is used and the output of the and is truncated Currently we skip looking through truncates if the sign flag is used. But that's overly restrictive. It's safe to look through the truncate as long as we ensure one of the 3 things when we shrink. Either the MSB of the mask at the shrunken size isn't set. If the mask bit is set then either the shrunk size needs to be equal to the compare size or the sign flag needs to be unused. There are still missed opportunities to shrink a load and fold it in here. This will be fixed in a future patch. Differential Revision: https://reviews.llvm.org/D52669 llvm-svn: 343498	2018-10-01 17:10:45 +00:00
Simon Pilgrim	e0d2019052	[X86][Btver2] Fix BT(C|R|S)mr & BT(C|R|S)mi schedule latency + uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343494	2018-10-01 16:31:30 +00:00
Sanjay Patel	5187efcfab	[x86] add tests for 256- and 512-bit vector types for scalar-to-vector transform; NFC llvm-svn: 343491	2018-10-01 16:17:18 +00:00
Clement Courbet	a933fb237e	[X86][Sched] Update scheduling information for VZEROALL on HWS, BDW, SKX, SNB. Summary: While looking at PR35606, I found out that the scheduling info is incorrect. One can check that it's really a P5+P6 and not a 2*P56 with: echo -e 'vzeroall\nvandps %xmm1, %xmm2, %xmm3' | ./bin/llvm-exegesis -mode=uops -snippets-file=- (vandps executes on P5 only) Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52541 llvm-svn: 343447	2018-10-01 08:37:48 +00:00
Clement Courbet	ce4caff0de	[CodeGen][NFC] Add tests for heterogeneous types in MergeConsecutiveStores Reviewers: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52643 llvm-svn: 343444	2018-10-01 07:16:22 +00:00
Craig Topper	67d9dbdbdd	[X86] Stop X86DomainReassignment from creating copies between GR8/GR16 physical registers and k-registers. We can only copy between a k-register and a GR32/GR64 register. This patch detects that the copy will be illegal and prevents the domain reassignment from happening for that closure. This probably isn't the best fix, and we should probably figure out how to handle this correctly. Fixes PR38803. llvm-svn: 343443	2018-10-01 07:08:41 +00:00
Simon Pilgrim	f21083870d	[X86] Fix scheduler class for BTmi instructions This wasn't treated as a folded load instruction llvm-svn: 343424	2018-09-30 20:19:16 +00:00
Roman Lebedev	0496477c5d	[NFC][CodeGen][X86][AArch64] Add 64-bit constant bit field extract pattern tests llvm-svn: 343404	2018-09-30 12:42:08 +00:00
Simon Pilgrim	84e280ae42	[X86] Regenerate MMX coalescing test Exposes another extractelement(bitcast(scalartovector())) pattern llvm-svn: 343403	2018-09-30 09:42:04 +00:00
Craig Topper	1709829fed	[X86] Disable BMI BEXTR in X86DAGToDAGISel::matchBEXTRFromAnd unless we're compiling for a CPU with single uop BEXTR Summary: This function turns (X >> C1) & C2 into a BMI BEXTR or TBM BEXTRI instruction. For BMI BEXTR we have to materialize an immediate into a register to feed to the BEXTR instruction. The BMI BEXTR instruction is 2 uops on Intel CPUs. It looks like on SKL it's one port 0/6 uop and one port 1/5 uop, despite what Agner's tables say. I know one of the uops is a regular shift uop so it would have to go through the port 0/6 shifter unit. So that's the same or worse execution wise than the shift+and which is one 0/6 uop and one 0/1/5/6 uop. The move immediate into register is an additional 0/1/5/6 uop. For now I've limited this transform to AMD CPUs which have a single uop BEXTR. It also might make sense if we can fold a load or if the and immediate is larger than 32-bits and can't be encoded as a sign extended 32-bit value or if LICM or CSE can hoist the move immediate and share it. But we'd need to look more carefully at that. In the regression I looked at it doesn't look like load folding or large immediates were occurring so the regression isn't caused by the loss of those. So we could try to be smarter here if we find a compelling case. Reviewers: RKSimon, spatel, lebedev.ri, andreadb Reviewed By: RKSimon Subscribers: llvm-commits, andreadb, RKSimon Differential Revision: https://reviews.llvm.org/D52570 llvm-svn: 343399	2018-09-30 03:01:46 +00:00
David Bolvansky	09fd8172df	[DAGCombiner][NFC] Tests for X div/rem Y single bit fold llvm-svn: 343392	2018-09-29 21:00:37 +00:00
Simon Pilgrim	c4e7c347cd	[X86][AVX2] Cleanup shuffle combining tests - add common prefixes llvm-svn: 343391	2018-09-29 20:34:16 +00:00
Simon Pilgrim	a2efe82b81	[X86] SimplifyDemandedVectorEltsForTargetNode - remove identity target shuffles before simplifying inputs By removing demanded target shuffles that simplify to zero/undef/identity before simplifying its inputs we improve chances of further simplification, as only the immediate parent user of the combined is added back to the work list - this still doesn't help us if it's passed through other ops though (bitcasts....). llvm-svn: 343390	2018-09-29 18:15:26 +00:00
Craig Topper	845789e823	[X86] Add fast-isel test cases for unaligned load/store intrinsics recently added to clang This adds tests for: _mm_loadu_si64 _mm_loadu_si32 _mm_loadu_si16 _mm_storeu_si64 _mm_storeu_si32 _mm_storeu_si16 llvm-svn: 343389	2018-09-29 18:03:52 +00:00
Simon Pilgrim	d633e290c8	[X86] getTargetConstantBitsFromNode - add support for rearranging constant bits via shuffles Exposed an issue that recursive calls to getTargetConstantBitsFromNode don't handle changes to EltSizeInBits yet. llvm-svn: 343384	2018-09-29 17:01:55 +00:00
Simon Pilgrim	43e4e648ef	[X86] Regenerate fma comments. llvm-svn: 343376	2018-09-29 14:31:00 +00:00
Simon Pilgrim	22d51014af	[X86] getTargetConstantBitsFromNode - add support for peeking through ISD::EXTRACT_SUBVECTOR llvm-svn: 343375	2018-09-29 14:17:32 +00:00
Simon Pilgrim	aa77033a6b	[X86][SSE] Fixed issue with v2i64 variable shifts on 32-bit targets The shift amount might have peeked through a extract_subvector, altering the number of vector elements in the 'Amt' variable - so we were incorrectly calculating the ratio when peeking through bitcasts, resulting in incorrectly detecting splats. llvm-svn: 343373	2018-09-29 13:25:22 +00:00
Craig Topper	98aa643420	[X86] Add test cases for failures to use narrow test with immediate instructions when a truncate is between the CMP and the AND and the sign flag is used. The code in X86ISelDAGToDAG only looks through truncates if the sign flag isn't used, but that is overly restrictive. A future patch will improve this. llvm-svn: 343355	2018-09-28 19:06:28 +00:00
Aditya Nandakumar	1cbb057142	[GISel]: Remove an incorrect assert in CallLowering https://reviews.llvm.org/D51147 Asserting if any extend of vectors should be up to the target's legalizer/target specific code not in CallLowering. reviewed by : dsanders. llvm-svn: 343325	2018-09-28 15:08:49 +00:00
Jonas Devlieghere	f1c414cd0d	Split invocations in CodeGen/X86/cpus.ll among multiple tests. (NFC) On GreenDragon `CodeGen/X86/cpus.ll` is timing out on the bot with Asan and UBSan enabled. With the same configuration on my machine, the test passes but takes more than 3 minutes to do so. I could increase the timeout, but I believe it makes more sense to split up the test because it allows for more parallelism. Differential revision: https://reviews.llvm.org/D52603 llvm-svn: 343313	2018-09-28 12:08:51 +00:00
Simon Pilgrim	17e5981ebf	[X86][Btver2] Fix BSF/BSR schedule Double throughput to account for 2 pipes + fix BSF's latency/uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343311	2018-09-28 10:26:48 +00:00
Simon Pilgrim	280af1c7f0	[X86][BtVer2] Fix PHMINPOS schedule resources typo PHMINPOS can run on either JFPU pipe llvm-svn: 343299	2018-09-28 08:21:39 +00:00
Craig Topper	1b29615330	[X86] Add the test case from PR38986. The assembly for this test should be optimal now after changes to the ScalarizeMaskedMemIntrin patch. llvm-svn: 343281	2018-09-27 23:25:10 +00:00
Craig Topper	6911bfe263	[ScalarizeMaskedMemIntrin] When expanding masked gathers, start with the passthru vector and insert the new load results into it. Previously we started with undef and did a final merge with the passthru at the end. llvm-svn: 343273	2018-09-27 21:28:59 +00:00
Craig Topper	7d234d6628	[ScalarizeMaskedMemIntrin] When expanding masked loads, start with the passthru value and insert each conditional load result over their element. Previously we started with undef and did one final merge at the end with a select. llvm-svn: 343271	2018-09-27 21:28:52 +00:00
Craig Topper	0423681d4a	[ScalarizeMaskedMemIntrin] Don't emit 'icmp eq i1 %x, 1' to check mask values. That's just %x so use that directly. Had we emitted this IR earlier, InstCombine would have removed icmp so I'm going to assume using the i1 directly would be considered canonical. llvm-svn: 343244	2018-09-27 18:01:48 +00:00
Craig Topper	e4c96f4a48	[X86] Update tzcnt fast-isel tests to match clang r343126. We now generate cttz with the zero_undef flag set to false. This allows -O0 to avoid the zero check. llvm-svn: 343127	2018-09-26 17:19:28 +00:00
Simon Pilgrim	26223bccde	[X86][SSE] Refresh PR34947 test code to handle D52504 The previously reduced version used urem <9 x i32> zeroinitializer, %tmp which D52504 will simplify. llvm-svn: 343097	2018-09-26 11:53:51 +00:00
Simon Pilgrim	5beaac433d	[X86][SSE] Use ISD::MULHS for constant vXi16 ISD::SRA lowering (PR38151) Similar to the existing ISD::SRL constant vector shifts from D49562, this patch adds ISD::SRA support with ISD::MULHS. As we're dealing with signed values, we have to handle shift by zero and shift by one special cases, so XOP+AVX2/AVX512 splitting/extension is still a better solution - really we should still use ISD::MULHS if one of the special cases are used but for now I've just left a TODO and filtered by isKnownNeverZero. Differential Revision: https://reviews.llvm.org/D52171 llvm-svn: 343093	2018-09-26 10:57:05 +00:00
Yury Gribov	67572004df	Fixes removal of dead elements from PressureDiff (PR37252). Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D51495 llvm-svn: 343090	2018-09-26 10:42:41 +00:00
Craig Topper	12c18840fa	[X86] Allow movmskpd/ps ISD nodes to be created and selected with integer input types. This removes an int->fp bitcast between the surrounding code and the movmsk. I had already added a hack to combineMOVMSK to try to look through this bitcast to improve the SimplifyDemandedBits there. But I found an additional issue where the bitcast was preventing combineMOVMSK from being called again after earlier nodes in the DAG are optimized. The bitcast gets revisited, but not the user of the bitcast. By using integer types throughout, the bitcast doesn't get in the way. llvm-svn: 343046	2018-09-25 23:28:27 +00:00
Craig Topper	d8c68840c8	[X86] Add some more movmsk test cases. NFC These IR patterns represent the exact behavior of a movmsk instruction using (zext (bitcast (icmp slt X, 0))). For the v4i32/v8i32/v2i64/v4i64 we currently emit a PCMPGT for the icmp slt which is unnecessary since we only care about the sign bit of the result. This is because of the int->fp bitcast we put on the input to the movmsk nodes for these cases. I'll be fixing this in a future patch. llvm-svn: 343045	2018-09-25 23:28:24 +00:00
Sanjay Patel	10c11b867a	[x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37749) This is the final (I hope!) problem pattern mentioned in PR37749: https://bugs.llvm.org/show_bug.cgi?id=37749 We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops. We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches that can over-reach. I.e., splitting to 128-bit does not look like a win in most cases with >1 256-bit op. The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, we have this vector-type-legalized sequence: t29: v8i32 = concat_vectors t27, t28 t30: v4i64 = bitcast t29 t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ... t31: v4i64 = bitcast t18 t32: v4i64 = xor t30, t31 t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ... t34: v4i64 = bitcast t9 t35: v4i64 = and t32, t34 t36: v8i32 = bitcast t35 t37: v4i32 = extract_subvector t36, Constant:i64<0> t38: v4i32 = extract_subvector t36, Constant:i64<4> Differential Revision: https://reviews.llvm.org/D52318 llvm-svn: 343008	2018-09-25 19:09:34 +00:00
Craig Topper	6fb1358a98	[X86] Add AVX512 support to combineVectorSizedSetCCEquality. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52424 llvm-svn: 342989	2018-09-25 16:27:12 +00:00
Simon Pilgrim	b56be79e0c	Revert rL342916: [X86] Remove shift/rotate by CL memory (RMW) overrides As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead. The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342969	2018-09-25 13:01:26 +00:00
Craig Topper	9ce5da7b62	[X86] Don't create FILD ISD nodes when X87 is disabled. The included test case previously asserted because the type legalizer tried to soften the FILD ISD node. Fixes PR38819. llvm-svn: 342934	2018-09-25 00:16:57 +00:00
Christy Lee	e94374809e	Re-submitting changes in D51550 because it failed to patch. Reviewers: javed.absar, trentxintong, courbet Reviewed By: trentxintong Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52433 llvm-svn: 342919	2018-09-24 20:47:12 +00:00
Simon Pilgrim	0b4ad7596f	[X86] Remove shift/rotate by CL memory (RMW) overrides The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342916	2018-09-24 20:11:50 +00:00
Simon Pilgrim	51cbd838d0	[X86][AVX] Add truncation as shuffle test for PR31451 llvm-svn: 342908	2018-09-24 17:26:31 +00:00
Simon Pilgrim	00865a48d1	[X86] Split WriteIMul into 8/16/32/64 implementations (PR36931) Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases. This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions. llvm-svn: 342892	2018-09-24 15:21:57 +00:00
Sanjay Patel	2c901742ca	[DAGCombiner] use UADDO to optimize saturated unsigned add This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select. As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities. This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86 which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests. Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place. Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886	2018-09-24 14:47:15 +00:00
Roman Lebedev	fb697d0f1b	[NFC][CodeGen][X86][AArch64] More tests for 'bit field extract' w/ constants It would be best to introduce ISD::BitFieldExtract, because clearly more than one backend faces the same problem. But for now let's solve this in the x86-specific DAG combine. https://bugs.llvm.org/show_bug.cgi?id=38938 llvm-svn: 342880	2018-09-24 13:24:20 +00:00
Craig Topper	2b8107614c	[X86] Add 512-bit test cases to setcc-wide-types.ll. NFC llvm-svn: 342860	2018-09-24 05:46:01 +00:00
Sanjay Patel	0027946915	[DAGCombiner][x86] extend decompose of integer multiply into shift/add with negation This is an alternative to https://reviews.llvm.org/D37896. We can't decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some existing code that overlaps with this transform. This extends D52195 and may resolve PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 (still an open question about transforming legal vector multiplies, but we could open another bug report for those) llvm-svn: 342844	2018-09-23 18:41:38 +00:00
Simon Pilgrim	19952add7c	[X86] Added missing RCL/RCR schedule overrides to the generic SNB model The SandyBridge model was missing schedule values for the RCL/RCR values - instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults. I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well. This is necessary to allow WriteRotate to be updated to remove other rotate overrides. It'd probably be a good idea to investigate a WriteRotateCarry class at some point but it's not high priority given the unusualness of these instructions. llvm-svn: 342842	2018-09-23 17:40:24 +00:00
Sanjay Patel	151efca3fe	[x86] add tests for mul decomposition with negative constant; NFC llvm-svn: 342838	2018-09-23 16:07:46 +00:00
Craig Topper	c296436a30	[X86] Add isel pattern for (v8i16 (sext (v8i1))) with DQI and no BWI. Our lowering that tries to avoid this sign extend can be defeated by the DAG combine folding it with a truncate. The pattern needs to extend to an v8i32 then truncate back down to v8i16. llvm-svn: 342830	2018-09-23 06:49:48 +00:00
Craig Topper	082e04c61d	[X86] Fix inline expansion for memset in x32 Summary: Similar to D51893 which was for memcpy Reviewers: efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52063 llvm-svn: 342796	2018-09-22 05:16:35 +00:00
Craig Topper	9995760df4	[X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)) for vXi8 vectors. We don't have a vXi8 shift left so we need to bitcast to a vXi16 vector to perform the shift. If we let lowering legalize the vXi8 shift we get an extra and that we don't need and fail to remove. llvm-svn: 342795	2018-09-22 05:08:38 +00:00
Craig Topper	ecdab03d10	[X86] Teach fast isel to use MOV32ri64 for loading an unsigned 32 immediate into a 64-bit register. Previously we used SUBREG_TO_REG+MOV32ri. But regular isel was changed recently to use the MOV32ri64 pseudo. Fast isel now does the same. llvm-svn: 342788	2018-09-21 23:14:05 +00:00
Sanjay Patel	db1fb8cd20	[x86] add more tests for potential andnp splitting with AVX1; NFC llvm-svn: 342775	2018-09-21 21:25:16 +00:00
Simon Pilgrim	eabc582b62	[X86] Add AVX512 target to load scalar to vector tests To investigate broadcast instruction codegen for D51553 llvm-svn: 342773	2018-09-21 21:08:26 +00:00
Sanjay Patel	19b5eb580b	[x86] add (negative) andnp test for D52318; NFC llvm-svn: 342756	2018-09-21 18:24:53 +00:00
Sanjay Patel	83a1b66c43	[x86] add test with optsize attribute for scalar->vector transform; NFC llvm-svn: 342755	2018-09-21 18:03:49 +00:00
Clement Courbet	8171bd8e0f	[X86][Sched] Add zero idiom sched data to the SNB model. Summary: On SNB, renamer-based zeroing does not work for: - 16 and 8-bit GPRs[1]. - MMX [2]. - ANDN variants [3] [1] echo 'sub %ax, %ax' | /tmp/llvm-exegesis -mode=uops -snippets-file=- [2] echo 'pxor %mm0, %mm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=- [3] echo 'andnps %xmm0, %xmm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=- Reviewers: RKSimon, andreadb Subscribers: gbedwell, craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D52358 llvm-svn: 342736	2018-09-21 14:07:20 +00:00
Andrea Di Biagio	4cd5cf9fc8	[X86][BtVer2] Fix latency and resource cycles of AVX 256-bit zero-idioms. This patch introduces a SchedWriteVariant to describe zero-idiom VXORP(S|D)Yrr and VANDNP(S|D)Yrr. This is a follow-up of r342555. On Jaguar, a VXORPSYrr is 2 macro opcodes. Only one opcode is eliminated at register-renaming stage. The other opcode has to be executed to set the upper half of the destination YMM. Same for VANDNP(S|D)Yrr. Differential Revision: https://reviews.llvm.org/D52347 llvm-svn: 342728	2018-09-21 12:43:07 +00:00
Andrea Di Biagio	eebfecee4f	[X86] Add scheduling tests for AVX1 256-bit zero-idioms. NFC llvm-svn: 342726	2018-09-21 12:22:14 +00:00
Walter Lee	f75e803679	[RegAllocGreedy] Fix crash in tryLocalSplit tryLocalSplit only handles a single use block, but an interval may have multiple use blocks. So don't crash in that case. This fixes PR38795. Differential revision: https://reviews.llvm.org/D52277 llvm-svn: 342682	2018-09-20 20:05:57 +00:00
Simon Pilgrim	6f630e4618	Fix line-endings. NFCI. llvm-svn: 342639	2018-09-20 10:59:08 +00:00
Roman Lebedev	38c25ace53	[NFC][x86][AArch64] Add BEXTR-like test patterns. Summary: Also, adjust the check prefixes so that we actually get to check the BMI1-only-case. Reviewers: craig.topper, RKSimon, spatel, javed.absar Reviewed By: RKSimon Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D48490 llvm-svn: 342623	2018-09-20 07:54:49 +00:00
Sanjay Patel	0bda919870	[x86] add test for 256-bit andn (PR37749); NFC llvm-svn: 342595	2018-09-19 22:00:56 +00:00
Sanjay Patel	fdc0de19cb	[SelectionDAG] allow vector types with isBitwiseNot() The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits, and the test diff in add.ll is from a DAGCombiner transform. llvm-svn: 342594	2018-09-19 21:48:30 +00:00
Sanjay Patel	567f7f0a95	[x86] add test for add+not vector fold; NFC The fold uses 'isBitwiseNot()', but that's not vector-friendly currently. llvm-svn: 342592	2018-09-19 21:28:39 +00:00
Simon Pilgrim	2d0f20cc04	[X86] Handle COPYs of physregs better (regalloc hints) Enable enableMultipleCopyHints() on X86. Original Patch by @jonpa: While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). The handling for that resulted in this patch, which improves the register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 fewer register moves on SPEC, as well as marginally less spilling. Instead of improving just the SystemZ backend, the improvement has been implemented in common-code (calculateSpillWeightAndHint()). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates. Differential Revision: https://reviews.llvm.org/D38128 llvm-svn: 342578	2018-09-19 18:59:08 +00:00
Michael Berg	894c39f770	Copy utilities updated and added for MI flags Summary: This patch adds a GlobalIsel copy utility into MI for flags and updates the instruction emitter for the SDAG path. Some tests show new behavior and I added one for GlobalIsel which mirrors an SDAG test for handling nsw/nuw. Reviewers: spatel, wristow, arsenm Reviewed By: arsenm Subscribers: wdng Differential Revision: https://reviews.llvm.org/D52006 llvm-svn: 342576	2018-09-19 18:52:08 +00:00
Simon Pilgrim	8191d63c3b	[X86] Add initial SimplifyDemandedVectorEltsForTargetNode support This patch adds an initial x86 SimplifyDemandedVectorEltsForTargetNode implementation to handle target shuffles. Currently the patch only decodes a target shuffle, calls SimplifyDemandedVectorElts on its input operands and removes any shuffle that reduces to undef/zero/identity. Future work will need to integrate this with combineX86ShufflesRecursively, add support for other x86 ops, etc. NOTE: There is a minor regression that appears to be affecting further (extractelement?) combines which I haven't been able to solve yet - possibly something to do with how nodes are added to the worklist after simplification. Differential Revision: https://reviews.llvm.org/D52140 llvm-svn: 342564	2018-09-19 18:11:34 +00:00
Sanjay Patel	4fd2e2a498	[DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add This is an alternative to D37896. I don't see a way to decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some duplicate code that overlaps with this transform. As a first step, we're only getting the most clear wins on the vector examples requested in PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 As noted in the code comment, it's likely that the x86 constraints are tighter than necessary, but it may not always be a win to replace a pmullw/pmulld. Differential Revision: https://reviews.llvm.org/D52195 llvm-svn: 342554	2018-09-19 15:57:40 +00:00
Simon Pilgrim	e2b16389e7	[X86][SSE] Update extractelement test in preparation for D52140 SimplifyDemandedVectorEltsForTargetNode will remove most of this test unless we get rid of the undefs - still testing for align 1 which was the point of the test Removed out of date comment as well llvm-svn: 342531	2018-09-19 09:50:32 +00:00
Carlos Alberto Enciso	ba4e437c6a	[DebugInfo][Dexter] Speculated BB presents illegal variable value to debugger. When SimplifyCFG changes the PHI node into a select instruction, the debug information becomes ambiguous. It causes the debugger to display the wrong variable value. Differential Revision: https://reviews.llvm.org/D51976 llvm-svn: 342527	2018-09-19 08:16:56 +00:00
Hans Wennborg	01c3154971	Revert r342457 "Fixes removal of dead elements from PressureDiff (PR37252)." This broke the lit tests on a bunch of buildbots, e.g. http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/36679 > Reviewed By: MatzeB > > Differential Revision: https://reviews.llvm.org/D51495 llvm-svn: 342482	2018-09-18 14:12:54 +00:00
Yury Gribov	53db663afb	Fixes removal of dead elements from PressureDiff (PR37252). Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D51495 llvm-svn: 342457	2018-09-18 09:53:42 +00:00
Keno Fischer	c8ccaed325	[X86ISel] Implement byval lowering for Win64 calling convention Summary: The IR reference for the `byval` attribute states: ``` This indicates that the pointer parameter should really be passed by value to the function. The attribute implies that a hidden copy of the pointee is made between the caller and the callee, so the callee is unable to modify the value in the caller. This attribute is only valid on LLVM pointer arguments. ``` However, on Win64, this attribute is unimplemented and the raw pointer is passed to the callee instead. This is problematic, because frontend authors relying on the implicit hidden copy (as happens for every other calling convention) will see the passed value silently (if mutable memory) or loudly (by means of a crash) modified because the callee treats the location as scratch memory space it is allowed to mutate. At this point, it's worth taking a step back to understand the context. In most calling conventions, aggregates that are too large to be passed in registers, instead get copied to the stack at a fixed (computable from the signature) offset of the stack pointer. At the LLVM, we hide this hidden copy behind the byval attribute. The caller passes a pointer to the desired data and the callee receives a pointer, but these pointers are not the same. In particular, the pointer that the callee receives points to temporary stack memory allocated as part of the call lowering. In most calling conventions, this pointer is never realized in registers or memory. The temporary memory is simply defined by an implicit offset from the stack pointer at function entry. Win64, uniquely, works differently. The structure is still passed in memory, but instead of being stored at an implicit memory offset, the caller computes a pointer to the temporary memory and passes it to the callee as a regular pointer (taking up a register, or if all registers are taken up, an additional stack slot). 
Presumably, this was done to allow eliding the copy when passing aggregates through several functions on the stack. This explains why ignoring the `byval` attribute mostly works on Win64. The argument simply gets passed as a pointer and as long as we're ok with the callee trampling all over that memory, there are no ill effects. However, it does contradict the documentation of the `byval` attribute which specifies that there is to be an implicit copy. Frontends can of course work around this by never emitting the `byval` attribute for Win64 and creating `alloca`s for the requisite temporary stack slots (and that does appear to be what frontends are doing). However, the presence of the `byval` attribute is not a trap for frontend authors, since it seems to work, but silently modifies the passed memory contrary to documentation. I see two solutions: - Disallow the `byval` attribute in the verifier if using the Win64 calling convention. - Make it work by simply emitting a temporary stack copy as we would with any other calling convention (frontends can of course always not use the attribute if they want to elide the copy). This patch implements the second option (make it work), though I would be fine with the first also. Ref: https://github.com/JuliaLang/julia/issues/28338 Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51842 llvm-svn: 342402	2018-09-17 17:37:14 +00:00
Amara Emerson	91c2913522	Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index."" Fixed the assertion failure. llvm-svn: 342397	2018-09-17 14:40:13 +00:00
Simon Pilgrim	cffa206423	[X86][SSE] Always enable ISD::SRL -> ISD::MULHU for v8i16 For constant non-uniform cases we'll never introduce more and/andn/or selects than already occur in generic pre-SSE41 ISD::SRL lowering. llvm-svn: 342352	2018-09-16 20:28:38 +00:00
Simon Pilgrim	ea069ffd44	[X86][AVX] Enable ISD::SRL -> ISD::MULHU for v16i16 Now that rL340913 has landed with improved v16i16 selects as shuffles. llvm-svn: 342349	2018-09-16 19:20:47 +00:00
Sanjay Patel	3eaf500a6d	[DAGCombiner] try to convert pow(x, 1/3) to cbrt(x) This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040. Copying the motivational statement by @evandro from that patch: "This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, no regressions on x86-64 or A64." I'm proposing to add only the minimum support for a DAG node here. Since we don't have an LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I don't think we need to worry about DAG builder, legalization, a strict variant, etc. We should be able to expand as needed when adding more functionality/transforms. For reference, these are transform suggestions currently listed in SimplifyLibCalls.cpp: // * cbrt(expN(X)) -> expN(x/3) // * cbrt(sqrt(x)) -> pow(x,1/6) // * cbrt(cbrt(x)) -> pow(x,1/9) Also, given that we bail out on long double for now, there should not be any logical differences between platforms (unless there's some platform out there that has pow() but not cbrt()). Differential Revision: https://reviews.llvm.org/D51753 llvm-svn: 342348	2018-09-16 16:50:26 +00:00
Sanjay Patel	bfee5a9b42	[x86] fix uses check in broadcast transform (PR38949) https://bugs.llvm.org/show_bug.cgi?id=38949 It's not clear to me that we even need a one-use check in this fold. Ie, 2 independent loads might be better than a load+dependent shuffle. Note that the existing re-use tests are not affected. We actually do form a broadcast node in those tests now because there's no extra use of the insert_subvector node in those cases. But something later in isel pattern matching decides that it is not worth using a broadcast for the full load in those tests: Legalized selection DAG: %bb.0 'test_broadcast_2f64_4f64_reuse:' t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64 t4: i64,ch = CopyFromReg t0, Register:i64 %1 t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64 t18: v4f64 = insert_subvector undef:v4f64, t7, Constant:i64<0> t20: v4f64 = insert_subvector t18, t7, Constant:i64<2> Becomes: t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64 t4: i64,ch = CopyFromReg t0, Register:i64 %1 t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64 t21: v4f64 = X86ISD::SUBV_BROADCAST t7 ISEL: Starting selection on root node: t21: v4f64 = X86ISD::SUBV_BROADCAST t7 ... Created node: t27: v4f64 = INSERT_SUBREG IMPLICIT_DEF:v4f64, t7, TargetConstant:i32<7> Morphed node: t21: v4f64 = VINSERTF128rr t27, t7, TargetConstant:i8<1> llvm-svn: 342347	2018-09-16 15:41:56 +00:00
Sanjay Patel	3e095174b0	[x86] add failure to splat test (PR38949); NFC llvm-svn: 342346	2018-09-16 14:59:04 +00:00
Simon Pilgrim	fc4c26485c	[X86][SSE] Fix insertps load combine test name The existing test was called extract_lane_insertps_5123 but it was in fact doing a <6,1,2,3> shuffle. I've fixed the name and added the <5,1,2,3> test case as well. llvm-svn: 342328	2018-09-15 16:57:04 +00:00
Craig Topper	fe0b973fbf	[X86] Remove an fp->int->fp domain crossing in LowerUINT_TO_FP_i64. Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back? Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52134 llvm-svn: 342327	2018-09-15 16:23:35 +00:00
Craig Topper	273f755da3	[X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)) Summary: MOVMSK only cares about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load. Inspired by PR38840. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D52121 llvm-svn: 342326	2018-09-15 16:23:33 +00:00
Simon Pilgrim	7bfe87181d	Fix line endings. NFCI. llvm-svn: 342323	2018-09-15 14:20:53 +00:00
Reid Kleckner	4d1b75c6b7	Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index." Causes 'isVector() && "Invalid vector type!"' assertion when building Skia in Chrome. llvm-svn: 342265	2018-09-14 19:39:40 +00:00
Simon Pilgrim	4c30f3d4e6	Revert a line-endings change that somehow got included with rL342257 llvm-svn: 342258	2018-09-14 18:35:21 +00:00
Simon Pilgrim	32857c54d2	[X86][SSE] Lower shuffles to permute(unpack(x,y)) (PR31151) Attempt to lower a shuffle as an unpack of elements from two inputs followed by a single-input (wider) permutation. As long as the permutation is wider this is a win - there may be some circumstances where same size permutations would also be useful but I've left that for future work. Differential Revision: https://reviews.llvm.org/D52043 llvm-svn: 342257	2018-09-14 18:33:31 +00:00
Craig Topper	ac356cac0c	[X86] Re-generate test checks using current version of the script. NFC The regular expression used for stack accesses is different today. llvm-svn: 342256	2018-09-14 18:27:09 +00:00
Simon Pilgrim	1c1335a10d	[X86][BMI1] Fix BLSI/BLSMSK/BLSR BMI1 scheduling on btver2 These have the same behaviour as tzcnt on btver2 - confirmed with AMD 16h SOG, Agner and instlatx64. llvm-svn: 342235	2018-09-14 13:31:14 +00:00
Amara Emerson	ef600cbd86	[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index. Differential Revision: https://reviews.llvm.org/D51831 llvm-svn: 342183	2018-09-13 21:28:58 +00:00
Craig Topper	2f88006ced	[MachineInstr] In addRegisterKilled and addRegisterDead, don't remove operands from inline assembly instructions if they have an associated flag operand. INLINEASM instructions use extra operands to carry flags. If a register operand is removed without removing the flag operand, then the flags will no longer make sense. This patch fixes this by preventing the removal when a flag operand is present. The included test case was generated by MS inline assembly. Longer term maybe we should fix the inline assembly parsing to not generate redundant operands. Differential Revision: https://reviews.llvm.org/D51829 llvm-svn: 342176	2018-09-13 20:51:27 +00:00
Nirav Dave	59ad1c8457	[X86] Fix register resizings for inline assembly register operands. When replacing a named register input with the appropriately sized sub/super-register, in the case of a 64-bit value being assigned to a register in 32-bit mode, match GCC's assignment. Reviewers: eli.friedman, craig.topper Subscribers: nickdesaulniers, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D51502 llvm-svn: 342175	2018-09-13 20:33:56 +00:00
Craig Topper	f107123a88	[X86] Type legalize v2i32 div/rem by scalarizing rather than promoting Summary: Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall. This patch switches type legalization to do the scalarizing itself using i32. It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support. Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalarizing to 4 scalar divides. Reviewers: RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51325 llvm-svn: 342114	2018-09-13 06:13:37 +00:00
Michael Berg	22a53cbc7f	Guard FMF context by excluding some FP operators from FPMathOperator Summary: Some FPMathOperators succeed and then retrieve FMF context when they never have it; we should omit these cases to keep from removing FMF context. For instance when we visit some FPMathOperator mapped Instructions which never have FMF flags and a Node was associated which does have FMF flags, that Node today will have all its flags cleared via the intersect operation. With this change, we exclude associating Nodes that never have FPMathOperator status under FMF. Reviewers: spatel, wristow, arsenm, hfinkel, aemerson Reviewed By: spatel Subscribers: llvm-commits, wdng Differential Revision: https://reviews.llvm.org/D51145 llvm-svn: 342081	2018-09-12 21:09:59 +00:00
Craig Topper	2262613532	[X86] Remove isel patterns for ADCX instruction There's no advantage to this instruction unless you need to avoid touching other flag bits. Its encoding is longer, it can't fold an immediate, it doesn't write all the flags. I don't think gcc will generate this instruction either. Fixes PR38852. Differential Revision: https://reviews.llvm.org/D51754 llvm-svn: 342059	2018-09-12 15:47:34 +00:00
Craig Topper	dc32e91bc6	[X86] Teach X86SelectionDAGInfo::EmitTargetCodeForMemcpy about GNUX32 Summary: In GNUX32, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match. Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger. Reviewers: rnk, efriedma, MatzeB, echristo Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51893 llvm-svn: 342015	2018-09-12 01:57:22 +00:00
Craig Topper	8238580aae	[X86] Prefer unpckhpd over movhlps in isel for fake unary cases In r337348, I changed lowering to prefer X86ISD::UNPCKL/UNPCKH opcodes over MOVLHPS/MOVHLPS for v2f64 {0,0} and {1,1} shuffles when we have SSE2. This enabled the removal of a bunch of weirdly bitcasted isel patterns in r337349. To avoid changing the tests I placed a gross hack in isel to still emit movhlps instructions for fake unary unpckh nodes. A similar hack was not needed for unpckl and movlhps because we do execution domain switching for those. But unpckh and movhlps have swapped operand order. This patch removes the hack. This is a code size increase since unpckhpd requires a 0x66 prefix and movhlps does not. But if that's a big concern we should be using movhlps for all unpckhpd opcodes and let commuteInstruction turn it into unpckhpd when it's an advantage. Differential Revision: https://reviews.llvm.org/D49499 llvm-svn: 341973	2018-09-11 17:57:27 +00:00
Craig Topper	cc9efaffad	[X86] Teach X86FastISel::X86SelectRet to use EAX for the sret pointer in GNUX32 GNUX32 uses 32-bit pointers despite is64BitMode being true. So we should use EAX to return the value. Fixes one of the failures from PR38865. Differential Revision: https://reviews.llvm.org/D51940 llvm-svn: 341972	2018-09-11 17:57:23 +00:00
Roman Lebedev	baf2628043	[DagCombine][NFC] Some more tests for X % C == 0 (UREM case) transform For https://reviews.llvm.org/D50222 Patch by: hermord (Dmytro Shynkevych)! llvm-svn: 341953	2018-09-11 15:34:26 +00:00
Craig Topper	844f035e1e	[X86] In combineMOVMSK, look through int->fp bitcasts before calling SimplifyDemandedBits. MOVMSKPS and MOVMSKPD both take FP types, but likely the operations before it are on integer types with just an int->fp bitcast between them. If the bitcast isn't used by anything else and doesn't change the element width we can look through it to simplify the integer ops. llvm-svn: 341915	2018-09-11 08:20:02 +00:00
Craig Topper	85210311ba	[X86] Add test cases inspired by PR38840. These are test cases inspired by sequences like below for extracting the same bit from every vector element and checking for all zeros/ones. define i1 @and256_x8(<8 x i32>) { %a = trunc <8 x i32> %0 to <8 x i1> %b = bitcast <8 x i1> %a to i8 %d = icmp eq i8 %b, -1 ret i1 %d } This is what the above looks like after InstCombine. define i1 @and256_x8_opt(<8 x i32>) { %2 = and <8 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1> %a = icmp ne <8 x i32> %2, zeroinitializer %b = bitcast <8 x i1> %a to i8 %d = icmp eq i8 %b, -1 ret i1 %d } llvm-svn: 341908	2018-09-11 07:23:29 +00:00
Craig Topper	07889079fa	[X86] Explicitly enable aes in aes-schedule.ll to fix failures after r341861. llvm-svn: 341868	2018-09-10 21:49:01 +00:00
Sanjay Patel	7feb3ed78c	[x86] test codegen for unsigned saturated add; NFC All of the ISA holes are going to make this difficult, but we can't canonicalize the IR and try to solve PR14613 until we have backend support to get this right. https://bugs.llvm.org/show_bug.cgi?id=14613 https://rise4fun.com/Alive/Guv https://rise4fun.com/Alive/AADG llvm-svn: 341845	2018-09-10 17:40:15 +00:00
Craig Topper	3823516103	[X86] Custom type legalize (v2i32 (fp_to_uint v2f64))) without avx512vl by widening to v4i32 and v4f64 instead of v8i32 and v8f64. Make it aware of x86-experimental-vector-widening-legalization We have isel patterns for v4i32/v4f64 that artificially widen to v8i32/v8f64 so just use that. If x86-experimental-vector-widening-legalization is enabled, we don't need any custom legalization and can just return. I've modified the test RUN lines to cover this case. llvm-svn: 341765	2018-09-09 20:36:36 +00:00
Sanjay Patel	6ebf218e4c	[SelectionDAG] enhance vector demanded elements to look at a vector select condition operand This is the DAG equivalent of D51433. If we know we're not using all vector lanes, use that knowledge to potentially simplify a vselect condition. The reduction/horizontal tests show that we are eliminating AVX1 operations on the upper half of 256-bit vectors because we don't need those anyway. I'm not sure what the pr34592 test is showing. That's run with -O0; is SimplifyDemandedVectorElts supposed to be running there? Differential Revision: https://reviews.llvm.org/D51696 llvm-svn: 341762	2018-09-09 14:13:22 +00:00
Craig Topper	7af5e333e7	[X86] Create paddus/psubus from narrower vectors with i8/i16 element types. Summary: This patch allows vectors with a power of 2 number of elements and i8/i16 element type to select paddus/psubus instructions. ReplaceNodeResults has been updated to custom widen these operations up to 128 bits like we already do for PAVG. Another step towards fixing PR38691 Reviewers: RKSimon, spatel Reviewed By: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51818 llvm-svn: 341753	2018-09-08 19:32:58 +00:00
Craig Topper	a2c9694bc8	[X86] Mark the ADCX and ADOX instruction as commutable. llvm-svn: 341752	2018-09-08 18:47:56 +00:00
Craig Topper	4677110348	[X86] Add test cases for commuting ADCX/ADOX instruction to avoid copies. This is a MIR test so we can test ADOX which we have no isel patterns for. I also plan to remove ADCX isel patterns in the near future so this will help maintain coverage. llvm-svn: 341751	2018-09-08 18:47:54 +00:00
Craig Topper	c96305970d	[X86] Add commuted isel pattern for the load form of ADCX instructions. This prevents the legacy ADC instruction from being favored over ADCX when the load is in the operand 0. llvm-svn: 341745	2018-09-08 06:31:43 +00:00
Craig Topper	22a6f51646	[X86] Add load folding test cases for the addcarryx intrinsic. We are currently only able to fold a load in operand 1 to ADCX. A load in operand 0 will use the legacy ADC instruction. Ultimately I want to remove isel support for ADCX, but first I'm going to fix the shortcomings I know of so I can write proper MIR tests to maintain coverage later. llvm-svn: 341744	2018-09-08 06:31:41 +00:00
Craig Topper	761e88d1d4	[X86] Add stack folding MIR test for ADCX/ADOX. We currently have no way to isel ADOX and I plan to remove isel patterns for ADCX. This test will ensure we still have stack folding support for these instructions if we need them in the future. llvm-svn: 341743	2018-09-08 05:08:18 +00:00
Reid Kleckner	f803b23879	[COFF] Implement llvm.global_ctors priorities for MSVC COFF targets Summary: MSVC and LLD sort sections ASCII-betically, so we need to use section names that sort between .CRT$XCA (the start) and .CRT$XCU (the default priority). In the general case, use .CRT$XCT12345 as the section name, and let the linker sort the zero-padded digits. Users with low priorities typically want to initialize as early as possible, so use .CRT$XCA00199 for priorities less than 200. This number is arbitrary. Implements PR38552. Reviewers: majnemer, mstorsjo Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D51820 llvm-svn: 341727	2018-09-07 23:07:55 +00:00
Craig Topper	fa535c027e	[X86] Add codegen tests for narrow PADDUS/PSUBUS patterns for PR38691. llvm-svn: 341711	2018-09-07 21:28:46 +00:00
Craig Topper	5cbce81c91	[X86] Don't create ZERO_EXTEND_INREG/SIGN_EXTEND_INREG for v1iX vectors. The generic type legalizer will scalarize vXi1 instructions getting rid of the vector entirely. Creating wider vector instructions is just going to prevent that. llvm-svn: 341705	2018-09-07 20:56:03 +00:00
Craig Topper	39f48fdcbc	[X86] Don't create X86ISD::AVG nodes from v1iX vectors. The type legalizer will try to scalarize this and fail. It looks like there's some other v1iX oddities out there too since we still generated some vector instructions. llvm-svn: 341704	2018-09-07 20:56:01 +00:00
Craig Topper	4863313b35	[X86] Modify the rdtscp intrinsic to return values instead of taking a pointer argument Similar to what was recently done for addcarry/subborrow and has been done for rdrand/rdseed for a while. It's better to use two results and an explicit store in IR when the store isn't part of the semantics of the instruction. This allows store->load forwarding to happen in the middle end. Or the store to be removed if it's never loaded. Differential Revision: https://reviews.llvm.org/D51803 llvm-svn: 341698	2018-09-07 19:14:15 +00:00
Craig Topper	72964ae99e	[X86] Change the addcarry and subborrow intrinsics to return 2 results and remove the pointer argument. We should represent the store directly in IR instead. This gives the middle end a chance to remove it if it can see a load from the same address. Differential Revision: https://reviews.llvm.org/D51769 llvm-svn: 341677	2018-09-07 16:58:39 +00:00
Craig Topper	51e11788a4	[X86] Use regular expressions to make test immune to register allocation changes. llvm-svn: 341676	2018-09-07 16:58:36 +00:00
Craig Topper	313d09af51	[X86] Teach X86DAGToDAGISel::foldLoadStoreIntoMemOperand to handle loads in operand 1 of commutable operations. Previously we only handled loads in operand 0, but nothing guarantees the load will be operand 0 for commutable operations. Differential Revision: https://reviews.llvm.org/D51768 llvm-svn: 341675	2018-09-07 16:27:55 +00:00
Simon Pilgrim	04d0748417	[X86][SSE] Add additional fadd/fsub(x, bitcast_fneg(y)) tests with different integer bitwidths llvm-svn: 341657	2018-09-07 13:27:07 +00:00
Simon Pilgrim	96d6b9c2e2	[DAGCombiner] foldBitcastedFPLogic - Add basic vector support Add support for bitcasts from float type to an integer type of the same element bitwidth. There may be cases where we need to support different widths (e.g. as SSE __m128i is treated as v2i64) - but I haven't seen cases of this in the wild yet. llvm-svn: 341652	2018-09-07 12:13:45 +00:00
Simon Pilgrim	a2aef22a72	[X86][SSE] Add fadd/fsub(x, bitcast_fneg(y)) tests Show missing vector support llvm-svn: 341650	2018-09-07 11:24:43 +00:00
Craig Topper	30e129f256	[X86] Add more test cases for missed opportunities for using RMW form of ADC. llvm-svn: 341630	2018-09-07 02:39:56 +00:00
Craig Topper	2c9dede9cb	[X86] Add RMW ADC patterns with load in operand 1. ADC is commutable and the load could be in either operand, but we were only checking operand 0. Ideally we'd mark X86adc_flag as commutable and tablegen would automatically do this, but the EFLAGS register mention is preventing it. llvm-svn: 341606	2018-09-06 23:55:36 +00:00
Craig Topper	37d68e4599	[X86] Add a test case showing failure to use the RMW form of ADC when the load is in operand 1 going into isel. The ADC instruction is commutable, but we only have RMW isel patterns with a load on the left hand side. Nothing will canonicalize loads to the LHS on these ops. So we need two patterns. llvm-svn: 341605	2018-09-06 23:55:34 +00:00
Sanjay Patel	9e5c163154	[x86] add tests for pow --> cbrt; NFC llvm-svn: 341575	2018-09-06 18:42:55 +00:00
Craig Topper	5a53760f65	[X86][Assembler] Allow %eip as a register in 32-bit mode for .cfi directives. This basically reverts a change made in r336217, but improves the text of the error message for not allowing IP-relative addressing in 32-bit mode. Fixes PR38826. Patch by Iain Sandoe. llvm-svn: 341512	2018-09-06 02:03:14 +00:00
Sanjay Patel	dbf52837fe	[DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x)) This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. Here, we only do the transform when the target tells us that sqrt can be lowered with inline code. This is the basic case. Some potential enhancements are in the TODO comments: 1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper). 2. If we have fewer fast-math-flags, generate code to avoid -0.0 and/or INF. 3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right). Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should also be covered by existing tests). Differential Revision: https://reviews.llvm.org/D51630 llvm-svn: 341481	2018-09-05 17:01:56 +00:00
Chandler Carruth	664aa868f5	[x86/SLH] Add a real Clang flag and LLVM IR attribute for Speculative Load Hardening. Wires up the existing pass to work with a proper IR attribute rather than just a hidden/internal flag. The internal flag continues to work for now, but I'll likely remove it soon. Most of the churn here is adding the IR attribute. I talked about this with Kristof Beyls and he seemed at least initially OK with this direction. The idea of using a full attribute here is that we do expect at least some forms of this for other architectures. There isn't anything inherently x86-specific about this technique, just that we only have an implementation for x86 at the moment. While we could potentially expose this as a Clang-level attribute as well, that seems like a good question to defer for the moment as it isn't 100% clear whether that or some other programmer interface (or both?) would be best. We'll defer the programmer interface side of this for now, but at least get to the point where the feature can be enabled without relying on implementation details. This also allows us to do something that was really hard before: we can enable just the indirect call retpolines when using SLH. For x86, we don't have any other way to mitigate indirect calls. Other architectures may take a different approach of course, and none of this is surfaced to user-level flags. Differential Revision: https://reviews.llvm.org/D51157 llvm-svn: 341363	2018-09-04 12:38:00 +00:00
Chandler Carruth	219888d1b2	[x86/SLH] Teach SLH to harden against the "ret2spec" attack by implementing the proposed mitigation technique described in the original design document. The idea is to check after calls that the return address used to arrive at that location is in fact the correct address. In the event of a mis-predicted return which reaches a valid return but not the correct return, this will detect the mismatch much like it would a mispredicted conditional branch. This is the last published attack vector that I am aware of in the Spectre v1 space which is not mitigated by SLH+retpolines. However, don't read too much into that: this is an area of ongoing research where we expect more issues to be discovered in the future, and it also makes no attempt to mitigate Spectre v4. Still, this is an important completeness bar for SLH. The change here is of course delightfully simple. It was predicated on cutting support for post-instruction symbols into LLVM which was not at all simple. Many thanks to Hal Finkel, Reid Kleckner, and Justin Bogner who helped me figure out how to do a bunch of the complex changes involved there. Differential Revision: https://reviews.llvm.org/D50837 llvm-svn: 341358	2018-09-04 10:59:10 +00:00
Chandler Carruth	8d8489f513	[x86/SLH] Teach SLH to harden indirect branches and switches without retpolines. This implements the core design of tracing the intended target into the target, checking it, and using that to update the predicate state. It takes advantage of a few interesting aspects of SLH to make it a bit easier to implement: - We already split critical edges with conditional branches, so we can assume those are gone. - We already unfolded any memory access in the indirect branch instruction itself. I've left hard errors in place to catch if any of these somewhat subtle invariants get violated. There is some code that I can factor out and share with D50837 when it lands, but I didn't want to couple landing the two patches, so I'll do that in a follow-up cleanup commit if alright. Factoring out the code to handle different scenarios of materializing an address remains frustratingly hard. In a bunch of cases you want to fold one of the cases into an immediate operand of some other instruction, and you also have both symbols and basic blocks being used which require different methods on the MI builder (and different operand kinds). Still, I'll take a stab at sharing at least some of this code in a follow-up if I can figure out how. Differential Revision: https://reviews.llvm.org/D51083 llvm-svn: 341356	2018-09-04 10:44:21 +00:00
Sanjay Patel	0945959869	[AArch64][x86] add tests for pow(x, 0.25); NFC Folds for this were proposed in D49306, but we decided the transform is better suited for the backend. llvm-svn: 341341	2018-09-03 22:11:47 +00:00
Carlos Alberto Enciso	eaf2c1f449	Test commit. Revert change done in r341297. NFC. Differential Revision: https://reviews.llvm.org/D51583 llvm-svn: 341302	2018-09-03 09:41:43 +00:00
Carlos Alberto Enciso	f03e049234	Test commit - adding a new line. llvm-svn: 341297	2018-09-03 08:26:37 +00:00
Roman Lebedev	d7a6244475	[DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle inverted pattern Summary: A follow-up for D49266 / rL337166 + D49497 / rL338044. This is still the same pattern to check for the [lack of] signed truncation, but in this case the constants and the predicate are negated. https://rise4fun.com/Alive/BDV https://rise4fun.com/Alive/n7Z Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma, dmgreen Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51532 llvm-svn: 341287	2018-09-02 13:56:22 +00:00
Craig Topper	caf6672779	[X86] Add intrinsics for KTEST instructions. These intrinsics use the same implementation as PTEST intrinsics, but use vXi1 vectors. New clang builtins will be accompanying them shortly. llvm-svn: 341259	2018-08-31 21:31:53 +00:00
Craig Topper	b7bb9f0078	[X86] Add support for turning vXi1 shuffles into KSHIFTL/KSHIFTR. This patch recognizes shuffles that shift elements and fill with zeros. I've copied and modified the shift matching code we use for normal vector registers to do this. I'm not sure if there's a good way to share more of this code without making the existing function more complex than it already is. This will be used to enable kshift intrinsics in clang. Differential Revision: https://reviews.llvm.org/D51401 llvm-svn: 341227	2018-08-31 17:17:21 +00:00
Alexander Ivchenko	9d053074a1	[GlobalISel][X86] Add the support for G_FPTRUNC Differential Revision: https://reviews.llvm.org/D49855 llvm-svn: 341202	2018-08-31 11:26:51 +00:00
Alexander Ivchenko	9b0b492653	[GlobalISel][X86_64] Support for G_FPTOSI Differential Revision: https://reviews.llvm.org/D49183 llvm-svn: 341200	2018-08-31 11:16:58 +00:00
Alexander Ivchenko	58a5d6fde7	[GlobalISel][X86] Support for llvm.trap intrinsic Differential Revision: https://reviews.llvm.org/D49180 llvm-svn: 341199	2018-08-31 11:05:13 +00:00
Alexander Ivchenko	a26a364e75	[GlobalISel][X86] Support for G_FCMP Differential Revision: https://reviews.llvm.org/D49172 llvm-svn: 341193	2018-08-31 09:38:27 +00:00
Roman Lebedev	75c2961b76	[NFC][X86][AArch64] A few more patterns for [lack of] signed truncation check pattern. llvm-svn: 341188	2018-08-31 08:52:03 +00:00
Andrea Di Biagio	b998eae2f2	[X86][BtVer2] Fix WriteFShuffle256 schedule write info. This patch fixes the number of micro opcodes, and processor resource cycles for the following AVX instructions: vinsertf128rr/rm vperm2f128rr/rm vbroadcastf128 Tests have been regenerated using the usual scripts in the llvm/utils directory. Differential Revision: https://reviews.llvm.org/D51492 llvm-svn: 341185	2018-08-31 08:30:47 +00:00
Craig Topper	7073f03f70	[X86] Add a -x86-experimental-vector-widening command line to vec_fp_to_int.ll. llvm-svn: 341173	2018-08-31 07:05:38 +00:00
Craig Topper	2140a8e307	[X86] Add -x86-experimental-vector-widening-legalization run line to avx512-cvt.ll This will cover the (v2i32 (setcc v2f32)) case in replaceNodeResults. That code shouldn't be needed at all in this mode. A future patch will skip it. llvm-svn: 341171	2018-08-31 07:05:36 +00:00
Michael Berg	7b9e86445c	[NFC] adding initial intersect test for Node to Instruction association llvm-svn: 341138	2018-08-30 22:43:34 +00:00
Craig Topper	b5de35a5ba	[X86] Add -x86-experimental-vector-widening-legalization command lines to vector-idiv-v2i32.ll If we're legalizing via widening already, then the type legalizer will scalarize the divs/rems as i32. llvm-svn: 341108	2018-08-30 20:10:10 +00:00
Craig Topper	1a8c99e670	[X86] Weaken an overly aggressive assert. This assert tried to check that AND constants are only on the RHS. But it's possible for both operands to be constants if one is opaque, which will prevent the AND from being constant folded. Fixes PR38771 llvm-svn: 341102	2018-08-30 19:35:38 +00:00
Craig Topper	b7e14332ea	[X86] Add kshift test cases for D51401. NFC llvm-svn: 341088	2018-08-30 17:51:02 +00:00
Vladimir Stefanovic	7e58ebf6b8	Allow inconsistent offsets for 'noreturn' basic blocks when '-verify-cfiinstrs' With r295105, some 'noreturn' blocks (those that don't return and have no successors) may be merged. If such blocks' predecessors have different outgoing offset or register, don't report an error in CFIInstrInserter verify(). Thanks to Vlad Tsyrklevich for reporting the issue. Differential Revision: https://reviews.llvm.org/D51161 llvm-svn: 341087	2018-08-30 17:31:38 +00:00
Roman Lebedev	26a1836757	[NFC][CodeGen][SelectionDAG] Tests for X % C == 0 codegen improvement. Hacker's Delight 10-17: when C is constant, the result of X % C == 0 can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. Patch by: hermord (Dmytro Shynkevych)! For https://reviews.llvm.org/D50222 llvm-svn: 341047	2018-08-30 09:32:21 +00:00
Carlos Alberto Enciso	06adfa1718	[DWARF] Missing location debug information with -O2. Check that Machine CSE correctly handles the debug location information for local variables during the transformation. Differential Revision: https://reviews.llvm.org/D50887 llvm-svn: 341025	2018-08-30 07:17:41 +00:00
Andrew V. Tischenko	62f7a3207b	[X86] Improved sched model for X86 CMPXCHG* instructions. Differential Revision: https://reviews.llvm.org/D50070 llvm-svn: 341024	2018-08-30 06:26:00 +00:00
Craig Topper	b7b353be60	[X86] Make Feature64Bit useful We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've removed the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or the user passing -mattr=+64bit. I've changed the assert to a report_fatal_error so it's not lost in Release builds. The test updates are to fix things that tripped the new error. Differential Revision: https://reviews.llvm.org/D51231 llvm-svn: 341022	2018-08-30 06:01:05 +00:00
Craig Topper	987ef2ddfd	[X86] Update test command line to not use 64-bit mode on a 32-bit only athlon cpu. llvm-svn: 341021	2018-08-30 06:01:03 +00:00
Craig Topper	2b3edb902d	[X86] Remove powerpc cpu name and features from uwtables.ll llvm-svn: 341020	2018-08-30 06:01:01 +00:00
Martin Storsjo	489993db94	[MinGW] [X86] Add stubs for references to data variables that might end up imported from a dll Variables declared with the dllimport attribute are accessed via a stub variable named __imp_<var>. In MinGW configurations, variables that aren't declared with a dllimport attribute might still end up imported from another DLL with runtime pseudo relocs. For x86_64, this avoids the risk that the target is out of range for a 32 bit PC relative reference, in case the target DLL is loaded further than 4 GB from the reference. It also avoids having to make the text section writable at runtime when doing the runtime fixups, which makes it worthwhile to do for i386 as well. Add stub variables for all dso local data references where a definition of the variable isn't visible within the module, since the DLL data autoimporting might make them imported even though they are marked as dso local within LLVM. Don't do this for variables that actually are defined within the same module, since we then know for sure that it actually is dso local. Don't do this for references to functions, since there's no need for runtime pseudo relocations for autoimporting them; if a function from a different DLL is called without the appropriate dllimport attribute, the call just gets routed via a thunk instead. GCC does something similar since 4.9 (when compiling with -mcmodel=medium or large; from that version, medium is the default code model for x86_64 mingw), but only for x86_64. Differential Revision: https://reviews.llvm.org/D51288 llvm-svn: 340942	2018-08-29 17:28:34 +00:00
Simon Pilgrim	b49d5f3b53	[DAGCombiner] Add X / X -> 1 & X % X -> 0 folds Adds more divrem folds to try and get in sync with InstructionSimplify Differential Revision: https://reviews.llvm.org/D50636 llvm-svn: 340919	2018-08-29 11:30:16 +00:00
Simon Pilgrim	09cc7af85a	[DAGCombiner] Add X / X -> 1 & X % X -> 0 folds (test tweaks) Adjust missed test to avoid the X / X -> 1 & X % X -> 0 folds while keeping their original purposes. Differential Revision: https://reviews.llvm.org/D50636 llvm-svn: 340917	2018-08-29 11:23:59 +00:00
Simon Pilgrim	6d71c4cfe3	[DAGCombiner] Add X / X -> 1 & X % X -> 0 folds (test tweaks) Adjust tests to avoid the X / X -> 1 & X % X -> 0 folds while keeping their original purposes. Differential Revision: https://reviews.llvm.org/D50636 llvm-svn: 340916	2018-08-29 11:18:14 +00:00
Simon Pilgrim	6b9bf7ecbc	[X86][AVX] Prefer VPBLENDW+VPBLENDD to VPBLENDVB for v16i16 blend shuffles Noticed while looking at D49562 codegen - we can avoid a large constant mask load and a slow VPBLENDVB select op by using VPBLENDW+VPBLENDD instead. TODO: As discussed on the patch, we should investigate adding VPBLENDVB handling to target shuffle combining as well, that will allow us to extend this to VPBLENDW+VPBLENDW+VPBLENDD. Differential Revision: https://reviews.llvm.org/D50074 llvm-svn: 340913	2018-08-29 10:51:08 +00:00
Craig Topper	9f42726cc7	[X86] Support v2i32 gather/scatter indices with -x86-experimental-vector-widening-legalization Summary: This is split out from D41062 to cover the code in LegalVectorTypes.cpp Reviewers: RKSimon, spatel, efriedma Reviewed By: efriedma Subscribers: sdardis, jvesely, nhaehnle, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D51337 llvm-svn: 340891	2018-08-29 02:12:49 +00:00
Craig Topper	9401fd0ed2	[X86] Add intrinsics for KADD instructions These are intrinsics for supporting kadd builtins in clang. These builtins are already in gcc to implement intrinsics from icc, though they are missing from the Intel Intrinsics Guide. This instruction adds two mask registers together as if they were scalar rather than a vXi1. We might be able to get away with a bitcast to scalar and a normal add instruction, but that would require DAG combine smarts in the backend to recognize add+bitcast. For now I'd prefer to go with the easiest implementation so we can get these builtins in to clang with good codegen. Differential Revision: https://reviews.llvm.org/D51370 llvm-svn: 340869	2018-08-28 19:22:55 +00:00
Craig Topper	f1c111431b	[X86] Fix copy paste mistake in vector-idiv-v2i32.ll. Add missing test case. Some of the test cases contained the same load twice instead of a different load. llvm-svn: 340833	2018-08-28 15:24:12 +00:00
Simon Pilgrim	af98587095	[X86][SSE] Improve variable scalar shift of vXi8 vectors (PR34694) This patch creates the shift mask and actual shift using the vXi16 vector shift ops. Differential Revision: https://reviews.llvm.org/D51263 llvm-svn: 340813	2018-08-28 10:37:29 +00:00
Simon Pilgrim	f119e27d80	[X86][SSE] Avoid vector extraction/insertion for non-constant uniform shifts As discussed on D51263, we're better off using byte shifts to clear the upper bits on pre-SSE41 hardware. llvm-svn: 340810	2018-08-28 10:14:09 +00:00
Sanjay Patel	fe0b5d215b	[x86] add AVX runs to show more potential scalar->vector mov opportunities; NFC llvm-svn: 340785	2018-08-27 22:29:06 +00:00
Craig Topper	171c6fe6cb	[X86] Reverse the check prefixes in the test added in r340774. The 32-bit and 64-bit checks were reversed. llvm-svn: 340775	2018-08-27 21:34:37 +00:00
Craig Topper	76b18beef1	[X86] Add test cases to show current codegen of v2i32 div/rem in 32-bit and 64-bit modes In particular this shows that we end up using libcalls in 32-bit mode even for division by constant. llvm-svn: 340774	2018-08-27 21:13:07 +00:00
Sanjay Patel	7b6df50669	[x86] add tests for possibly avoiding scalar->vector move; NFC llvm-svn: 340773	2018-08-27 20:21:33 +00:00
Craig Topper	4be11c0585	[X86] When lowering v32i8 MULHS/MULHU, shuffle after the PACKUS rather than before. We're using a 256-bit PACKUS to do the truncation, but that instruction operates on 128-bit lanes. So previously we shuffled first to rearrange the lanes. But that requires 2 shuffles. Instead we can shuffle after the PACKUS using a single VPERMQ. This matches what our normal LowerTRUNCATE code does when it uses PACKUS. Differential Revision: https://reviews.llvm.org/D51284 llvm-svn: 340757	2018-08-27 17:20:41 +00:00
Craig Topper	fff90377fd	[X86] Add support for matching paddus patterns where one of the vectors is a constant. InstCombine mucks these up a bit. So we need to do some additional pattern matching to fix it. There are still a few special cases not handled, but this covers the general case. Differential Revision: https://reviews.llvm.org/D50952 llvm-svn: 340756	2018-08-27 17:20:38 +00:00
Aleksandr Urakov	ff88f1763b	[X86] Adding the test pointing to the fail case of D45653 Summary: This commit adds the case of tail calling a sret function from a non-sret function when both functions have the C calling convention. llvm-svn: 340737	2018-08-27 11:56:32 +00:00
Aleksandr Urakov	6f7fef7865	[NFC][X86] Fix `sibcall.ll` formatting Summary: Remove unnecessary lines from `sibcall.ll` and rename labels according to @RKSimon's recommendations in the D45653 conversation. llvm-svn: 340735	2018-08-27 11:25:38 +00:00
Craig Topper	128915f4ae	[X86] Add FeatureCMOV explicitly to all CPUs that support it. Remove FeatureCMOV implication from Feature64Bit and FeatureSSE1 Summary: Previously most CPUs inherited cmov support through Feature64Bit (or FeatureCMPXCHG16B implying Feature64Bit) or FeatureSSE1. This has the surprising side effect that -mattr=-cmov causes an assert to fire in 64-bit mode because it clears the Feature64Bit. Or in 32-bit mode, -mattr=-cmov disables any sse/avx features, which seems surprising. This patch removes the implication and instead updates hasCMOV in X86Subtarget to check SSE1 or is64Bit in addition to the regular cmov flag. This should keep most things working the way they did before. I don't believe there is a way to specify "-cmov" directly from clang, so this should only affect our lower level tools. This does stop -mattr=cx16 (cmpxchg16b) from implying cmov is enabled via the 64bit flag, as you can see from one of the changed tests. But that was a 32-bit test so I don't know why it enabled cx16 anyway. For the other test I had to add -sse to override the new sse check in hasCMOV. Reviewers: RKSimon, DavidKreitzer, spatel Reviewed By: RKSimon Subscribers: llvm-commits, jfb Differential Revision: https://reviews.llvm.org/D51228 llvm-svn: 340707	2018-08-26 18:29:33 +00:00
Craig Topper	b68a78b9ac	[X86] Add FeatureCMOV to athlon and athlon-tbird cpus. Summary: This matches gcc and one cpuid dump I found online. Given that these are considered 7th generation x86 CPU it seems likely they support cmov since cmov was added by Intel in their 6th generation. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51264 llvm-svn: 340706	2018-08-26 18:29:27 +00:00
Sanjay Patel	113cac3b15	[SelectionDAG][x86] turn insertelement into undef with variable index into splat I noticed this along with the patterns in D51125, but when the index is variable, we don't convert insertelement into a build_vector. For x86, that means these get expanded at legalization time into the loading/spilling code that we see in the tests. I think it's always better to avoid going to memory on these, and we get the optimal 'broadcast' if it's available. I suspect other targets may want to look at enabling the hook. AArch64 and AMDGPU have regression tests that would be affected (although I did not check what would happen in those cases). In the most basic cases shown here, AArch64 would probably do much better with a splat. Differential Revision: https://reviews.llvm.org/D51186 llvm-svn: 340705	2018-08-26 18:20:41 +00:00
Craig Topper	7ef643ef17	[X86] Add test cases for D50952, paddus patterns involving constants. NFC llvm-svn: 340694	2018-08-26 00:22:07 +00:00
Craig Topper	ebec2793d1	[X86] Replace support for vXi32 SMUL_LOHI/UMUL_LOHI with MULHS/MULHU support instead. Summary: The only time vector SMUL_LOHI/UMUL_LOHI nodes are created is during division/remainder lowering. If it's created before op legalization, generic DAGCombine immediately turns that SMUL_LOHI/UMUL_LOHI into a MULHS/MULHU since only the upper half is used. That node will stick around through vector op legalization and will be turned back into UMUL_LOHI/SMUL_LOHI during op legalization. It will then be custom lowered by the X86 backend. Due to this two step lowering the vector shuffles created by the custom lowering get legalized after their inputs rather than before. This prevents the shuffles from being combined with any build_vector of constants. This patch changes vXi32 to use MULHS/MULHU instead. This is what the later DAG combine did anyway. But by skipping the change back to UMUL_LOHI/SMUL_LOHI we lower it before any constant BUILD_VECTORS. This allows the vector_shuffle creation to constant fold with the build_vectors. This accounts for the test changes here. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51254 llvm-svn: 340690	2018-08-25 18:01:24 +00:00
Sanjay Patel	8a84c747d2	[x86] try harder to use broadcast to load a scalar into vector reg This is a preliminary step for a preliminary step for D50992. I noticed that x86 often misses chances to load a scalar directly into a vector register. So this patch is just allowing more of those cases to match a broadcast op in lowerBuildVectorAsBroadcast(). The old code comment said it doesn't make sense to use a broadcast when we're loading a single element and everything else is undef, but I think that's the best case in the improved tests in insert-loaded-scalar.ll. We avoid scalar-to-vector-register move and/or less efficient shuffling. Note that there are some existing types that were already producing a broadcast, but that happens semi-accidentally. I.e., it's not happening as part of lowerBuildVectorAsBroadcast(). The build vector gets expanded into load + shuffle, and then shuffle lowering produces the broadcast. Description of the other test diffs: 1. avx-basic.ll - replacing load+shuffle is a win. 2. sse3-avx-addsub-2.ll - vmovddup vs. vbroadcastss is neutral 3. sse41.ll - don't care - we convert that intrinsic to generic IR now, so this test is deprecated 4. vector-shuffle-128-v8.ll / vector-shuffle-256-v16.ll - pshufb alternatives with an extra instruction are not obviously bad Differential Revision: https://reviews.llvm.org/D51125 llvm-svn: 340685	2018-08-25 14:56:05 +00:00
Simon Pilgrim	eb6a3cbb28	[X86] Make requested test changes from D50636 The tests were relying on X / X -> 1 and X % X -> 0 combines not happening in the DAG. llvm-svn: 340682	2018-08-25 14:16:03 +00:00
Bjorn Pettersson	8483004723	[LiveDebugVariables] Avoid faulty addDefsFromCopies in computeIntervals Summary: When computeIntervals is looking through COPY instruction to extend the location mapping for a debug variable it did not handle subregisters correctly. For example DBG_VALUE debug-use %0.sub_8bit_hi, ... %1:gr16 = COPY %0 was transformed into DBG_VALUE debug-use %0.sub_8bit_hi, ... %1:gr16 = COPY %0 DBG_VALUE debug-use %1, ... So the subregister index was missing in the added DBG_VALUE. As long as the subreg referred to the least significant bits of the superreg, then I guess we could get the correct result in a debugger even when referring to the superreg. But as in the example above when the subreg refers to other parts of the superreg, then debuginfo would be incorrect. I'm not sure exactly how to fix this properly, so this patch just avoids looking through the COPY when there is a subreg involved (for more info, see the FIXME added in the code). Reviewers: rnk, aprantl Reviewed By: aprantl Subscribers: JDevlieghere, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D50788 llvm-svn: 340679	2018-08-25 10:02:03 +00:00
Peter Collingbourne	3f792230cb	CodeGen: Add two more conditions for adding symbols to the address-significance table. Firstly, require the symbol to be used within the module. If a symbol is unused within a module, then by definition it cannot be address-significant within that module. This condition is useful on all platforms because it could make symbol tables smaller -- without this change, emitting an address-significance table could cause otherwise unused undefined symbols to be added to the object file. But this change is necessary with COFF specifically in order to preserve the property that an unreferenced undefined symbol in an IR module does not result in a link failure. This is already the case for ELF because ELF linkers only reject links with unresolved symbols if there is a relocation to that symbol, but COFF linkers require all undefined symbols to be resolved regardless of relocations. So if a module contains an unreferenced undefined symbol, we need to make sure not to add it to the address-significance table (and thus the symbol table) in case it doesn't end up resolved at link time. Secondly, do not add dllimport symbols to the table. These symbols won't be able to be resolved because their definitions live in another module and are accessed via the IAT, and the address-significance table has no effect on other modules anyway. It wouldn't make sense to add the IAT entry symbol to the address-significance table either because the IAT entry isn't address-significant -- the generated code never takes its address. Differential Revision: https://reviews.llvm.org/D51199 llvm-svn: 340648	2018-08-24 20:37:09 +00:00
Stefan Pintilie	892fc6b7f2	[Exception Handling] Unwind tables are required for all functions that have an EH personality. This patch is for defect: https://bugs.llvm.org/show_bug.cgi?id=32611 Functions may require unwind tables even if they are marked with the attribute nounwind. Any function with an EH personality may require an unwind table. Differential Revision: https://reviews.llvm.org/D50987 llvm-svn: 340641	2018-08-24 19:38:29 +00:00
Craig Topper	4058e29e7d	[X86] Teach combineLoopMAddPattern to handle cases where there is no loop and the add has two multiply inputs Differential Revision: https://reviews.llvm.org/D50868 llvm-svn: 340631	2018-08-24 18:05:04 +00:00
Craig Topper	3c78622d64	[X86] Add test case for D50868. NFC llvm-svn: 340630	2018-08-24 18:05:02 +00:00
Stefan Pintilie	7cb44f2470	Revert "[Exception Handling] Unwind tables are required for all functions that have an EH personality." This reverts commit rL340614. Previous commit broke some llvm-cfi-verify tests. llvm-svn: 340625	2018-08-24 17:27:35 +00:00
Stefan Pintilie	36f31617d3	[Exception Handling] Unwind tables are required for all functions that have an EH personality. This patch is for defect: https://bugs.llvm.org/show_bug.cgi?id=32611 Functions may require unwind tables even if they are marked with the attribute nounwind. Any function with an EH personality may require an unwind table. Differential Revision: https://reviews.llvm.org/D50987 llvm-svn: 340614	2018-08-24 15:51:47 +00:00
Sanjay Patel	851e02e52e	[x86] move/add tests for insertelement with variable index; NFC The variable index pattern is different than the constant index cases as shown in D51125. We might want to splat regardless of whether the scalar is loaded from memory or transferred from GPR. llvm-svn: 340565	2018-08-23 18:38:40 +00:00
Chandler Carruth	ae0cafece8	[x86/retpoline] Split the LLVM concept of retpolines into separate subtarget features for indirect calls and indirect branches. This is in preparation for enabling only the call retpolines when using speculative load hardening. I've continued to use subtarget features for now as they continue to seem the best fit given the lack of other retpoline like constructs so far. The LLVM side is pretty simple. I'd like to eventually get rid of the old feature, but not sure what backwards compatibility issues that will cause. This does remove the "implies" from requesting an external thunk. This always seemed somewhat questionable and is now clearly not desirable -- you specify a thunk the same way no matter which set of things are getting retpolines. I really want to keep this nicely isolated from end users and just an LLVM implementation detail, so I've moved the `-mretpoline` flag in Clang to no longer rely on a specific subtarget feature by that name and instead to be directly handled. In some ways this is simpler, but in order to preserve existing behavior I've had to add some fallback code so that users who relied on merely passing -mretpoline-external-thunk continue to get the same behavior. We should eventually remove this I suspect (we have never tested that it works!) but I've not done that in this patch. Differential Revision: https://reviews.llvm.org/D51150 llvm-svn: 340515	2018-08-23 06:06:38 +00:00
Craig Topper	cf9df99d79	[X86] Teach combineLoopSADPattern to handle cases where there is no loop and the add has two absolute difference inputs Previously we assumed a vector reduction add is part of a loop and one of the inputs is a phi. But the code in SelectionDAGBuilder that sets the vector reduction flag handles more cases than that. It just requires that the use chain ends in a horizontal reduction and that there are no other uses. This means it can handle unrolled reduction loops. If the initial value of the reduction was 0, an unrolled loop would begin with a vector reduction add that has two sad inputs. Previously we would only transform one side of the add, but for this case we need to transform both sides. I've created a lambda to reuse some of the code for both sides. And fixed the variable names to remove reference to "phi". Differential Revision: https://reviews.llvm.org/D50817 llvm-svn: 340478	2018-08-22 23:19:01 +00:00
Craig Topper	903ef6a03f	[X86] Add test cases for D50817. NFC llvm-svn: 340477	2018-08-22 23:18:58 +00:00
Sanjay Patel	ed1b9695ee	[SelectionDAG] unroll unsupported vector FP ops earlier to avoid libcalls on undef elements (PR38527) This solves the motivating case from: https://bugs.llvm.org/show_bug.cgi?id=38527 If we are legalizing an FP vector op that maps to 1 of the LLVM intrinsics that mimic libm calls, but we're going to end up with scalar libcalls for that vector type anyway, then we should unroll the vector op into scalars before widening. This avoids libcalls because we've lost the knowledge that some of the scalar elements are undef. Differential Revision: https://reviews.llvm.org/D50791 llvm-svn: 340469	2018-08-22 22:52:05 +00:00
Craig Topper	538f8ab438	[X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware Inspired by what AArch64 does for shifts, this patch attempts to replace shift amounts with neg if we can. This is done directly as part of isel so it's as late as possible to avoid breaking some BZHI patterns since those patterns need an unmasked (32-n) to be correct. To avoid manual load folding and custom instruction selection for the negate, I've inserted new nodes in the DAG above the shift node in topological order. Differential Revision: https://reviews.llvm.org/D48789 llvm-svn: 340441	2018-08-22 19:39:09 +00:00
Sanjay Patel	b5686c4e4e	[x86] add tests for load scalar + insertelement; NFC llvm-svn: 340425	2018-08-22 17:46:28 +00:00
Simon Pilgrim	ffdfe45645	[X86][SSE] LowerMULH vXi8 - use SSE shifts directly. We know these vXi16 extended cases are legal constant splat shifts. llvm-svn: 340414	2018-08-22 15:37:11 +00:00
Simon Pilgrim	b89a4f85bf	[X86][SSE] Add sdiv test case from PR38658 llvm-svn: 340393	2018-08-22 09:47:12 +00:00
Bjorn Pettersson	e06321382b	[RegisterCoalescer] Use substPhysReg in reMaterializeTrivialDef Summary: When RegisterCoalescer::reMaterializeTrivialDef is substituting a register use in a DBG_VALUE instruction, and the old register is a subreg, and the new register is a physical register, then we need to use substPhysReg in order to extract the correct subreg. Reviewers: wmi, aprantl Reviewed By: wmi Subscribers: hiraditya, MatzeB, qcolombet, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D50844 llvm-svn: 340326	2018-08-21 19:47:32 +00:00
Simon Pilgrim	9848e0c9ac	[X86][SSE] Add non-uniform udiv test that is mostly divide by 1. The test demonstrates over-complicated codegen for a udiv that only has one divisor that doesn't equal 1. This should have allowed the codegen to be a lot simpler (uniform shifts etc.) but only the SSE2 codegen manages to make use of this. llvm-svn: 340313	2018-08-21 18:02:28 +00:00
Craig Topper	b172b8884a	[BypassSlowDivision] Teach bypass slow division not to interfere with div by constant where constants have been constant hoisted, but not moved from their basic block DAGCombiner doesn't pay attention to whether constants are opaque before doing the div by constant optimization. So BypassSlowDivision shouldn't introduce control flow that would make DAGCombiner unable to see an opaque constant. This can occur when a div and rem of the same constant are used in the same basic block. It will be hoisted, but not leave the block. Longer term we probably need to look into the X86 immediate cost model used by constant hoisting and maybe not mark div/rem immediates for hoisting at all. This fixes the case from PR38649. Differential Revision: https://reviews.llvm.org/D51000 llvm-svn: 340303	2018-08-21 17:15:33 +00:00
Simon Pilgrim	43cf2c20ab	[X86] Add SSE2 and XOP udiv combine tests llvm-svn: 340282	2018-08-21 15:21:45 +00:00
Simon Pilgrim	8e15b43092	[X86] Add SSE2 sdiv combine tests llvm-svn: 340264	2018-08-21 10:44:06 +00:00
Sam Parker	597811e7a7	[DAGCombiner] Reduce load widths of shifted masks During combining, ReduceLoadWidth is used to combine AND nodes that mask loads into narrow loads. This patch allows the mask to be a shifted constant. This results in a narrow load which is then left shifted to compensate for the new offset. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 340261	2018-08-21 10:26:59 +00:00
Simon Pilgrim	72b324de4d	[TargetLowering] Add BuildSDiv support for division by one or negone. This reduces most of the sdiv stages (the MULHS, shifts etc.) to just zero/identity values and use the numerator scale factor to multiply by +1/-1. llvm-svn: 340260	2018-08-21 10:20:36 +00:00
Bjorn Pettersson	880f291577	[RegisterCoalescer] Do not assert when trying to remat dead values Summary: RegisterCoalescer::reMaterializeTrivialDef used to assert that the input register was live in. But as shown by the new coalesce-dead-lanes.mir test case that seems to be a valid scenario. We now return false instead of the assert, simply avoiding to remat the dead def. Normally a COPY of an undef value is eliminated by eliminateUndefCopy(). Although we only do that when the destination isn't a physical register. So the situation above should be limited to the case when we copy an undef value to a physical register. Reviewers: kparzysz, wmi, tpr Reviewed By: kparzysz Subscribers: MatzeB, qcolombet, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D50842 llvm-svn: 340255	2018-08-21 07:49:05 +00:00
Craig Topper	9c57ba0dc3	[X86] Add test command line to expose PR38649. Bypass slow division and constant hoisting are conspiring to break div+rem of large constants. llvm-svn: 340217	2018-08-20 21:51:35 +00:00
Craig Topper	210ccfe3db	[X86] Prevent lowerVectorShuffleByMerging128BitLanes from creating cycles Due to some splat handling code in getVectorShuffle, it's possible for NewV1/NewV2 to have their mask modified from what is requested. This can lead to cycles being created in the DAG. This patch examines the returned mask and makes sure it's different. Long term we may need to look closer at that splat code in getVectorShuffle, or add more splat awareness to getVectorShuffle. Fixes PR38639 Differential Revision: https://reviews.llvm.org/D50981 llvm-svn: 340214	2018-08-20 21:08:35 +00:00
Craig Topper	7dcb2c4b0a	[X86] Teach combineTruncatedArithmetic to handle some cases of ISD::SUB We can safely avoid interfering with the subus combine if both inputs are freely truncatable. Either both extends, or an extend and a constant vector. Differential Revision: https://reviews.llvm.org/D50878 llvm-svn: 340212	2018-08-20 20:57:35 +00:00
Craig Topper	08e7e04998	[X86] Pre-commit test cases for D50878. llvm-svn: 340211	2018-08-20 20:57:32 +00:00
Cameron McInally	94b9029be9	[FPEnv] Support constrained FREM intrinsic Differential Revision: https://reviews.llvm.org/D50975 llvm-svn: 340201	2018-08-20 19:28:56 +00:00
Simon Pilgrim	6ac905926f	[TargetLowering] Disable BuildSDiv division by one or negone. Fuzz tests have detected an issue, currently working on a fix. llvm-svn: 340195	2018-08-20 18:23:54 +00:00
Simon Pilgrim	5b78c9d58d	[SelectionDAG] Add partial sign-bit support to ComputeNumSignBits for BITCAST nodes Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts. Handle the case where the sign bit extends to only part of the small elements. llvm-svn: 340169	2018-08-20 13:05:48 +00:00
Simon Pilgrim	11bec5b80c	[X86][SSE] Fix PACKSS bitcast test from rL340166 We need the sign bits to extend to the lower 16 bits of the even elements llvm-svn: 340167	2018-08-20 11:47:15 +00:00
Simon Pilgrim	cee9c64838	[X86][SSE] Add PACKSS test showing ComputeNumSignBits failure to handle a partial sign bits extension through a bitcast llvm-svn: 340166	2018-08-20 11:10:12 +00:00
Simon Pilgrim	686090a45f	[X86] Drop unnecessary exact qualifier from packss test llvm-svn: 340165	2018-08-20 11:01:51 +00:00
Simon Pilgrim	5b936ec89e	[SelectionDAG] Add basic demanded elements support to ComputeNumSignBits for BITCAST nodes Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts. The next step would be to support cases where the large elements aren't all sign bits, and determine the small element equivalent based on the demanded elements. llvm-svn: 340143	2018-08-19 17:47:50 +00:00
Simon Pilgrim	0fd72ab44f	[X86][SSE] Add PACKSS test showing ComputeNumSignBits failure to handle demanded elts through a bitcast llvm-svn: 340139	2018-08-19 16:01:47 +00:00
Craig Topper	803912ea57	[X86] Fix an issue in the matching for ADDUS. We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturating add. This rewrites the canonicalization to just be based on the condition code. llvm-svn: 340134	2018-08-19 04:26:31 +00:00
Craig Topper	a85d7e927b	[X86] Add a test case showing an issue in our addusw pattern matching. We are unable to handle a normal add followed by a saturating add with certain operand orders on the icmp. llvm-svn: 340133	2018-08-19 04:26:29 +00:00
Craig Topper	40c9559b74	[X86] Add support for using 512-bit PSUBUS to combineSelect. The code already supports 128 and 256 and even knows to split 256 for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements. While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular. llvm-svn: 340128	2018-08-18 18:51:03 +00:00
Craig Topper	b40a1d5f84	[X86] Add test cases to show missed opportunities to use 512-bit PSUBUS. llvm-svn: 340127	2018-08-18 18:50:59 +00:00
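
Note on 597811e7a7 (shifted-mask load narrowing): the identity behind that combine can be illustrated outside LLVM with a small, hedged sketch. The helper names below (load_le32, wide_masked, narrow_shifted) are hypothetical and not part of LLVM; the sketch assumes a little-endian byte layout, which is modelled explicitly so the code behaves the same on any host. It shows only the arithmetic equivalence, not how DAGCombiner implements it.

```cpp
#include <cstdint>

// Hypothetical helper: read 4 bytes as a little-endian 32-bit value.
// Built from byte reads so the result does not depend on host endianness.
inline std::uint32_t load_le32(const unsigned char *p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Wide form: a 32-bit load followed by an AND with a shifted constant mask.
inline std::uint32_t wide_masked(const unsigned char *p) {
    return load_le32(p) & 0x0000FF00u;
}

// Narrow form: load only the byte the mask selects (offset 1),
// zero-extend it, then shift left to restore its original position.
inline std::uint32_t narrow_shifted(const unsigned char *p) {
    return static_cast<std::uint32_t>(p[1]) << 8;
}
```

Both functions return the same value for any 4-byte input, which is why a masked wide load can be replaced by a narrow load plus a left shift.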

12775 Commits