Previously we could only negate the FMADD opcodes. That was mostly fine when we handled the FMA intrinsics during lowering. But with the move to llvm.fma from target-specific intrinsics, we can combine (fneg (fma)) to (fmsub) earlier. So if we start with (fneg (fma (fneg))), we would get stuck at (fmsub (fneg)).
This patch fixes that so we can also combine things like (fmsub (fneg)).
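As a rough illustration (my own sketch, not from the patch; it assumes the fmaf call lowers to llvm.fma and that FMA instructions are available), this is the kind of expression that should now fold cleanly:

  #include <math.h>

  /* -fma(-a, b, c) == a*b - c, so this should become a single fused
     multiply-subtract instead of an FMA plus a separate sign flip. */
  float fold_example(float a, float b, float c) {
    return -fmaf(-a, b, c);
  }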
llvm-svn: 336304
There's a regression in here due to the inability to combine fneg inputs of X86ISD::FMSUB/FNMSUB/FNMADD nodes.
More removals to come, but I wanted to stop and fix the regression that showed up in this first.
llvm-svn: 336303
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially, as in the general shift case.
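Conceptually (a scalar sketch of my own, not the lowering code), the blend-of-splat-shifts idea is:

  /* When a v8i16 shift uses only two distinct amounts a0 and a1, shift the
     whole vector by each amount and blend the results per lane, instead of
     shifting the lanes one at a time. */
  void shl_v8i16_two_amounts(unsigned short v[8], const unsigned char amt[8],
                             unsigned char a0, unsigned char a1) {
    unsigned short r0[8], r1[8];
    for (int i = 0; i < 8; ++i) r0[i] = (unsigned short)(v[i] << a0); /* splat shift by a0 */
    for (int i = 0; i < 8; ++i) r1[i] = (unsigned short)(v[i] << a1); /* splat shift by a1 */
    for (int i = 0; i < 8; ++i) v[i] = (amt[i] == a0) ? r0[i] : r1[i]; /* blend */
  }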
Reapplied with a fixed (extra null tests) version of rL336113 after reversion in rL336189 - extra test case added at rL336247.
llvm-svn: 336250
The following code pattern:
mov %rax, %rcx
test %rax, %rax
%rax = ....
je throw_npe
mov (%rcx), %r9
mov (%rax), %r10
gets transformed into the following incorrect code after the implicit null check pass:
mov %rax, %rcx
%rax = ....
faulting_load_op("movl (%rax), %r10", throw_npe)
mov (%rcx), %r9
In the implicit null check pass, if the register that is checked for a null value (i.e., the register used in the 'test' instruction) is written to before the conditional jump, we should avoid doing the optimization.
Patch by Surya Kumari Jangala!
Differential Revision: https://reviews.llvm.org/D48627
Reviewed By: skatkov
llvm-svn: 336241
D48768 may turn some of these into shifts.
Reviewers: spatel
Reviewed By: spatel
Subscribers: spatel, RKSimon, llvm-commits, craig.topper
Differential Revision: https://reviews.llvm.org/D48767
llvm-svn: 336224
This might make the error message added in r335668 unneeded, but I'm not sure yet.
The check for RIP is technically unnecessary since RIP is in GR64, but that fact is kind of surprising so be explicit.
llvm-svn: 336217
Add registers still missing after r328016 (D43353):
- for bits 15-8 of SI, DI, BP, SP (*H), and R8-R15 (*BH),
- for bits 31-16 of R8-R15 (*WH).
Thanks to Craig Topper for pointing it out.
llvm-svn: 336134
Similarly, don't fold fp128 loads into SSE instructions if the load isn't aligned, unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.
Should fix PR38001
llvm-svn: 336121
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially, as in the general shift case.
llvm-svn: 336113
We have special-case support for two shift values for basic blends, but irregular shift patterns end up using the generic lowering, despite shuffle lowering being good enough to handle more complex blends.
llvm-svn: 336112
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1; -1 is the only power-of-two value for which the sdiv expansion recipe breaks.
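For reference, a rough scalar model of the shift-based expansion involved (my own sketch, assuming this is essentially the recipe; it is not the DAG code):

  /* x / 2^lg2, rounded toward zero, valid for lg2 >= 1. A divisor element of
     -1 would mean lg2 == 0, making the "32 - lg2" shift below a full-width
     (out-of-range) shift, so that element cannot go through this recipe and
     has to be handled separately. */
  int sdiv_by_pow2(int x, int lg2) {
    unsigned bias = (unsigned)(x >> 31) >> (32 - lg2); /* 2^lg2 - 1 when x < 0 */
    return (x + (int)bias) >> lg2;
  }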
Thanks to @zvi for the original patch.
Differential Revision: https://reviews.llvm.org/D45806
llvm-svn: 336048
Especially of note were the test_mm_mask_set1_epi64 and other set1 tests, which were truncating the element to be broadcast to i8 and broadcasting that instead of the whole 64-bit value.
Some of the others just correct mask sizes on parameters, due to bugs in the clang test cases they were generated from that have since been fixed.
Some were converting i8 to <4 x i1>/<2 x i1> by truncating to i4/i2 and then bitcasting. But the clang codegen bitcasts to <8 x i1> and then extracts to <4 x i1>/<2 x i1>. This is likely to incur less trouble from the integer type legalizer in the backend.
llvm-svn: 336045
The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target-independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And it's just better to limit the number of places we emit target-specific nodes.
The changed test cases still aren't optimal.
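For illustration (my own sketch, not from the patch), the kind of pattern involved is a wide shift on a 32-bit target, where the double shift maps to SHLD and the oversized-amount case is what the separate compare and conditional move handle:

  /* Shift a 64-bit value (hi:lo) left by amt (0..63). */
  void shl64(unsigned *hi, unsigned *lo, unsigned amt) {
    unsigned n = amt & 31;
    unsigned h = (*hi << n) | (n ? (*lo >> (32 - n)) : 0); /* SHLD-style double shift */
    unsigned l = *lo << n;
    if (amt & 32) {      /* amount >= 32: the high half takes the shifted low half */
      *hi = l; *lo = 0;
    } else {
      *hi = h; *lo = l;
    }
  }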
Differential Revision: https://reviews.llvm.org/D48619
llvm-svn: 335998
This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op.
Unfortunately I haven't found a decent way to share much of this code with the shift equivalent.
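As a scalar model of the select-by-amount-bit idea (illustrative only; in the vector lowering each select is done with PBLENDVB across the whole vector):

  unsigned char rotl8(unsigned char x, unsigned char amt) {
    if (amt & 4) x = (unsigned char)((x << 4) | (x >> 4)); /* 4-bit partial rotation */
    if (amt & 2) x = (unsigned char)((x << 2) | (x >> 6)); /* 2-bit partial rotation */
    if (amt & 1) x = (unsigned char)((x << 1) | (x >> 7)); /* 1-bit partial rotation */
    return x;
  }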
Differential Revision: https://reviews.llvm.org/D48655
llvm-svn: 335957
This fixes a regression since SVN r334523, where the object files
built targeting MinGW were rejected by GNU binutils tools. Prior to
that commit, we only put constants in comdat for MSVC configurations.
Differential Revision: https://reviews.llvm.org/D48567
llvm-svn: 335918
This is a follow-up to r335753. At the time I forgot about isProfitableToFold, which makes this pretty easy.
Differential Revision: https://reviews.llvm.org/D48706
llvm-svn: 335895
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.
LLVM ERROR: unsupported relocation with subtraction expression, symbol
'__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
expression
llvm-svn: 335894
We could get away with it for constant folded cases, but not for rL335719.
Thanks to Krzysztof Parzyszek for noticing.
Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.
llvm-svn: 335886
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.
Enable the feature on Mach-O to save code size; comments suggest it is
not possible to enable it for the other users of TrapUnreachable.
rdar://41530228
Differential Revision: https://reviews.llvm.org/D48674
llvm-svn: 335877
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc
can produce -0.0 where the original code does not:
  #include <stdio.h>
  int main(int argc) {
    float x;
    x = -0.8 * argc;
    printf("%f\n", (float)((int)x));
    return 0;
  }
$ clang -O0 -mavx fp.c ; ./a.out
0.000000
$ clang -O1 -mavx fp.c ; ./a.out
-0.000000
Ideally, we'd use IR/node flags to predicate the transform, but the IR parser
doesn't currently allow fast-math-flags on the cast instructions. So for now,
just use the function attribute that corresponds to clang's "-fno-signed-zeros"
option.
Differential Revision: https://reviews.llvm.org/D48085
llvm-svn: 335761
If we are just modifying a single bit at a variable bit position, we can use the BT* instructions to make the change instead of shifting a 1 (or rotating a -1) and doing a binop. These instructions also ignore the upper bits of their index input, so we can also remove an 'and' if one is present on the index.
Fixes PR37938.
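For illustration, these are the kinds of source patterns meant (my examples, not from the patch); the 'n & 31' mask is redundant for BTS/BTR/BTC since they only use the low bits of the index, so it can be dropped when they are used:

  unsigned set_bit(unsigned x, unsigned n)   { return x |  (1u << (n & 31)); } /* -> bts */
  unsigned clear_bit(unsigned x, unsigned n) { return x & ~(1u << (n & 31)); } /* -> btr */
  unsigned flip_bit(unsigned x, unsigned n)  { return x ^  (1u << (n & 31)); } /* -> btc */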
llvm-svn: 335754
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic, instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking built in. When we reach that goal, we should have no intrinsics named "avx512.mask".
llvm-svn: 335744
Right now, when we use RIP-relative instructions in 32-bit mode, we'll just
assert and crash.
This adds an error message which tells the user that they can't do that in
32-bit mode, so that we don't crash (and also can see the issue outside of
assert builds).
llvm-svn: 335658
Summary:
If a routine with no stack frame makes a sibling call, we need to
preserve the stack space check even if the local stack frame is empty,
since the call target could be a "no-split" function (in which case
the linker needs to be able to fix up the prolog sequence in order to
switch to a larger stack).
This fixes PR37807.
Reviewers: cherry, javed.absar
Subscribers: srhines, llvm-commits
Differential Revision: https://reviews.llvm.org/D48444
llvm-svn: 335604
Summary:
Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining.
This seems like the safest way to do this trick. We can consider doing it as a DAG combine later.
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48557
llvm-svn: 335575
This recommits r335562 and 335563 as a single commit.
The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the signature of the builtin that software expects.
By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away.
llvm-svn: 335568