llvm-project

Commit Graph

Author	SHA1	Message	Date
Krzysztof Parzyszek	3d671248ab	[SelectionDAG] Provide default expansion for rotates Implement default legalization of rotates: either in terms of the rotation in the opposite direction (if legal), or in terms of shifts and ors. Implement generating of rotate instructions for Hexagon. Hexagon only supports rotates by an immediate value, so implement custom lowering of ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion. Differential Revision: https://reviews.llvm.org/D47725 llvm-svn: 334497	2018-06-12 12:49:36 +00:00
Aleksandar Beserminji	8acdc10220	[mips] Extend LONG_BRANCH_LUi/ADDiu with extra parameter Extend LONG_BRANCH_LUi and LONG_BRANCH_ADDiu pseudo instructions with additional flag, so instead of always lowering to lui %hi(...), addiu %lo(...) or addiu %hi(...), now they can lower to either %lo, %hi, %higher or %highest depending on the added flag. Differential Revision: https://reviews.llvm.org/D47941 llvm-svn: 334490	2018-06-12 10:23:49 +00:00
Michael Berg	95f3a430a8	NFC, some additional tests added and some renaming for planned fma support changes llvm-svn: 334461	2018-06-12 00:52:43 +00:00
Craig Topper	957b738432	[X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc. We were missing packed isel folding patterns for all of sse41, avx, and avx512. For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns. Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch. Some of this was spotted in the review for D47993. This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns. llvm-svn: 334460	2018-06-12 00:48:57 +00:00
Mark Searles	987f292c56	[AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"' The use iterator, used within findMaskOperands(), can return anything which is not a def. isUse() requires a register, so check isReg() before calling isUse(). Differential Revision: https://reviews.llvm.org/D48047 llvm-svn: 334459	2018-06-12 00:41:26 +00:00
Matt Arsenault	5615fa0a87	DAG: Fix extract_subvector combine for a single element This would fail before because 1x vectors aren't legal, so instead just use the scalar type. Avoids regressions in a future AMDGPU commit to add v4i16/v4f16 as legal types. Test update is just the one test that this triggers on in tree now. It wasn't checking anything before. The result is completely changed since the selects are eliminated. Not sure if it's considered better or not. llvm-svn: 334440	2018-06-11 21:27:41 +00:00
Krzysztof Parzyszek	dd9415d550	[Hexagon] Late predicate producers cannot be used as dot-new sources llvm-svn: 334426	2018-06-11 18:45:52 +00:00
Stanislav Mekhanoshin	7ba3fc730c	[AMDGPU] Do not consider indirect acces through phi for wave limiter Rational: if there is indirect access that is usually an issue because load is not ready by the use. However, if use is inside a loop and load is outside that is potentially an issue for a first iteration only. Differential Revision: https://reviews.llvm.org/D47740 llvm-svn: 334420	2018-06-11 16:50:49 +00:00
Aleksandar Beserminji	62cf9d21ab	[mips] Fix spill slot for mips3, n64 abi When program is compiled for mips3 with n64 abi, wrong register class is used for creating an emergency spill slot. This patch fixes the correct register class to be chosen. This patch resolves PR35859. Thanks to John Baldwin for reporting the issue! Differential Revision: https://reviews.llvm.org/D47938 llvm-svn: 334419	2018-06-11 16:50:28 +00:00
Dylan McKay	d011869c82	[AVR] Set trackLivenessAfterRegAlloc This sets trackLivenessAfterRegAlloc on AVRRegisterInfo. Most existing targets set this flag. Without it, specific IR inputs cause LLVM to fail with: Assertion failed: (getParent()->getProperties().hasProperty( MachineFunctionProperties::Property::TracksLiveness) && "Liveness information is accurate"), function livein_begin file MachineBasicBlock.cpp, line 1354. With this commit, this no longer happens. Patch by Peter Nimmervoll. llvm-svn: 334409	2018-06-11 14:46:48 +00:00
Clement Courbet	7db69cc08a	[X86] Fix skylake server scheduling info. Summary: This fixes most of the scheduling info for SKX vector operations. I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM. The before/after llvm-exegesis analysis are in the phabricator diff. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47721 llvm-svn: 334407	2018-06-11 14:37:53 +00:00
Sanjay Patel	a1791be455	[x86] add scalar cvtt intrinsic tests; NFC More coverage for the problem noted in D47993 (although these shouldn't be affected by that patch). llvm-svn: 334404	2018-06-11 13:51:34 +00:00
Roman Lebedev	b896c4e860	[NFC][AMDGPU] Add tests for all the various IR patterns equivalent to extracting low bits. Summary: The idiom recognition seems rather poor. Only the `@bzhi32_d0` produces `v_bfe_u32`. But they all should. This needs to be fixed before D47980 can be re-landed. Reviewers: mareko, bogner, rampitec, arsenm, tstellar, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48005 llvm-svn: 334398	2018-06-11 10:21:10 +00:00
Craig Topper	c12ac0c984	[X86] Add test files for upgrade of vbmi2 expand load and compress store intrinsics that was done in r334381. llvm-svn: 334386	2018-06-11 06:20:24 +00:00
Craig Topper	0e25c8239a	[X86] Remove masking from dbpsadbw intrinsics, use select in IR instead. llvm-svn: 334384	2018-06-11 06:18:22 +00:00
Daniel Cederman	33f67a256b	[Sparc] Add support for 13-bit PIC Summary: When compiling with -fpic, in contrast to -fPIC, use only the immediate field to index into the GOT. This saves space if the GOT is known to be small. The linker will warn if the GOT is too large for this method. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: brad, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D47136 llvm-svn: 334383	2018-06-11 05:50:08 +00:00
Craig Topper	e71ad1f6d0	[X86] Remove and autoupgrade the expandload and compressstore intrinsics. We use the target independent intrinsics now. llvm-svn: 334381	2018-06-11 01:25:22 +00:00
Sanjay Patel	3e5c70cc1d	[DAGCombiner] match vector compare and select sizes with extload operand (PR37427) This patch started off much more general and ambitious, but it's been a nightmare seeing all the ways x86 vector codegen can go wrong. So the code is still structured to allow extending easily, but it's currently limited in several ways: 1. Only handle cases with an extending load. 2. Only handle cases with a zero constant compare. 3. Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected. The motivating case from PR37427: https://bugs.llvm.org/show_bug.cgi?id=37427 ...is the 1st test, and that shows the expected win - we eliminated the unnecessary intermediate cast. There's a clear regression in the last test (sgt_zero_fp_select) because we longer recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present in sgt_zero, so I'll try to fix that in a follow-up. We need to match a sign-bit setcc from a sign-extended operand and remove it. Differential Revision: https://reviews.llvm.org/D47330 llvm-svn: 334378	2018-06-10 23:09:50 +00:00
Craig Topper	304bd747af	[X86] Add expandload and compresstore fast-isel tests for avx512f and avx512vl. Update existing tests for avx512vbmi2 to use target independent intrinsics. llvm-svn: 334368	2018-06-10 18:55:37 +00:00
Sanjay Patel	15bee8f1c0	[x86] add tests for potentially miscompiling cvttp2si (PR37751); NFC llvm-svn: 334367	2018-06-10 17:42:12 +00:00
Ivan A. Kosarev	847daa11f8	[NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47447 llvm-svn: 334361	2018-06-10 09:27:27 +00:00
Craig Topper	301d526329	[X86] Fix forward declaration in a test case that was messed up in r334358 llvm-svn: 334360	2018-06-10 06:43:48 +00:00
Craig Topper	98a79934af	[X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead. llvm-svn: 334358	2018-06-10 06:01:36 +00:00
Eli Friedman	864df22307	[ARM] Allow CMPZ transforms even if the input has multiple uses. It looks like this got left in by accident in r289794; I can't think of any reason this check would be necessary. (Maybe it was meant to be a check that the AND has one use? But we check that a few lines earlier.) Differential Revision: https://reviews.llvm.org/D47921 llvm-svn: 334322	2018-06-08 21:16:56 +00:00
Simon Pilgrim	5c32989c91	[X86][SSE] Support v8i16/v16i16 rotations Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together. Differential Revision: https://reviews.llvm.org/D47822 llvm-svn: 334309	2018-06-08 17:58:42 +00:00
Sanjay Patel	70314bd61c	[x86] add tests for node-level FMF; NFC These cases should be optimized using the change from D47911. llvm-svn: 334308	2018-06-08 17:54:28 +00:00
Sanjay Patel	9995a00a94	[x86] regenerate test checks; NFC llvm-svn: 334307	2018-06-08 17:42:35 +00:00
Michael Berg	bf90d1f263	Utilize new SDNode flag functionality to expand current support for fsub Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub. Reviewers: spatel, hfinkel, wristow, arsenm Reviewed By: spatel Subscribers: wdng Differential Revision: https://reviews.llvm.org/D47910 llvm-svn: 334306	2018-06-08 17:39:50 +00:00
Simon Pilgrim	89deac6694	[X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits). llvm-svn: 334303	2018-06-08 17:00:45 +00:00
Simon Pilgrim	59e915c691	[X86] Fix schedule-x86_64.s tests to use different registers in reg-reg cases Same fix as rL334110: I noticed while working on zero-idiom + dependency-breaking support (PR36671) that most of our binary instruction schedule tests were reusing the same src registers, which would cause the tests to fail once we enable scalar zero-idiom support on btver2. llvm-svn: 334302	2018-06-08 16:40:15 +00:00
Daniil Fukalov	c9a098b314	[AMDGPU] Inline asm - added i16, half and i128 types support AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable Differential Revision: https://reviews.llvm.org/D44920 llvm-svn: 334301	2018-06-08 16:29:04 +00:00
Simon Pilgrim	eab9d20424	[X86][SSE] Add SSE2/AVX2 vector rotate tests Now that we're custom lowering vector rotates for SSE in general we should be testing the combines with them as well. llvm-svn: 334290	2018-06-08 14:07:21 +00:00
Simon Pilgrim	a6afa310c9	[X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplication Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code. This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting). I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking. llvm-svn: 334289	2018-06-08 13:59:11 +00:00
Sanjay Patel	ab4ca0603c	[x86] restore test comment; NFC The description got deleted along with the FIXME note in rL334268. llvm-svn: 334288	2018-06-08 13:53:13 +00:00
Simon Pilgrim	ad45efc445	[X86][SSE] Consistently prefer lowering to PACKUS over PACKSS We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS. PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines. The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets. llvm-svn: 334276	2018-06-08 10:29:00 +00:00
Matt Arsenault	6fc3759811	AMDGPU: Error on LDS global address in functions These won't work as expected now, so error on them to avoid wasting time debugging this in the future. llvm-svn: 334269	2018-06-08 08:05:54 +00:00
Sam Parker	16f963ba0d	[DAGCombine] Fix for PR37667 While trying to propagate AND masks back to loads, we currently allow one non-load node to be included as a leaf in chain. This fix now limits that node to produce only a single data value. Differential Revision: https://reviews.llvm.org/D47878 llvm-svn: 334268	2018-06-08 07:49:04 +00:00
Michael Berg	77b5be7ec6	propagate fast math flags via IR on fma and sub expressions Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode. Reviewers: spatel, arsenm, hfinkel, javed.absar Reviewed By: spatel Subscribers: nemanjai, wdng Differential Revision: https://reviews.llvm.org/D47388 llvm-svn: 334242	2018-06-07 22:49:09 +00:00
Tony Tye	a5a7c331e7	[AMDGPU] Simplify memory legalizer - Make code easier to maintain. - Avoid generating waitcnts for VMEM if the address sppace does not involve VMEM. - Add support to generate waitcnts for LDS and GDS memory. Differential Revision: https://reviews.llvm.org/D47504 llvm-svn: 334241	2018-06-07 22:28:32 +00:00
Sanjay Patel	898fbd7c47	[x86] add tests for backwards propagate mask bug (PR37060, PR37667); NFC llvm-svn: 334199	2018-06-07 14:11:18 +00:00
Hiroshi Inoue	01ef4c2c64	[PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector BitPermutationSelector sets Repl32 flag for bit groups which can be (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies lower 32 bits into upper 32 bits on 64-bit PowerPC before rotation. However, enforcing 32-bit instruction sometimes results in redundant generated code. For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into only rldimi instruction if Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF). uint64_t func(uint64_t a, uint64_t b) { return (a & 0xFFFFFFFF) \| (b << 32) ; } To avoid such problem, this patch checks the potential benefit of Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from Repl32 flag on this group. Differential Revision: https://reviews.llvm.org/D47867 llvm-svn: 334195	2018-06-07 13:21:14 +00:00
Simon Pilgrim	09953d8412	[X86][SSE] Simplify combineVectorTruncationWithPACKSS to reduce code duplication Simplify combineVectorTruncationWithPACKSS to just a SIGN_EXTEND_INREG followed by using the existing truncateVectorWithPACK instead of duplicating code. llvm-svn: 334193	2018-06-07 13:01:42 +00:00
Matt Arsenault	f1c868ef08	AMDGPU: Fix not including v2f64 in SReg_128 Fixes assertion with calls returning v2f64. llvm-svn: 334189	2018-06-07 12:16:31 +00:00
Simon Pilgrim	0e29d8d81f	[X86][SSE] Add extra trunc(shl) test cases The existing trunc_shl_17_v8i16_v8i32 test case should (but doesn't) fold to zero, I've added 2 new test cases: - trunc_shl_16_v8i16_v8i32 which folds to zero (this is actually testing the target faux shuffle combine) - trunc_shl_15_v8i16_v8i32 which should perform the full shl + truncate llvm-svn: 334188	2018-06-07 11:22:52 +00:00
Matt Arsenault	697300bd4f	AMDGPU: Use scalar operations for f16 fabs/fneg patterns Fixes unnecessary differences between subtargets. llvm-svn: 334184	2018-06-07 10:15:20 +00:00
Simon Pilgrim	cc92897be9	[X86] Regenerate rotate tests Add 32-bit tests to show missed SHLD/SHRD cases llvm-svn: 334183	2018-06-07 10:13:09 +00:00
Matt Arsenault	90083d3088	AMDGPU: Try a lot harder to emit scalar loads This has two main components. First, widen widen short constant loads in DAG when they have the correct alignment. This is already done a bit in AMDGPUCodeGenPrepare, since that has access to DivergenceAnalysis. This can't help kernarg loads created in the DAG. Start to use DAG divergence analysis to help this case. The second part is to avoid kernel argument lowering breaking the alignment of short vector elements because calling convention lowering wants to split everything into legal register types. When loading a split type, load the nearest 4-byte aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out. There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG. Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does. It only looks for fairly specific combines between pairs of loads which no longer appear. In particular this causes v4i16 loads to be split into 2 components when previously the two halves were merged. Worse, because of the newly introduced shifts, there is a lot more unnecessary vector packing and unpacking code emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. The cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doens't have the context to know to ignore this. The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking. Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or adding another version of shift + trunc + shift combines that doesn't only know about the used types. llvm-svn: 334180	2018-06-07 09:54:49 +00:00
Tomasz Krupa	f8c7637027	[X86] Block UndefRegUpdate Summary: Prevent folding of operations with memory loads when one of the sources has undefined register update. Reviewers: craig.topper Subscribers: llvm-commits, mike.dvoretsky, ashlykov Differential Revision: https://reviews.llvm.org/D47621 llvm-svn: 334175	2018-06-07 08:48:45 +00:00
Karl-Johan Karlsson	abb11f805f	[BranchFolding] Fix live-in's when hoisting code Summary: When the branch folder hoist code into a predecessor it adjust live-in's in the blocks it hoist code from. However it fail to handle hoisted code that contain a defed register that originally is live-in in the block through a super register. This is fixed by replacing the live-in handling code with calls to utility functions in LivePhysRegs. Reviewers: kparzysz, gberry, MatzeB, uweigand, aprantl Reviewed By: kparzysz Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47529 llvm-svn: 334163	2018-06-07 07:20:33 +00:00
Stanislav Mekhanoshin	df61be70b2	[AMDGPU] Improve reciprocal handling When denormals are supported we are producing a full division for 1.0f / x. That still can be replaced by the faster version: bool c = fabs(x) > 0x1.0p+96f; float s = c ? 0x1.0p-32f : 1.0f; x = s; return s v_rcp_f32(x) in case if requested accuracy is 2.5ulp or less. The same version is used if denormals are not supported for non 1.0 numerators, where just v_rcp_f32 is then used for 1.0 numerator. The optimization of 1/x is extended to the case -1/x, which is the same except for the resulting sign bit. OpenCL conformance passed with both enabled and disabled denorms. Differential Revision: https://reviews.llvm.org/D47805 llvm-svn: 334142	2018-06-06 22:22:32 +00:00

1 2 3 4 5 ...

24758 Commits