llvm-project

Commit Graph

Author	SHA1	Message	Date
Krzysztof Parzyszek	bea23d065e	[Hexagon] Make floating point operations expensive for vectorization llvm-svn: 334508	2018-06-12 15:12:50 +00:00
Simon Pilgrim	0783921987	[CostModel] Treat Identity shuffle masks as zero cost As discussed on D47985, identity shuffle masks should probably be free. I've limited this to the case where the input and output types all match - but we could probably accept all cases. Differential Revision: https://reviews.llvm.org/D47986 llvm-svn: 334506	2018-06-12 14:47:13 +00:00
Krzysztof Parzyszek	3d671248ab	[SelectionDAG] Provide default expansion for rotates Implement default legalization of rotates: either in terms of the rotation in the opposite direction (if legal), or in terms of shifts and ors. Implement generating of rotate instructions for Hexagon. Hexagon only supports rotates by an immediate value, so implement custom lowering of ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion. Differential Revision: https://reviews.llvm.org/D47725 llvm-svn: 334497	2018-06-12 12:49:36 +00:00
Simon Dardis	74fb5e6789	[mips] Guard some floating point instructions correctly Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D47636 llvm-svn: 334491	2018-06-12 10:28:06 +00:00
Aleksandar Beserminji	8acdc10220	[mips] Extend LONG_BRANCH_LUi/ADDiu with extra parameter Extend LONG_BRANCH_LUi and LONG_BRANCH_ADDiu pseudo instructions with additional flag, so instead of always lowering to lui %hi(...), addiu %lo(...) or addiu %hi(...), now they can lower to either %lo, %hi, %higher or %highest depending on the added flag. Differential Revision: https://reviews.llvm.org/D47941 llvm-svn: 334490	2018-06-12 10:23:49 +00:00
Simon Pilgrim	cfd96329f0	[CostModel][X86] Add extra Identity shuffle mask cost tests (D47986) llvm-svn: 334486	2018-06-12 09:18:13 +00:00
Michael Berg	95f3a430a8	NFC, some additional tests added and some renaming for planned fma support changes llvm-svn: 334461	2018-06-12 00:52:43 +00:00
Craig Topper	957b738432	[X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc. We were missing packed isel folding patterns for all of sse41, avx, and avx512. For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns. Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch. Some of this was spotted in the review for D47993. This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns. llvm-svn: 334460	2018-06-12 00:48:57 +00:00
Mark Searles	987f292c56	[AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"' The use iterator, used within findMaskOperands(), can return anything which is not a def. isUse() requires a register, so check isReg() before calling isUse(). Differential Revision: https://reviews.llvm.org/D48047 llvm-svn: 334459	2018-06-12 00:41:26 +00:00
Wei Mi	a0c0857e7a	[SampleFDO] Add a new compact binary format for sample profile. Name table occupies a big chunk of size in current binary format sample profile. In order to reduce its size, the patch changes the sample writer/reader to save/restore MD5Hash of names in the name table. Sample annotation phase will also use MD5Hash of name to query samples accordingly. Experiment shows compact binary format can reduce the size of sample profile by 2/3 compared with binary format generally. Differential Revision: https://reviews.llvm.org/D47955 llvm-svn: 334447	2018-06-11 22:40:43 +00:00
Konstantin Zhuravlyov	3e5d66ac66	AMDGPU: Add 64-bit relative variant kind Differential Revision: https://reviews.llvm.org/D47601 llvm-svn: 334443	2018-06-11 21:37:57 +00:00
Matt Arsenault	5615fa0a87	DAG: Fix extract_subvector combine for a single element This would fail before because 1x vectors aren't legal, so instead just use the scalar type. Avoids regressions in a future AMDGPU commit to add v4i16/v4f16 as legal types. Test update is just the one test that this triggers on in tree now. It wasn't checking anything before. The result is completely changed since the selects are eliminated. Not sure if it's considered better or not. llvm-svn: 334440	2018-06-11 21:27:41 +00:00
Farhana Aleen	078cd48a39	[SLP] Add testcases of min/max reduction pattern for AMDGPU. Author: FarhanaAleen llvm-svn: 334435	2018-06-11 20:29:31 +00:00
Tim Shen	df2d6652c1	Fix incorrect CHECK-LABEL llvm-svn: 334434	2018-06-11 19:56:12 +00:00
Justin Lebar	4da41c13a5	[SCEV] Add transform zext((A * B * ...)<nuw>) --> (zext(A) * zext(B) * ...)<nuw>. Reviewers: sanjoy Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D48041 llvm-svn: 334429	2018-06-11 18:57:58 +00:00
Justin Lebar	aa4fec94d8	[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe. Summary: Previously we would add them for adds, but not multiplies. Reviewers: sanjoy Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D48038 llvm-svn: 334428	2018-06-11 18:57:42 +00:00
Krzysztof Parzyszek	dd9415d550	[Hexagon] Late predicate producers cannot be used as dot-new sources llvm-svn: 334426	2018-06-11 18:45:52 +00:00
Tim Shen	cc63761720	[SCEV] Canonicalize "A /u C1 /u C2" to "A /u (C1C2)". Summary: FWIW InstCombine already folds this. Also avoid the case where C1C2 overflows. Reviewers: sunfish, sanjoy Subscribers: hiraditya, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D47965 llvm-svn: 334425	2018-06-11 18:44:58 +00:00
Stanislav Mekhanoshin	7ba3fc730c	[AMDGPU] Do not consider indirect acces through phi for wave limiter Rational: if there is indirect access that is usually an issue because load is not ready by the use. However, if use is inside a loop and load is outside that is potentially an issue for a first iteration only. Differential Revision: https://reviews.llvm.org/D47740 llvm-svn: 334420	2018-06-11 16:50:49 +00:00
Aleksandar Beserminji	62cf9d21ab	[mips] Fix spill slot for mips3, n64 abi When program is compiled for mips3 with n64 abi, wrong register class is used for creating an emergency spill slot. This patch fixes the correct register class to be chosen. This patch resolves PR35859. Thanks to John Baldwin for reporting the issue! Differential Revision: https://reviews.llvm.org/D47938 llvm-svn: 334419	2018-06-11 16:50:28 +00:00
Dylan McKay	d011869c82	[AVR] Set trackLivenessAfterRegAlloc This sets trackLivenessAfterRegAlloc on AVRRegisterInfo. Most existing targets set this flag. Without it, specific IR inputs cause LLVM to fail with: Assertion failed: (getParent()->getProperties().hasProperty( MachineFunctionProperties::Property::TracksLiveness) && "Liveness information is accurate"), function livein_begin file MachineBasicBlock.cpp, line 1354. With this commit, this no longer happens. Patch by Peter Nimmervoll. llvm-svn: 334409	2018-06-11 14:46:48 +00:00
Clement Courbet	7db69cc08a	[X86] Fix skylake server scheduling info. Summary: This fixes most of the scheduling info for SKX vector operations. I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM. The before/after llvm-exegesis analysis are in the phabricator diff. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47721 llvm-svn: 334407	2018-06-11 14:37:53 +00:00
Sanjay Patel	a1791be455	[x86] add scalar cvtt intrinsic tests; NFC More coverage for the problem noted in D47993 (although these shouldn't be affected by that patch). llvm-svn: 334404	2018-06-11 13:51:34 +00:00
Roman Lebedev	b896c4e860	[NFC][AMDGPU] Add tests for all the various IR patterns equivalent to extracting low bits. Summary: The idiom recognition seems rather poor. Only the `@bzhi32_d0` produces `v_bfe_u32`. But they all should. This needs to be fixed before D47980 can be re-landed. Reviewers: mareko, bogner, rampitec, arsenm, tstellar, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48005 llvm-svn: 334398	2018-06-11 10:21:10 +00:00
Craig Topper	5e403b2981	[X86] Add encoding tests for avx5124fmaps and avx5124vnni instructions. I forgot to git add these in r333812 llvm-svn: 334387	2018-06-11 06:22:41 +00:00
Craig Topper	c12ac0c984	[X86] Add test files for upgrade of vbmi2 expand load and compress store intrinsics that was done in r334381. llvm-svn: 334386	2018-06-11 06:20:24 +00:00
Craig Topper	0e25c8239a	[X86] Remove masking from dbpsadbw intrinsics, use select in IR instead. llvm-svn: 334384	2018-06-11 06:18:22 +00:00
Daniel Cederman	33f67a256b	[Sparc] Add support for 13-bit PIC Summary: When compiling with -fpic, in contrast to -fPIC, use only the immediate field to index into the GOT. This saves space if the GOT is known to be small. The linker will warn if the GOT is too large for this method. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: brad, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D47136 llvm-svn: 334383	2018-06-11 05:50:08 +00:00
Brock Wyma	b60532f89a	[CodeView] Omit forward references for unnamed structs and unions Codeview references to unnamed structs and unions are expected to refer to the complete type definition instead of a forward reference so Visual Studio can resolve the type properly. Differential Revision: https://reviews.llvm.org/D32498 llvm-svn: 334382	2018-06-11 01:39:34 +00:00
Craig Topper	e71ad1f6d0	[X86] Remove and autoupgrade the expandload and compressstore intrinsics. We use the target independent intrinsics now. llvm-svn: 334381	2018-06-11 01:25:22 +00:00
Sanjay Patel	3e5c70cc1d	[DAGCombiner] match vector compare and select sizes with extload operand (PR37427) This patch started off much more general and ambitious, but it's been a nightmare seeing all the ways x86 vector codegen can go wrong. So the code is still structured to allow extending easily, but it's currently limited in several ways: 1. Only handle cases with an extending load. 2. Only handle cases with a zero constant compare. 3. Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected. The motivating case from PR37427: https://bugs.llvm.org/show_bug.cgi?id=37427 ...is the 1st test, and that shows the expected win - we eliminated the unnecessary intermediate cast. There's a clear regression in the last test (sgt_zero_fp_select) because we longer recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present in sgt_zero, so I'll try to fix that in a follow-up. We need to match a sign-bit setcc from a sign-extended operand and remove it. Differential Revision: https://reviews.llvm.org/D47330 llvm-svn: 334378	2018-06-10 23:09:50 +00:00
Roman Lebedev	ebb3252f00	Revert rL334371 / D47980: "[InstCombine] Fold (x << y) >> y -> x & (-1 >> y)" test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll broke, and i did not notice because i did not build that backend. llvm-svn: 334373	2018-06-10 20:32:03 +00:00
Roman Lebedev	eb795a0661	[InstCombine] Fold (x >> y) << y -> x & (-1 << y) Summary: We already do it for matching splat constants, but not just values. Further improvements for non-matching splat constants, as noted in https://reviews.llvm.org/D46760#1123713 will be needed, but i'd prefer to do that as a follow-up. https://bugs.llvm.org/show_bug.cgi?id=37603 https://rise4fun.com/Alive/cplX https://rise4fun.com/Alive/0HF Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47981 llvm-svn: 334372	2018-06-10 20:10:13 +00:00
Roman Lebedev	4cdc59ecf2	[InstCombine] Fold (x << y) >> y -> x & (-1 >> y) Summary: We already do it for splat constants, but not just values. Also, undef cases are mostly non-functional. https://bugs.llvm.org/show_bug.cgi?id=37603 https://rise4fun.com/Alive/cplX Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47980 llvm-svn: 334371	2018-06-10 20:10:06 +00:00
Roman Lebedev	1e9457ee76	[NFC][InstCombine] Revisit tests for D47980 / D47981 once more. llvm-svn: 334370	2018-06-10 20:10:00 +00:00
Craig Topper	304bd747af	[X86] Add expandload and compresstore fast-isel tests for avx512f and avx512vl. Update existing tests for avx512vbmi2 to use target independent intrinsics. llvm-svn: 334368	2018-06-10 18:55:37 +00:00
Sanjay Patel	15bee8f1c0	[x86] add tests for potentially miscompiling cvttp2si (PR37751); NFC llvm-svn: 334367	2018-06-10 17:42:12 +00:00
Ivan A. Kosarev	847daa11f8	[NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47447 llvm-svn: 334361	2018-06-10 09:27:27 +00:00
Craig Topper	301d526329	[X86] Fix forward declaration in a test case that was messed up in r334358 llvm-svn: 334360	2018-06-10 06:43:48 +00:00
Craig Topper	98a79934af	[X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead. llvm-svn: 334358	2018-06-10 06:01:36 +00:00
Simon Pilgrim	5297506625	[CostModel][X86] Add 'select' style shuffle costs tests (PR33744) llvm-svn: 334351	2018-06-09 16:08:25 +00:00
Roman Lebedev	ff7652e7ab	[NFC][InstCombine] More tests for (x >> y) << y -> x & (-1 << y) fold. Followup for rL334347. The fold is also valid for ashr. https://rise4fun.com/Alive/0HF https://bugs.llvm.org/show_bug.cgi?id=37603 https://reviews.llvm.org/D46760#1123713 https://rise4fun.com/Alive/cplX llvm-svn: 334349	2018-06-09 14:01:46 +00:00
Roman Lebedev	8aada65f81	[NFC][InstCombine] Tests for (x >> y) << y -> x & (-1 << y) fold. We already do it for splat constants, but not just values. Also, undef cases are mostly non-functional. https://bugs.llvm.org/show_bug.cgi?id=37603 https://reviews.llvm.org/D46760#1123713 https://rise4fun.com/Alive/cplX llvm-svn: 334347	2018-06-09 09:27:43 +00:00
Roman Lebedev	794e29f964	[NFC][InstCombine] Tests for (x << y) >> y -> x & (-1 >> y) fold. We already do it for splat constants, but not just values. Also, undef cases are mostly non-functional. https://bugs.llvm.org/show_bug.cgi?id=37603 https://rise4fun.com/Alive/cplX llvm-svn: 334346	2018-06-09 09:27:39 +00:00
Eli Friedman	864df22307	[ARM] Allow CMPZ transforms even if the input has multiple uses. It looks like this got left in by accident in r289794; I can't think of any reason this check would be necessary. (Maybe it was meant to be a check that the AND has one use? But we check that a few lines earlier.) Differential Revision: https://reviews.llvm.org/D47921 llvm-svn: 334322	2018-06-08 21:16:56 +00:00
Krzysztof Parzyszek	b10ea39270	[SCEV] Look through zero-extends in howFarToZero An expression like (zext i2 {(trunc i32 (1 + %B) to i2),+,1}<%while.body> to i32) will become zero exactly when the nested value becomes zero in its type. Strip injective operations from the input value in howFarToZero to make the value simpler. Differential Revision: https://reviews.llvm.org/D47951 llvm-svn: 334318	2018-06-08 20:43:07 +00:00
Davide Italiano	189c2cf114	[InstCombine] Skip dbg.value(s) when looking at stack{save,restore}. Fixes PR37713. llvm-svn: 334317	2018-06-08 20:42:36 +00:00
Sanjay Patel	afcf39e1f9	[InstCombine] add llvm.assume + debuginfo test (PR37726); NFC llvm-svn: 334314	2018-06-08 18:47:33 +00:00
Reid Kleckner	0bab222084	[asan] Instrument comdat globals on COFF targets Summary: If we can use comdats, then we can make it so that the global metadata is thrown away if the prevailing definition of the global was uninstrumented. I have only tested this on COFF targets, but in theory, there is no reason that we cannot also do this for ELF. This will allow us to re-enable string merging with ASan on Windows, reducing the binary size cost of ASan on Windows. Reviewers: eugenis, vitalybuka Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D47841 llvm-svn: 334313	2018-06-08 18:33:16 +00:00
Simon Pilgrim	5c32989c91	[X86][SSE] Support v8i16/v16i16 rotations Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together. Differential Revision: https://reviews.llvm.org/D47822 llvm-svn: 334309	2018-06-08 17:58:42 +00:00
Sanjay Patel	70314bd61c	[x86] add tests for node-level FMF; NFC These cases should be optimized using the change from D47911. llvm-svn: 334308	2018-06-08 17:54:28 +00:00
Sanjay Patel	9995a00a94	[x86] regenerate test checks; NFC llvm-svn: 334307	2018-06-08 17:42:35 +00:00
Michael Berg	bf90d1f263	Utilize new SDNode flag functionality to expand current support for fsub Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub. Reviewers: spatel, hfinkel, wristow, arsenm Reviewed By: spatel Subscribers: wdng Differential Revision: https://reviews.llvm.org/D47910 llvm-svn: 334306	2018-06-08 17:39:50 +00:00
Simon Pilgrim	89deac6694	[X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits). llvm-svn: 334303	2018-06-08 17:00:45 +00:00
Simon Pilgrim	59e915c691	[X86] Fix schedule-x86_64.s tests to use different registers in reg-reg cases Same fix as rL334110: I noticed while working on zero-idiom + dependency-breaking support (PR36671) that most of our binary instruction schedule tests were reusing the same src registers, which would cause the tests to fail once we enable scalar zero-idiom support on btver2. llvm-svn: 334302	2018-06-08 16:40:15 +00:00
Daniil Fukalov	c9a098b314	[AMDGPU] Inline asm - added i16, half and i128 types support AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable Differential Revision: https://reviews.llvm.org/D44920 llvm-svn: 334301	2018-06-08 16:29:04 +00:00
Daniil Fukalov	37433dc2e1	reapply r334209 with fixes for harfbuzz in Chromium r334209 description: [LSR] Check yet more intrinsic pointer operands the patch fixes another assertion in isLegalUse() Differential Revision: https://reviews.llvm.org/D47794 llvm-svn: 334300	2018-06-08 16:22:52 +00:00
Roman Lebedev	b060ce45ca	[InstSimplify] add nuw %x, -1 -> -1 fold. Summary: `%ret = add nuw i8 %x, C` From [[ https://llvm.org/docs/LangRef.html#add-instruction \| langref ]]: nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs. So if `C` is `-1`, `%x` can only be `0`, and the result is always `-1`. I'm not sure we want to use `KnownBits`/`LVI` here, because there is exactly one possible value (all bits set, `-1`), so some other pass should take care of replacing the known-all-ones with constant `-1`. The `test/Transforms/InstCombine/set-lowbits-mask-canonicalize.ll` change is confusing. What happening is, before this: (omitting `nuw` for simplicity) 1. First, InstCombine D47428/rL334127 folds `shl i32 1, %NBits`) to `shl nuw i32 -1, %NBits` 2. Then, InstSimplify D47883/rL334222 folds `shl nuw i32 -1, %NBits` to `-1`, 3. `-1` is inverted to `0`. But now: 1. This InstSimplify fold `%ret = add nuw i32 %setbit, -1` -> `-1` happens first, before InstCombine D47428/rL334127 fold could happen. Thus we now end up with the opposite constant, and it is all good: https://rise4fun.com/Alive/OA9 https://rise4fun.com/Alive/sldC Was mentioned in D47428 review. Follow-up for D47883. Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47908 llvm-svn: 334298	2018-06-08 15:44:47 +00:00
Simon Pilgrim	efb4806bb9	[X86][BtVer2] Remove SBB tests that were accidentally added in rL334296 These aren't true zero-idiom instructions (just dependency breaking). llvm-svn: 334297	2018-06-08 15:43:00 +00:00
Simon Pilgrim	53766a986d	[X86][BtVer2] Add tests for scalar SUB/XOR instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions). llvm-svn: 334296	2018-06-08 15:28:43 +00:00
Simon Pilgrim	aafcf9e4a1	[X86][BtVer2] Limit zero idiom tests to a single iteration. Reduces output size and we're only wanting to check that the instructions are fast-path'd (just Dispatch+Retire) anyhow llvm-svn: 334292	2018-06-08 15:01:40 +00:00
Simon Pilgrim	eab9d20424	[X86][SSE] Add SSE2/AVX2 vector rotate tests Now that we're custom lowering vector rotates for SSE in general we should be testing the combines with them as well. llvm-svn: 334290	2018-06-08 14:07:21 +00:00
Simon Pilgrim	a6afa310c9	[X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplication Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code. This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting). I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking. llvm-svn: 334289	2018-06-08 13:59:11 +00:00
Sanjay Patel	ab4ca0603c	[x86] restore test comment; NFC The description got deleted along with the FIXME note in rL334268. llvm-svn: 334288	2018-06-08 13:53:13 +00:00
Artur Pilipenko	4d063e7bb1	[BPI] Apply invoke heuristic before loop branch heuristic Currently the loop branch heuristic is applied before the invoke heuristic which makes us overestimate the probability of the unwind destination of invokes inside loops. This in turn makes us grossly underestimate the frequencies of loops with invokes. Reviewed By: skatkov, vsk Differential Revision: https://reviews.llvm.org/D47371 llvm-svn: 334285	2018-06-08 13:03:21 +00:00
Alex Bradbury	ed53ca73ec	[RISCV] Implement MC layer support for the fence.tso instruction The instruction makes use of a previously ignored field in the fence instruction. It is introduced in the version 2.3 draft of the RISC-V specification after much work by the Memory Model Task Group. As clarified here <https://github.com/riscv/riscv-isa-manual/issues/186>, the fence.tso assembler mnemonic does not have operands. llvm-svn: 334278	2018-06-08 10:39:05 +00:00
Simon Pilgrim	ad45efc445	[X86][SSE] Consistently prefer lowering to PACKUS over PACKSS We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS. PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines. The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets. llvm-svn: 334276	2018-06-08 10:29:00 +00:00
Roman Shirokiy	9ba0aa2da0	[LV] Fix PR36983. For a given recurrence, fix all phis in exit block There could be more than one PHIs in exit block using same loop recurrence. Don't assume there is only one and fix each user. Differential Revision: https://reviews.llvm.org/D47788 llvm-svn: 334271	2018-06-08 08:21:20 +00:00
Matt Arsenault	6fc3759811	AMDGPU: Error on LDS global address in functions These won't work as expected now, so error on them to avoid wasting time debugging this in the future. llvm-svn: 334269	2018-06-08 08:05:54 +00:00
Sam Parker	16f963ba0d	[DAGCombine] Fix for PR37667 While trying to propagate AND masks back to loads, we currently allow one non-load node to be included as a leaf in chain. This fix now limits that node to produce only a single data value. Differential Revision: https://reviews.llvm.org/D47878 llvm-svn: 334268	2018-06-08 07:49:04 +00:00
Reid Kleckner	a3609f75b2	Revert r334209 "[LSR] Check yet more intrinsic pointer operands" This causes cast failures when compiling harfbuzz in Chromium. Reproducer on the way. llvm-svn: 334254	2018-06-08 00:43:27 +00:00
Michael Berg	77b5be7ec6	propagate fast math flags via IR on fma and sub expressions Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode. Reviewers: spatel, arsenm, hfinkel, javed.absar Reviewed By: spatel Subscribers: nemanjai, wdng Differential Revision: https://reviews.llvm.org/D47388 llvm-svn: 334242	2018-06-07 22:49:09 +00:00
Tony Tye	a5a7c331e7	[AMDGPU] Simplify memory legalizer - Make code easier to maintain. - Avoid generating waitcnts for VMEM if the address sppace does not involve VMEM. - Add support to generate waitcnts for LDS and GDS memory. Differential Revision: https://reviews.llvm.org/D47504 llvm-svn: 334241	2018-06-07 22:28:32 +00:00
Roman Lebedev	188a619e56	[NFC][InstSimplify] Add tests for add nuw %x, -1 -> -1 fold. %ret = add nuw i8 %x, C From langref: nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”, respectively. If the nuw and/or nsw keywords are present, the result value of the add is a poison value if unsigned and/or signed overflow, respectively, occurs. So if C is -1, %x can only be 0, and the result is always -1. https://rise4fun.com/Alive/sldC Was mentioned in D47428 review. llvm-svn: 334236	2018-06-07 21:19:50 +00:00
Roman Lebedev	fdd90f2fc6	[NFC][InstSimplify] One more negative test for shl nuw C, %x -> C fold. Follow-up for rL334200, rL334206. llvm-svn: 334235	2018-06-07 21:19:45 +00:00
Roman Lebedev	2683802ba0	[InstSimplify] shl nuw C, %x -> C iff signbit is set on C. Summary: `%r = shl nuw i8 C, %x` As per langref: ``` If the nuw keyword is present, then the shift produces a poison value if it shifts out any non-zero bits. ``` Thus, if the sign bit is set on `C`, then `%x` can only be `0`, which means that `%r` can only be `C`. Or in other words, set sign bit means that the signed value is negative, so the constant is `<= 0`. https://rise4fun.com/Alive/WMk https://rise4fun.com/Alive/udv Was mentioned in D47428 review. We already handle the `0` constant, https://godbolt.org/g/UZq1sJ, so this only handles negative constants. Could use computeKnownBits() / LazyValueInfo, but the cost-benefit analysis (https://reviews.llvm.org/D47891) suggests it isn't worth it. Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47883 llvm-svn: 334222	2018-06-07 20:03:45 +00:00
Sanjay Patel	4b1205b40f	[TargetLibraryInfo] add mappings from LLVM sin/cos intrinsics to SVML calls These weren't included in D19544 - probably just an oversight. D40044 made it more likely that we'll have LLVM math intrinsics rather than libcalls, so this bug was more easily exposed. As the tests/code show, we already have the complete mappings for pow/exp/log. I don't have any experience with SVML, so I don't know if anything else is missing. It's also not clear to me that we should be doing this transform in IR rather than DAG/isel, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D47610 llvm-svn: 334211	2018-06-07 18:21:24 +00:00
Daniil Fukalov	12c0663a25	[LSR] Check yet more intrinsic pointer operands the patch fixes another assertion in isLegalUse() Differential Revision: https://reviews.llvm.org/D47794 llvm-svn: 334209	2018-06-07 17:30:58 +00:00
Roman Lebedev	86d376f516	[NFC][InstSimplify] Add more tests for shl nuw C, %x -> C fold. Follow-up for rL334200. For these, KnownBits will be needed. llvm-svn: 334206	2018-06-07 16:18:26 +00:00
Alex Bradbury	6a4b5441e4	[RISCV] AsmParser support for the li pseudo instruction The implementation follows the MIPS backend and expands the pseudo instruction directly during asm parsing. As the result, only real MC instructions are emitted to the MCStreamer. The actual expansion to real instructions is similar to the expansion performed by the GNU Assembler. This patch supersedes D41949. Differential Revision: https://reviews.llvm.org/D46118 Patch by Mario Werner. llvm-svn: 334203	2018-06-07 15:35:47 +00:00
Roman Lebedev	847938925b	[NFC][InstSimplify] Add tests for shl nuw C, %x -> C fold. %r = shl nuw i8 C, %x As per langref: If the nuw keyword is present, then the shift produces a poison value if it shifts out any non-zero bits. Thus, if the sign bit is set on C, then %x can only be 0, which means that %r can only be C. https://rise4fun.com/Alive/WMk Was mentioned in D47428 review. llvm-svn: 334200	2018-06-07 14:18:38 +00:00
Sanjay Patel	898fbd7c47	[x86] add tests for backwards propagate mask bug (PR37060, PR37667); NFC llvm-svn: 334199	2018-06-07 14:11:18 +00:00
Hiroshi Inoue	01ef4c2c64	[PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector BitPermutationSelector sets Repl32 flag for bit groups which can be (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies lower 32 bits into upper 32 bits on 64-bit PowerPC before rotation. However, enforcing 32-bit instruction sometimes results in redundant generated code. For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into only rldimi instruction if Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF). uint64_t func(uint64_t a, uint64_t b) { return (a & 0xFFFFFFFF) \| (b << 32) ; } To avoid such problem, this patch checks the potential benefit of Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from Repl32 flag on this group. Differential Revision: https://reviews.llvm.org/D47867 llvm-svn: 334195	2018-06-07 13:21:14 +00:00
Simon Pilgrim	09953d8412	[X86][SSE] Simplify combineVectorTruncationWithPACKSS to reduce code duplication Simplify combineVectorTruncationWithPACKSS to just a SIGN_EXTEND_INREG followed by using the existing truncateVectorWithPACK instead of duplicating code. llvm-svn: 334193	2018-06-07 13:01:42 +00:00
Matt Arsenault	f1c868ef08	AMDGPU: Fix not including v2f64 in SReg_128 Fixes assertion with calls returning v2f64. llvm-svn: 334189	2018-06-07 12:16:31 +00:00
Simon Pilgrim	0e29d8d81f	[X86][SSE] Add extra trunc(shl) test cases The existing trunc_shl_17_v8i16_v8i32 test case should (but doesn't) fold to zero, I've added 2 new test cases: - trunc_shl_16_v8i16_v8i32 which folds to zero (this is actually testing the target faux shuffle combine) - trunc_shl_15_v8i16_v8i32 which should perform the full shl + truncate llvm-svn: 334188	2018-06-07 11:22:52 +00:00
Florian Hahn	0d6b01761c	[Mem2Reg] Avoid replacing load with itself in promoteSingleBlockAlloca. We do the same thing in rewriteSingleStoreAlloca. Fixes PR37632. Reviewers: chandlerc, davide, efriedma Reviewed By: davide Differential Revision: https://reviews.llvm.org/D47825 llvm-svn: 334187	2018-06-07 11:09:05 +00:00
Matt Arsenault	697300bd4f	AMDGPU: Use scalar operations for f16 fabs/fneg patterns Fixes unnecessary differences between subtargets. llvm-svn: 334184	2018-06-07 10:15:20 +00:00
Simon Pilgrim	cc92897be9	[X86] Regenerate rotate tests Add 32-bit tests to show missed SHLD/SHRD cases llvm-svn: 334183	2018-06-07 10:13:09 +00:00
Paul Semel	e57bc78324	[llvm-strip] Expose --strip-unneeded option Differential Revision: https://reviews.llvm.org/D47818 llvm-svn: 334182	2018-06-07 10:05:25 +00:00
Matt Arsenault	90083d3088	AMDGPU: Try a lot harder to emit scalar loads This has two main components. First, widen widen short constant loads in DAG when they have the correct alignment. This is already done a bit in AMDGPUCodeGenPrepare, since that has access to DivergenceAnalysis. This can't help kernarg loads created in the DAG. Start to use DAG divergence analysis to help this case. The second part is to avoid kernel argument lowering breaking the alignment of short vector elements because calling convention lowering wants to split everything into legal register types. When loading a split type, load the nearest 4-byte aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out. There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG. Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does. It only looks for fairly specific combines between pairs of loads which no longer appear. In particular this causes v4i16 loads to be split into 2 components when previously the two halves were merged. Worse, because of the newly introduced shifts, there is a lot more unnecessary vector packing and unpacking code emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. The cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doens't have the context to know to ignore this. The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking. Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or adding another version of shift + trunc + shift combines that doesn't only know about the used types. llvm-svn: 334180	2018-06-07 09:54:49 +00:00
Tomasz Krupa	f8c7637027	[X86] Block UndefRegUpdate Summary: Prevent folding of operations with memory loads when one of the sources has undefined register update. Reviewers: craig.topper Subscribers: llvm-commits, mike.dvoretsky, ashlykov Differential Revision: https://reviews.llvm.org/D47621 llvm-svn: 334175	2018-06-07 08:48:45 +00:00
Karl-Johan Karlsson	abb11f805f	[BranchFolding] Fix live-in's when hoisting code Summary: When the branch folder hoist code into a predecessor it adjust live-in's in the blocks it hoist code from. However it fail to handle hoisted code that contain a defed register that originally is live-in in the block through a super register. This is fixed by replacing the live-in handling code with calls to utility functions in LivePhysRegs. Reviewers: kparzysz, gberry, MatzeB, uweigand, aprantl Reviewed By: kparzysz Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47529 llvm-svn: 334163	2018-06-07 07:20:33 +00:00
Michael Zolotukhin	31800864dc	SpeculativeExecution Pass: Set PreserveCFG to avoid unnecessary analyses invalidation. The pass doesn't touch CFG in any way, only moves instructions between blocks. llvm-svn: 334150	2018-06-07 00:19:29 +00:00
Peter Collingbourne	cf017ada68	llvm-readobj: fix printing number of relocations in Android packed format. With '-elf-output-style=GNU -relocations', a header containing the number of entries is printed before all the relocation entries in the section. For Android packed format, we need to perform the unpacking first before we can get the actual number of relocations in the section. Patch by Rahul Chaudhry! Differential Revision: https://reviews.llvm.org/D47800 llvm-svn: 334147	2018-06-07 00:02:07 +00:00
Stanislav Mekhanoshin	df61be70b2	[AMDGPU] Improve reciprocal handling When denormals are supported we are producing a full division for 1.0f / x. That still can be replaced by the faster version: bool c = fabs(x) > 0x1.0p+96f; float s = c ? 0x1.0p-32f : 1.0f; x = s; return s v_rcp_f32(x) in case if requested accuracy is 2.5ulp or less. The same version is used if denormals are not supported for non 1.0 numerators, where just v_rcp_f32 is then used for 1.0 numerator. The optimization of 1/x is extended to the case -1/x, which is the same except for the resulting sign bit. OpenCL conformance passed with both enabled and disabled denorms. Differential Revision: https://reviews.llvm.org/D47805 llvm-svn: 334142	2018-06-06 22:22:32 +00:00
Sanjay Patel	3cd1aa88f9	[InstCombine] fold another shifty abs pattern to cmp+sel (PR36036) The bug report: https://bugs.llvm.org/show_bug.cgi?id=36036 ...requests a DAG change for this, but an IR canonicalization probably handles most cases. If we still want to match this pattern in the backend, there's a proposal for that too: D47831 Alive proofs including nsw/nuw cases that were first noted in: D46988 https://rise4fun.com/Alive/Kmp This patch is largely copied from the existing code that was initially added with: D40984 ...but I didn't see much gain from trying to share code. llvm-svn: 334137	2018-06-06 21:58:12 +00:00
Sanjay Patel	6fda6b1210	[InstCombine] add tests for another abs() pattern (PR36036); NFC llvm-svn: 334133	2018-06-06 21:32:42 +00:00
Matt Arsenault	e9524f1fb3	AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16 Fixes terrible code on targets without f16 support. The legalization creates a mess that is difficult to recover from. Also should avoid randomly breaking these tests multiple times in sequence in future commits. Some regressions in cases where it happens to be better to pull the source modifier after the conversion. llvm-svn: 334132	2018-06-06 21:28:11 +00:00
Alexander Shaposhnikov	29407f3abe	[llvm-strip] Expose --discard-all option Expose objcopy's --discard-all option in llvm-strip. Test plan: make check-all Differential revision: https://reviews.llvm.org/D47750 llvm-svn: 334131	2018-06-06 21:23:19 +00:00
Roman Lebedev	cbf8446359	[InstCombine] PR37603: low bit mask canonicalization Summary: This is [[ https://bugs.llvm.org/show_bug.cgi?id=37603 \| PR37603 ]]. https://godbolt.org/g/VCMNpS https://rise4fun.com/Alive/idM When doing bit manipulations, it is quite common to calculate some bit mask, and apply it to some value via `and`. The typical C code looks like: ``` int mask_signed_add(int nbits) { return (1 << nbits) - 1; } ``` which is translated into (with `-O3`) ``` define dso_local i32 @mask_signed_add(int)(i32) local_unnamed_addr #0 { %2 = shl i32 1, %0 %3 = add nsw i32 %2, -1 ret i32 %3 } ``` But there is a second, less readable variant: ``` int mask_signed_xor(int nbits) { return ~(-(1 << nbits)); } ``` which is translated into (with `-O3`) ``` define dso_local i32 @mask_signed_xor(int)(i32) local_unnamed_addr #0 { %2 = shl i32 -1, %0 %3 = xor i32 %2, -1 ret i32 %3 } ``` Since we created such a mask, it is quite likely that we will use it in `and` next. And then we may get rid of `not` op by folding into `andn`. But now that i have actually looked: https://godbolt.org/g/VTUDmU _some_ backend changes will be needed too. We clearly loose `bzhi` recognition. Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47428 llvm-svn: 334127	2018-06-06 19:38:27 +00:00
Roman Lebedev	4771bc6c35	[InstCombine][NFC] PR37603: low bit mask canonicalization tests Differential Revision: https://reviews.llvm.org/D47427 llvm-svn: 334126	2018-06-06 19:38:21 +00:00
Roman Lebedev	488d28d4e5	[X86] Emit BZHI when mask is ~(-1 << nbits)) Summary: In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation. As it is seen from these tests, there is a reason for that. AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!). The other way around for X86. It would be much better to canonicalize. This patch is completely monkey-typing. I don't really understand how this works :) I have based it on `// x & (-1 >> (32 - y))` pattern. Also, when we only have `BMI`, i wonder if we could use `BEXTR` with `start=0` ? Related links: https://bugs.llvm.org/show_bug.cgi?id=36419 https://bugs.llvm.org/show_bug.cgi?id=37603 https://bugs.llvm.org/show_bug.cgi?id=37610 https://rise4fun.com/Alive/idM Reviewers: craig.topper, spatel, RKSimon, javed.absar Reviewed By: craig.topper Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D47453 llvm-svn: 334125	2018-06-06 19:38:16 +00:00
Roman Lebedev	cb56f7a550	[NFC][X86][AArch64] Reorganize/cleanup BZHI test patterns Summary: In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation. As it is seen from these tests, there is a reason for that. AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!). The other way around for X86. It would be much better to canonicalize. It would seem that there is too much tests, but this is most of all the auto-generated possible variants of C code that one would expect for BZHI to be formed, and then manually cleaned up a bit. So this should be pretty representable, which somewhat good coverage... Related links: https://bugs.llvm.org/show_bug.cgi?id=36419 https://bugs.llvm.org/show_bug.cgi?id=37603 https://bugs.llvm.org/show_bug.cgi?id=37610 https://rise4fun.com/Alive/idM Reviewers: javed.absar, craig.topper, RKSimon, spatel Reviewed By: RKSimon Subscribers: kristof.beyls, llvm-commits, RKSimon, craig.topper, spatel Differential Revision: https://reviews.llvm.org/D47452 llvm-svn: 334124	2018-06-06 19:38:10 +00:00
Krzysztof Parzyszek	c1e712baa5	[Hexagon] Implement vector-pair zero as V6_vsubw_dv llvm-svn: 334123	2018-06-06 19:34:40 +00:00
Craig Topper	ef813a5226	[X86] Properly disassemble gather/scatter instructions where xmm4/ymm4/zmm4 are used as the index. These encodings correspond to the cases in the normal encoding scheme where there is no index and our modrm reading code initially decodes it as such. The VSIB handling code tried to compensate for this, but failed to add the base needed to make later code do the right thing. Fixes PR37712. llvm-svn: 334121	2018-06-06 19:15:15 +00:00
Simon Pilgrim	aef5bdbea1	[X86][BtVer2] Add support for all vector instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register. llvm-svn: 334119	2018-06-06 19:06:09 +00:00
Vedant Kumar	6d354ed72e	[Debugify] Move debug value intrinsics closer to their operand defs Before this patch, debugify would insert debug value intrinsics before the terminating instruction in a block. This had the advantage of being simple, but was a bit too simple/unrealistic. This patch teaches debugify to insert debug values immediately after their operand defs. This enables better testing of the compiler. For example, with this patch, `opt -debugify-each` is able to identify a vectorizer DI-invariance bug fixed in llvm.org/PR32761. In this bug, the vectorizer produced different output with/without debug info present. Reverting Davide's bugfix locally, I see: $ ~/scripts/opt-check-dbg-invar.sh ./bin/opt \ .../SLPVectorizer/AArch64/spillcost-di.ll -slp-vectorizer Comparing: -slp-vectorizer .../SLPVectorizer/AArch64/spillcost-di.ll Baseline: /var/folders/j8/t4w0bp8j6x1g6fpghkcb4sjm0000gp/T/tmp.iYYeL1kf With DI : /var/folders/j8/t4w0bp8j6x1g6fpghkcb4sjm0000gp/T/tmp.sQtQSeet 9,11c9,11 < %5 = getelementptr inbounds %0, %0* %2, i64 %0, i32 1 < %6 = bitcast i64* %4 to <2 x i64>* < %7 = load <2 x i64>, <2 x i64>* %6, align 8, !tbaa !0 --- > %5 = load i64, i64* %4, align 8, !tbaa !0 > %6 = getelementptr inbounds %0, %0* %2, i64 %0, i32 1 > %7 = load i64, i64* %6, align 8, !tbaa !5 12a13 > store i64 %5, i64* %8, align 8, !tbaa !0 14,15c15 < %10 = bitcast i64* %8 to <2 x i64>* < store <2 x i64> %7, <2 x i64>* %10, align 8, !tbaa !0 --- > store i64 %7, i64* %9, align 8, !tbaa !5 :: Found a test case ^ Running this over the *.ll files in tree, I found four additional examples which compile differently with/without DI present. I plan on filing bugs for these. llvm-svn: 334118	2018-06-06 19:05:42 +00:00
Vedant Kumar	a9e27312b8	[Debugify] Add a quiet mode to suppress warnings Suppressing warning output and module dumps significantly speeds up fuzzing with `opt -debugify-each`. llvm-svn: 334117	2018-06-06 19:05:41 +00:00
Evandro Menezes	b2c8244715	[AArch64, ARM] Add support for Samsung Exynos M4 Create a separate feature set for Exynos M4 and add test cases. llvm-svn: 334115	2018-06-06 18:56:00 +00:00
Han Shen	2c5d2ea8a6	Fix the test case that places intermediate in source directory. This causes "permission denied" error in some controlled test environment where source tree is read-only. Differential Revision: https://reviews.llvm.org/D47839 llvm-svn: 334114	2018-06-06 18:53:17 +00:00
Michael Berg	cc1c4b6912	guard fsqrt with fmf sub flags Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483. It contains only context for fsqrt. Reviewers: spatel, hfinkel, arsenm Reviewed By: spatel Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai Differential Revision: https://reviews.llvm.org/D47749 llvm-svn: 334113	2018-06-06 18:47:55 +00:00
Simon Pilgrim	7a48bb6e44	[llvm-mca][x86] Fix all resources-x86_64.s tests to use different registers in reg-reg cases I noticed while working on zero-idiom + dependency-breaking support (PR36671) that most of our binary instruction tests were reusing the same src registers, which would cause the tests to fail once we enable scalar zero-idiom support on btver2. Fixed in all targets to keep them in sync. llvm-svn: 334110	2018-06-06 18:20:25 +00:00
Krzysztof Parzyszek	0da1fe3770	[Hexagon] Split CTPOP of vector pairs llvm-svn: 334109	2018-06-06 18:03:29 +00:00
Sanjay Patel	0e8b90da0c	[ConstProp] move tests for fp <--> int; NFC These were added for D5603 / rL219542, and there's a proposal to change one side in D47807. These are tests of constant propagation, so they shouldn't have ever been tested/housed under InstCombine. llvm-svn: 334107	2018-06-06 16:53:56 +00:00
Simon Pilgrim	64541ff297	[X86][BtVer2] Add tests for all vector instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register. TODO: Scalar instructions still need to be tested (need to check EFLAGS handling). llvm-svn: 334104	2018-06-06 16:14:37 +00:00
David Green	25312b2b6c	[GlobalMerge] Set the alignment on merged global structs If no alignment is set, the abi/preferred alignment of structs will be used which may be higher than required. This can lead to extra padding and in the end an increase in data size. Differential Revision: https://reviews.llvm.org/D47633 llvm-svn: 334099	2018-06-06 14:48:32 +00:00
Simon Dardis	9b1182acf4	[mips] Add testcase for i64, i128 addition for the DSP ASE llvm-svn: 334094	2018-06-06 13:30:39 +00:00
Tim Northover	9b80060d7b	InstCombine: ignore debug instructions during fence combine We should never get different CodeGen based on whether the code is being compiled in debug mode so we must skip over @llvm.dbg.value (and similar) calls. Should fix at least the worst part of PR37690. llvm-svn: 334090	2018-06-06 12:46:02 +00:00
Simon Dardis	0bba0df896	[mips] Partially revert r334031 The test changes in r334031 give unstable pass/fail results on the llvm-clang-x86_64-expensive-checks-win buildbot. Revert the test changes to turn the bot green. llvm-svn: 334084	2018-06-06 10:54:30 +00:00
Simon Pilgrim	3d14158891	[X86][BMI][TBM] Only demand bottom 16-bits of the BEXTR control op (PR34042) Only the bottom 16-bits of BEXTR's control op are required (0:8 INDEX, 15:8 LENGTH). Differential Revision: https://reviews.llvm.org/D47690 llvm-svn: 334083	2018-06-06 10:52:10 +00:00
Peter Smith	57f661bd7d	[MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup On targets like Arm some relaxations may only be performed when certain architectural features are available. As functions can be compiled with differing levels of architectural support we must make a judgement on whether we can relax based on the MCSubtargetInfo for the function. This change passes through the MCSubtargetInfo for the function to fixupNeedsRelaxation so that the decision on whether to relax can be made per function. In this patch, only the ARM backend makes use of this information. We must also pass the MCSubtargetInfo to applyFixup because some fixups skip error checking on the assumption that relaxation has occurred, to prevent code-generation errors applyFixup must see the same MCSubtargetInfo as fixupNeedsRelaxation. Differential Revision: https://reviews.llvm.org/D44928 llvm-svn: 334078	2018-06-06 09:40:06 +00:00
Petar Jovanovic	326ec32403	[MIPS GlobalISel] Add lowerCall Add minimal support to lower function calls. Support only functions with arguments/return that go through registers and have type i32. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D45627 llvm-svn: 334071	2018-06-06 07:24:52 +00:00
Sanjay Patel	59313be8d3	[CodeGen] assume max/default throughput for unspecified instructions This is a fix for the problem arising in D47374 (PR37678): https://bugs.llvm.org/show_bug.cgi?id=37678 We may not have throughput info because it's not specified in the model or it's not available with variant scheduling, so assume that those instructions can execute/complete at max-issue-width. Differential Revision: https://reviews.llvm.org/D47723 llvm-svn: 334055	2018-06-05 23:34:45 +00:00
Guozhi Wei	c4c6b548c5	[CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions. Differential Revision: https://reviews.llvm.org/D45537 This is re-commit of r331783, which was reverted by r333305. The performance regression was caused by some unlucky alignment, not a code generation problem. llvm-svn: 334049	2018-06-05 21:03:52 +00:00
Matt Arsenault	57e541e87e	AMDGPU: Preserve metadata when widening loads Preserves the low bound of the !range. I don't think it's legal to do anything with the top half since it's theoretically reading garbage. llvm-svn: 334045	2018-06-05 19:52:56 +00:00
Matt Arsenault	9224c00d2b	AMDGPU: Use more custom insert/extract_vector_elt lowering Apply to i8 vectors. llvm-svn: 334044	2018-06-05 19:52:46 +00:00
Krzysztof Parzyszek	b984ffcc71	[Hexagon] Add pattern to generate 64-bit neg instruction llvm-svn: 334043	2018-06-05 19:52:39 +00:00
Krzysztof Parzyszek	d8b093efef	[Hexagon] Add more patterns for generating abs/absp instructions llvm-svn: 334038	2018-06-05 19:00:50 +00:00
Michael Berg	96925fe0df	guard fneg with fmf sub flags Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483. Reviewers: spatel, hfinkel Reviewed By: spatel Subscribers: nemanjai Differential Revision: https://reviews.llvm.org/D47389 llvm-svn: 334037	2018-06-05 18:49:47 +00:00
Michael Berg	8f6d6c817d	NFC: adding baseline fneg case for fmf llvm-svn: 334035	2018-06-05 18:12:25 +00:00
Simon Dardis	0d95ff03f2	[mips] Fix the predicates for arithmetic operations Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D47635 llvm-svn: 334031	2018-06-05 17:53:22 +00:00
Andrea Di Biagio	757600bccb	[llvm-mca] Correctly update the CyclesLeft of a register read in the presence of partial register updates. This patch fixe the logic in ReadState::cycleEvent(). That method was not correctly updating field `TotalCycles`. Added extra code comments in class ReadState to better describe each field. llvm-svn: 334028	2018-06-05 17:12:02 +00:00
Simon Pilgrim	f2f043acbb	[X86][SSE] Use multiplication scale factors for v8i16 SHL on pre-AVX2 targets. Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support. Reduces the serial nature of the codegen, which relies on chains of plendvb/pand+pandn+por shifts. This is a step towards adding support for vXi16 vector rotates. Differential Revision: https://reviews.llvm.org/D47546 llvm-svn: 334023	2018-06-05 15:17:39 +00:00
Nirav Dave	05b589101e	[MC][X86] Allow assembler variable assignment to register name. Summary: Allow extended parsing of variable assembler assignment syntax and modify X86 to permit VAR = register assignment. As we emit these as .set directives when possible, we inline such expressions in output assembly. Fixes PR37425. Reviewers: rnk, void, echristo Reviewed By: rnk Subscribers: nickdesaulniers, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D47545 llvm-svn: 334022	2018-06-05 15:13:39 +00:00
Matt Arsenault	191bc71541	DAG: Stop dropping invariant/dereferencable When legalizing illegal FP load results, this was for some reason dropping the invariant and dereferencable memory flags. There doesn't seem to be any reason for this, and the equivalent isn't done for integer loads. Fixes an issue in a future AMDGPU commit where some identical loads fail to merge because one of the loads ends up dropping the flags. llvm-svn: 334020	2018-06-05 14:52:24 +00:00
John Brawn	e4ff0bd401	[InstCombine] Correct the cmp operand type used when canonicalizing abs/nabs When adjusting a cmp in order to canonicalize an abs/nabs select pattern we need to use the type of the existing operand when creating a new operand not the type of a select operand, as the two may be different. This fixes PR37686. llvm-svn: 334019	2018-06-05 14:10:55 +00:00
Hiroshi Inoue	955655f558	[PowerPC] reduce rotate in BitPermutationSelector BitPermutationSelector builds the output value by repeating rotate-and-mask instructions with input registers. Here, we may avoid one rotate instruction if we start building from an input register that does not require rotation. For example of the test case bitfieldinsert.ll, it first rotates left r4 by 8 bits and then inserts some bits from r5 without rotation. This can be executed by one rlwimi instruction, which rotates r4 by 8 bits and inserts its bits into r5. This patch adds a check for rotation amounts in the comparator used in sorting to process the input without rotation first. Differential Revision: https://reviews.llvm.org/D47765 llvm-svn: 334011	2018-06-05 11:58:01 +00:00
Simon Pilgrim	fef9b6eea6	[X86][SSE] Add target shuffle support to X86TargetLowering::computeKnownBitsForTargetNode Ideally we'd use resolveTargetShuffleInputs to handle faux shuffles as well but: (a) that code path doesn't handle general/pre-legalized ops/types very well. (b) I'm concerned about the compute time as they recurse to calls to computeKnownBits/ComputeNumSignBits which would need depth limiting somehow. llvm-svn: 334007	2018-06-05 10:52:29 +00:00
Peter Smith	ef945b2240	[MC][ARM] Add range checking for Thumb2 resolved fixups. When the branch target of a Thumb2 unconditional or conditonal branch is resolved at assembly time, no range checking is performed on the result leading to incorrect immediates. This change adds a range check: +- 16 Megabytes for unconditional branches, +- 1 Megabyte for the conditional branch. Differential Revision: https://reviews.llvm.org/D46306 llvm-svn: 333997	2018-06-05 10:00:56 +00:00
Simon Pilgrim	7bbe7a2920	[X86][SSE] Add basic PACKUS support to X86TargetLowering::computeKnownBitsForTargetNode Helps improve analysis of saturation ops llvm-svn: 333995	2018-06-05 09:45:03 +00:00
Peter Smith	0aafe0cee5	[MC][ARM] Correct Thumb BL instruction range The Thumb BL range is + or - either 16 Megabytes or 4 Megabytes depending on whether the CPU supports Thumb2 or the v8-m baseline ops. The existing check for BL range is incorrectly set at +- 32 Megabytes. This change corrects the higher range and uses the lower range if the featurebits don't have the necessary support for it. Differential Revision: https://reviews.llvm.org/D46305 llvm-svn: 333991	2018-06-05 09:32:28 +00:00
Alexander Ivchenko	964b27fa21	[X86][CET] Shadow stack fix for setjmp/longjmp This is the new version of D46181, allowing setjmp/longjmp to work correctly with the Intel CET shadow stack by storing SSP on setjmp and fixing it on longjmp. The patch has been updated to use the cf-protection-return module flag instead of HasSHSTK, and the bug that caused D46181 to be reverted has been fixed with the test expanded to track that fix. patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D47311 llvm-svn: 333990	2018-06-05 09:22:30 +00:00
Craig Topper	f17b33d6c6	[X86] Make all instructions that operate on MMX types, but were added after the initial MMX support via one of the SSE features flags make them require the MMX feature as well. Passing -mattr=-mmx needs to disable these instructions since the MMX register class won't have been set up. But we don't want -mattr=-mmx to disable SSE so we have to do it separately. llvm-svn: 333984	2018-06-05 06:20:06 +00:00
Vedant Kumar	b6ed992de0	[opt] Introduce -strip-named-metadata This renames and generalizes -strip-module-flags to erase all named metadata from a module. This makes it easier to diff IR. llvm-svn: 333977	2018-06-05 00:56:08 +00:00
Vedant Kumar	800255f9f1	[Debugify] Don't insert debug values after terminating deopts As is the case with musttail calls, the IR does not allow for instructions inserted after a terminating deopt. llvm-svn: 333976	2018-06-05 00:56:07 +00:00
Francis Visoiu Mistrih	ca69b3bf6d	[ShrinkWrap] Add optimization remarks to the shrink-wrapping pass Start by emitting remarks for very basic unsupported cases such as irreducible CFGs and EHFunclets. The end goal is to be able to cover all the cases where we give up with an explanation. llvm-svn: 333972	2018-06-05 00:27:24 +00:00
Amara Emerson	d496cc8ffb	[MIRParser] Add parser support for 'true' and 'false' i1s. We already output true and false in the printer, but the parser isn't able to read it. Differential Revision: https://reviews.llvm.org/D47424 llvm-svn: 333970	2018-06-05 00:17:13 +00:00
Sanjay Patel	dcb8d304c3	[InstCombine] refine UB-handling in shuffle-binop transform As noted in rL333782, we can be both better for optimization and safer with this transform: BinOp (shuffle V1, Mask), C --> shuffle (BinOp V1, NewC), Mask The only potentially unsafe-to-speculate binops are integer div/rem. All other binops are always safe (although I don't see a way to assert that in code here). For opcodes like shifts that can produce poison, it can't matter here because we know the lanes with undef are dropped by the subsequent shuffle. Differential Revision: https://reviews.llvm.org/D47686 llvm-svn: 333962	2018-06-04 22:26:45 +00:00
Amaury Sechet	800ac42573	Remove various use of undef in the X86 test suite as patern involving undef can collapse them. NFC llvm-svn: 333961	2018-06-04 22:09:26 +00:00

1 2 3 4 5 ...

53803 Commits