llvm-project

Commit Graph

Author	SHA1	Message	Date
Neil Henning	0799352026	[AMDGPU] Fix a weird WWM intrinsic issue. I found a really strange WWM issue through a very convoluted shader that essentially boils down to a bug in SIInstrInfo where canReadVGPR did not correctly identify that WWM is like a copy and can have a VGPR as its source. Differential Revision: https://reviews.llvm.org/D56002 llvm-svn: 352500	2019-01-29 14:28:17 +00:00
Dan Gohman	4684f824d4	[WebAssembly] Re-enable main-function signature rewriting Re-enable the code to rewrite main-function signatures into "int main(int argc, char *argv[])", but limited to only handling the case of "int main(void)", so that it doesn't silently strip an argument in the "int main(int argc, char *argv[], char *envp[])" case. This allows main to be called by C startup code, since WebAssembly requires caller and callee signatures to match, so it can't rely on passing main a different number of arguments than it expects. Differential Revision: https://reviews.llvm.org/D57323 llvm-svn: 352479	2019-01-29 10:53:42 +00:00
David Green	54b0115547	[ARM] Use sub for negative offset load/store in thumb1 This attempts to optimise negative values used in load/store operands a little. We currently try to select them as rr, materialising the negative constant using a MOV/MVN pair. This instead selects ri with an immediate of 0, forcing the add node to become a simpler sub. Differential Revision: https://reviews.llvm.org/D57121 llvm-svn: 352475	2019-01-29 10:40:31 +00:00
Martin Storsjo	f5884d255e	[COFF, ARM64] Don't put jump table into a separate COFF section for EK_LabelDifference32 Windows ARM64 has PIC relocation model and uses jump table kind EK_LabelDifference32. This produces jump table entry as ".word LBB123 - LJTI1_2" which represents the distance between the block and jump table. A new relocation type (IMAGE_REL_ARM64_REL32) is needed to do the fixup correctly if they are in different COFF section. This change saves the jump table to the same COFF section as the associated code. An ideal fix could be utilizing IMAGE_REL_ARM64_REL32 relocation type. Patch by Tom Tan! Differential Revision: https://reviews.llvm.org/D57277 llvm-svn: 352465	2019-01-29 09:36:48 +00:00
Mikael Holmen	b792627ce9	Fix compiler warning when using clang 3.6.0 Without the fix we get the following (with -Werror): ../lib/Target/X86/X86ISelLowering.cpp:14181:58: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] SmallVector<std::array<int, 2>, 2> LaneSrcs(NumLanes, {-1, -1}); ^~~~~~ { } 1 error generated. llvm-svn: 352455	2019-01-29 06:51:28 +00:00
Sam Clegg	b54927cc48	[WebAssembly] Handle more types of uses in WebAssemblyAddMissingPrototypes Previously we were only handling bitcast operations, however prototypeless functions can also appear in other places such as comparisons and as function params. Switch to using replaceAllUsesWith() to replace the prototype-less function uses. This new approach results in some redundant bitcasting but is much simpler and handles all cases. Differential Revision: https://reviews.llvm.org/D56938 llvm-svn: 352445	2019-01-29 00:30:46 +00:00
Reid Kleckner	85e72c3d56	[PPC] Include tablegenerated PPCGenCallingConv.inc once Move the CC analysis implementation to its own .cpp file instead of duplicating it and artificially using functions in PPCISelLowering.cpp and PPCFastISel.cpp. Follow-up to the same change done for X86, ARM, and AArch64. llvm-svn: 352444	2019-01-29 00:30:35 +00:00
Thomas Lively	33f87b8aef	[WebAssembly] Expand BUILD_PAIR nodes Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish Differential Revision: https://reviews.llvm.org/D57276 llvm-svn: 352442	2019-01-28 23:44:31 +00:00
Craig Topper	390ac61b93	Recommit r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer" This did not cause the buildbot failure it was previously reverted for. Original commit message: I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits. This patch changes this and then modifies the X86 post legalization combine to emit a extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inre On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all. llvm-svn: 352433	2019-01-28 21:38:47 +00:00
Reid Kleckner	27fd307b83	[ARM] Deduplicate table generated CC analysis code Create ARMCallingConv.cpp and emit code for calling convention analysis from there. llvm-svn: 352431	2019-01-28 21:28:43 +00:00
Reid Kleckner	96c581d7d0	[AArch64] Include AArch64GenCallingConv.inc once Summary: Avoids duplicating generated static helpers for calling convention analysis. This also means you can modify AArch64CallingConv.td without recompiling the AArch64ISelLowering.cpp monolith, so it provides faster incremental rebuilds. Saves 12K in llc.exe, but adds a new object file, which is large. Reviewers: efriedma, t.p.northover Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56948 llvm-svn: 352430	2019-01-28 21:28:40 +00:00
Jessica Paquette	2d73ecd0a3	[GlobalISel][AArch64] Add legalization for G_FLOG This adds support for legalizing G_FLOG into a RTLib call. It adds a legalizer test, and updates the existing floating point tests. https://reviews.llvm.org/D57347 llvm-svn: 352429	2019-01-28 21:27:23 +00:00
Matt Arsenault	cdd191d9db	AMDGPU: Add DS append/consume intrinsics Since these pass the pointer in m0 unlike other DS instructions, these need to worry about whether the address is uniform or not. This assumes the address is dynamically uniform, and just uses readfirstlane to get a copy into an SGPR. I don't know if these have the same 16-bit add for the addressing mode offset problem on SI or not, but I've just assumed they do. Also includes some misc. changes to avoid test differences between the LDS and GDS versions. llvm-svn: 352422	2019-01-28 20:14:49 +00:00
Jessica Paquette	c49428a97d	[GlobalISel][AArch64] Add instruction selection support for @llvm.log10 This adds instruction selection support for @llvm.log10 in AArch64. It teaches GISel to lower it to a library call, updates the relevant tests, and adds a legalizer test for log10. https://reviews.llvm.org/D57341 llvm-svn: 352418	2019-01-28 19:53:14 +00:00
Francis Visoiu Mistrih	556ea7d2e0	[AArch64] Add 'apple-latest' CPU alias The 'apple-latest' alias is supposed to provide a CPU that contains the latest Apple processor model supported by LLVM. This is supposed to be used by tools like lldb to provide a target that supports most of the CPU features. For now, this is mapped to Cyclone. Differential Revision: https://reviews.llvm.org/D56384 llvm-svn: 352412	2019-01-28 19:27:33 +00:00
Nikita Popov	8e1a464e6a	[CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD Followup to D56636, this time handling the UADDSAT case by expanding uadd.sat(a, b) to umin(a, ~b) + b. Differential Revision: https://reviews.llvm.org/D56869 llvm-svn: 352409	2019-01-28 19:19:09 +00:00
Jessica Paquette	7db82d7257	[GlobalISel][AArch64] Add instruction selection support for G_FCOS and G_FSIN This contains all of the legalizer changes from D57197 necessary to select G_FCOS and G_FSIN. It also updates several existing IR tests in test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN instructions. https://reviews.llvm.org/D57197 3/3 llvm-svn: 352402	2019-01-28 18:34:18 +00:00
Simon Pilgrim	2c17512456	[X86][AVX] Remove lowerShuffleByMerging128BitLanes 2-lane restriction First step towards adding support for 64-bit unary "sublane" handling (a bit like lowerShuffleAsRepeatedMaskAndLanePermute). This allows us to add lowerV64I8Shuffle handling. llvm-svn: 352389	2019-01-28 17:02:35 +00:00
Sanjay Patel	94cca60b82	[x86] allow more shuffle splitting to avoid vpermps (PR40434) This is tricky to make optimal: sometimes we're better off using a single wider op, but other times it makes more sense to combine narrow ops to achieve the same result. This solves the case from: https://bugs.llvm.org/show_bug.cgi?id=40434 There's potentially a similar change for vectors with 64-bit elements, but it needs adjustments similar to rL352333 to avoid creating infinite loops. llvm-svn: 352380	2019-01-28 15:51:34 +00:00
Arnaud A. de Grandmaison	51eb87cadd	Remove no longer needed Arm specific LICENSE.TXT file. As the codebase is now under the Apache 2.0 license with LLVM Exceptions, and all Arm's contributions, past or future, are under that new license, this Arm specific LICENSE.TXT is no longer needed, thus removing it. llvm-svn: 352376	2019-01-28 15:38:01 +00:00
Aleksandar Beserminji	6c5dfcb89e	[mips] Support for +abs2008 attribute Instruction abs.[ds] does not generate a correct result when working with NaNs on revisions prior to mips32r6 and mips64r6. To generate a sequence which always produces a correct result, but also to allow the user more control over how their code is compiled, attribute +abs2008 is added, so the user can choose legacy or 2008 mode. By default legacy mode is used on revisions prior to R6. Mips32r6 and mips64r6 use abs2008 mode by default. Differential Revision: https://reviews.llvm.org/D35983 llvm-svn: 352370	2019-01-28 14:59:30 +00:00
Tim Corringham	824ca3f3dd	[AMDGPU] Add intrinsics for 16 bit interpolation Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and llvm.amdgcn.interp.p2.f16() and related LIT test. The p1 intrinsic generates code appropriate for both 16 and 32 bank LDS. Reviewers: #amdgpu, dstuttard, arsenm, tpr Reviewed By: #amdgpu, arsenm Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46754 llvm-svn: 352357	2019-01-28 13:48:59 +00:00
Petar Avramovic	7cecadb9af	[MIPS GlobalISel] Select sub Lower G_USUBO and G_USUBE. Add narrowScalar for G_SUB. Legalize and select G_SUB for MIPS 32. Differential Revision: https://reviews.llvm.org/D53416 llvm-svn: 352351	2019-01-28 12:10:17 +00:00
Diana Picus	574e0c5e32	[ARM GlobalISel] Support integer division for Thumb2 Support G_SDIV, G_UDIV, G_SREM and G_UREM. The only significant difference between arm and thumb mode is that we need to check a different subtarget feature. llvm-svn: 352346	2019-01-28 10:37:30 +00:00
Craig Topper	453150bc18	[X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument. Remove and autoupgrade the old intrinsics llvm-svn: 352343	2019-01-28 07:03:03 +00:00
Amara Emerson	fd31bf95c1	[AArch64][GlobalISel] Teach RBS about G_FNEG default mapping. llvm-svn: 352340	2019-01-28 03:21:14 +00:00
Amara Emerson	0bfa2faccc	[AArch64][GlobalISel] Add some missing vector support for FP arithmetic ops. Moved the fneg lowering legalization test from AArch64 to X86, as we want to specify that it's already legal. llvm-svn: 352338	2019-01-28 02:28:22 +00:00
Amara Emerson	92ffb305cc	[AArch64][GlobalISel] Add some vector support for fp <-> int conversions. Some unrelated, but benign, test changes as well due to the test update script. llvm-svn: 352337	2019-01-28 02:27:59 +00:00
Sanjay Patel	ebe6b43aec	[x86] add restriction for lowering to vpermps This transform was added with rL351346, and we had an escape for shufps, but we also want one for unpckps vs. vpermps because vpermps doesn't take an immediate shuffle index operand. llvm-svn: 352333	2019-01-27 21:53:33 +00:00
Simon Pilgrim	670a6971f8	[X86][SSE] Add UNDEF handling to combineSelect ISD::USUBSAT matching (PR40083) llvm-svn: 352330	2019-01-27 21:01:23 +00:00
Simon Pilgrim	f10b6623cc	[X86][SSE] Permit UNDEFs in combineAddToSUBUS matching (PR40083) llvm-svn: 352328	2019-01-27 20:36:37 +00:00
Sanjay Patel	5f1fdaa192	[x86] refactor logic in lowerShuffleWithUndefHalf Although this is longer code, this is no-functional-change-intended. The goal is to untangle the conditions under which we bail out, so that's easier to adjust. llvm-svn: 352320	2019-01-27 18:12:03 +00:00
Gabor Buella	a0f743b77a	[X86] Add some missing blsr patterns The add+and sequence followed by a branch can happen e.g. when looping over the set bits of an integer: ``` while (x != 0) { func(x & -x); x &= x - 1; } ``` Reviewed By: ctopper Differential Revision: https://reviews.llvm.org/D57296 llvm-svn: 352306	2019-01-27 06:15:39 +00:00
Craig Topper	e65d4c5525	[X86] Add a pattern for (i64 (and (anyext def32:), 0x00000000FFFFFFFF)) to produce SUBREG_TO_REG def32 here means the producing instruction zeroed bits 63:32. We already do this for zext, but it looks like we can get an and+anyext sometimes. Spotted in the diffs from D33587. llvm-svn: 352303	2019-01-27 03:37:05 +00:00
Matt Arsenault	211e89d4dd	GlobalISel: Implement narrowScalar for mul llvm-svn: 352300	2019-01-27 00:52:51 +00:00
Matt Arsenault	2e5f900849	GlobalISel: fewerElementsVector for intrinsic_trunc/intrinsic_round llvm-svn: 352298	2019-01-27 00:12:21 +00:00
Matt Arsenault	ded2f82662	AMDGPU/GlobalISel: Use scalarize instead of clampMaxNumElements llvm-svn: 352297	2019-01-26 23:54:53 +00:00
Matt Arsenault	26a6c74fbe	AMDGPU/GlobalISel: Legalize more bit ops llvm-svn: 352295	2019-01-26 23:47:07 +00:00
Matt Arsenault	4d47594fc5	AMDGPU/GlobalISel: Widen small uaddo/usubo llvm-svn: 352294	2019-01-26 23:44:51 +00:00
Simon Pilgrim	a914fa4dd8	[X86] combineAddOrSubToADCOrSBB/combineCarryThroughADD - use oneuse for entire SDNode Fix issue noted in D57281 that only tested the one use for the SDValue (the result flag), not the entire SUB. I've added the getNode() to make it clearer what is intended than just the -> redirection. llvm-svn: 352291	2019-01-26 21:29:16 +00:00
Simon Pilgrim	37a8e65a60	[X86] combineCarryThroughADD - add support for X86::COND_A commutations (PR24545) As discussed on PR24545, we should try to commute X86::COND_A 'icmp ugt' cases to X86::COND_B 'icmp ult' to more optimally bind the carry flag output to a SBB instruction. Differential Revision: https://reviews.llvm.org/D57281 llvm-svn: 352289	2019-01-26 20:23:04 +00:00
Simon Pilgrim	b7a15acd38	[X86] Fold X86ISD::SBB(ISD::SUB(X,Y),0) -> X86ISD::SBB(X,Y) (PR25858) We often generate X86ISD::SBB(X, 0) for carry flag arithmetic. I had tried to create test cases for the ADC equivalent (which often uses the same pattern) but haven't managed to find anything yet. Differential Revision: https://reviews.llvm.org/D57169 llvm-svn: 352288	2019-01-26 20:13:44 +00:00
Simon Pilgrim	6162fba57c	[X86][SSE] Generalized unsigned compares to support nonsplat constant vectors (PR39859) llvm-svn: 352283	2019-01-26 16:40:03 +00:00
Sanjay Patel	a03c63b77f	[x86] add helper for creating a half-width shuffle; NFC This reduces a bit of duplication between the combining and lowering places that use it, but the primary motivation is to make it easier to rearrange the lowering logic and solve PR40434: https://bugs.llvm.org/show_bug.cgi?id=40434 llvm-svn: 352280	2019-01-26 16:20:22 +00:00
Craig Topper	3b5e01b386	[X86] Remove and autoupgrade vpconflict intrinsics that take a mask and passthru argument. We have unmasked versions as of r352172 llvm-svn: 352270	2019-01-26 06:27:01 +00:00
Craig Topper	58e6b37e62	Revert r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer" This might be breaking an lldb windows buildbot. llvm-svn: 352268	2019-01-26 02:44:58 +00:00
Craig Topper	6c9c7d0796	[X86] Remove GCCBuiltins from 512-bit cvt(u)qqtops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics. Add new variadic uitofp/sitofp with rounding mode intrinsics. Summary: See clang patch D56998 for a full description. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56999 llvm-svn: 352266	2019-01-26 02:41:54 +00:00
Thomas Lively	2b8b2978e4	[WebAssembly][NFC] Group SIMD-related ISel configuration Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish Differential Revision: https://reviews.llvm.org/D57263 llvm-svn: 352262	2019-01-26 01:25:37 +00:00
Nemanja Ivanovic	7d007ddedf	[PowerPC] Update Vector Costs for P9 For the power9 CPU, vector operations consume a pair of execution units rather than one execution unit like a scalar operation. Update the target transform cost functions to reflect the higher cost of vector operations when targeting Power9. Patch by RolandF. Differential revision: https://reviews.llvm.org/D55461 llvm-svn: 352261	2019-01-26 01:18:48 +00:00
Craig Topper	7a8e74775c	[X86] Add DAG combine to merge vzext_movl with the various fp<->int conversion operations that only write the lower 64-bits of an xmm register and zero the rest. Summary: We have isel patterns for this, but we're missing some load patterns and all broadcast patterns. A DAG combine seems like a better fit for this. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56971 llvm-svn: 352260	2019-01-26 01:17:09 +00:00
Craig Topper	b1d3457c03	[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the type legalizer Summary: I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits. This patch changes this and then modifies the X86 post legalization combine to emit a extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization. On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57186 llvm-svn: 352255	2019-01-26 00:26:37 +00:00
Alex Bradbury	0092df0669	[RISCV] Add target DAG combine for bitcast fabs/fneg on RV32FD DAGCombiner::visitBITCAST will perform: fold (bitconvert (fneg x)) -> (xor (bitconvert x), signbit) fold (bitconvert (fabs x)) -> (and (bitconvert x), (not signbit)) As shown in double-bitmanip-dagcombines.ll, this can be advantageous. But RV32FD doesn't use bitcast directly (as i64 isn't a legal type), and instead uses RISCVISD::SplitF64. This patch adds an equivalent DAG combine for SplitF64. llvm-svn: 352247	2019-01-25 21:55:48 +00:00
Mircea Trofin	519f42d914	[llvm] Opt-in flag for X86DiscriminateMemOps Summary: Currently, if an instruction with a memory operand has no debug information, X86DiscriminateMemOps will generate one based on the first line of the enclosing function, or the last seen debug info. This may cause confusion in certain debugging scenarios. The long term approach would be to use the line number '0' in such cases, however, that brings in challenges: the base discriminator value range is limited (4096 values). For the short term, adding an opt-in flag for this feature. See bug 40319 (https://bugs.llvm.org/show_bug.cgi?id=40319) Reviewers: dblaikie, jmorse, gbedwell Reviewed By: dblaikie Subscribers: aprantl, eraman, hiraditya Differential Revision: https://reviews.llvm.org/D57257 llvm-svn: 352246	2019-01-25 21:49:54 +00:00
Jessica Paquette	1f9bc2854f	[GlobalISel][AArch64][NFC] Fix incorrect comment in selectUnmergeValues s/scalar/vector/ llvm-svn: 352243	2019-01-25 21:28:27 +00:00
Ana Pazos	05a6064385	Reapply: [RISCV] Set isAsCheapAsAMove for ADDI, ORI, XORI, LUI This reapplies commit r352010 with RISC-V test fixes. llvm-svn: 352237	2019-01-25 20:22:49 +00:00
Craig Topper	4cf28bad5b	[X86] Combine masked store and truncate into masked truncating stores. We also need to combine to masked truncating with saturation stores, but I'm leaving that for a future patch. This does regress some tests that used truncate with saturation followed by a masked store. Those now use a truncating store and use min/max to saturate. Differential Revision: https://reviews.llvm.org/D57218 llvm-svn: 352230	2019-01-25 18:37:36 +00:00
Sanjay Patel	0020f8bb23	[x86] simplify logic in lowerShuffleWithUndefHalf(); NFCI This seems unnecessarily complicated because we gave names to opposite polarity bools and have code comments that don't really line up with the logic. Step 1: remove UndefUpper and assert that it is the opposite of UndefLower after the initial early exit. llvm-svn: 352217	2019-01-25 17:00:41 +00:00
Simon Pilgrim	f56298f4b9	[X86] Simplify X86ISD::ADD/SUB if we don't use the result flag Simplify to the generic ISD::ADD/SUB if we don't make use of the result flag. This mainly helps with ADDCARRY/SUBBORROW intrinsics which get expanded to X86ISD::ADD/SUB but could be simplified further. Noticed in some of the test cases in PR31754 Differential Revision: https://reviews.llvm.org/D57234 llvm-svn: 352210	2019-01-25 15:58:28 +00:00
Sanjay Patel	21aa6ddc14	[x86] narrow a shuffle that doesn't use or set any high elements This isn't the final fix for our reduction/horizontal codegen, but it takes care of a lot of the problems. After we narrow the shuffle, existing combines for insert/extract and binops kick in, and we end up with cheaper 128-bit ops. The avg and mul reduction tests show an existing shuffle lowering hole for AVX2/AVX512. I think in its most minimal form this is: https://bugs.llvm.org/show_bug.cgi?id=40434 ...but we might need multiple fixes to get it right. Differential Revision: https://reviews.llvm.org/D57156 llvm-svn: 352209	2019-01-25 15:37:42 +00:00
Simon Pilgrim	dea6174b0b	Fix gcc -Wparentheses warning. NFCI. llvm-svn: 352193	2019-01-25 11:38:40 +00:00
Diana Picus	8976ad12a9	[ARM GlobalISel] Support shifts for Thumb2 Same as ARM. On this occasion we split some of the instruction select tests for more complicated instructions into their own files, so we can reuse them for ARM and Thumb mode. Likewise for the legalizer tests. llvm-svn: 352188	2019-01-25 10:48:42 +00:00
Diana Picus	23628c7b05	[ARM GlobalISel] Remove rebase artifact from r351882. NFC r351882 introduced some superfluous calls to mark G_INTTOPTR and G_PTRTOINT as legal (looks like a rebase mishap). Remove them. llvm-svn: 352187	2019-01-25 10:48:35 +00:00
Anton Korobeynikov	509d5c4a7d	[MSP430] Fix absolute addressing mode printing in AsmPrinter Align checks for absolute addressing mode with its current implementation (SR is used as a base register). This fixes https://bugs.llvm.org/show_bug.cgi?id=39993 Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D56785 llvm-svn: 352178	2019-01-25 09:14:05 +00:00
Zi Xuan Wu	308a609c6e	[PowerPC] Enhance the fast selection of cmp instruction and clean up related asserts Fast selection of llvm icmp and fcmp instructions does not handle VSX instruction support well. We'd use a VSX float comparison instruction instead of a non-VSX float comparison instruction if the operand register class is VSSRC or VSFRC, because i32 and i64 are mapped to VSSRC and VSFRC respectively if the VSX feature is enabled. If the target does not have a corresponding VSX comparison instruction for some type, just copy the VSX-related register to the common float register class and use the non-VSX comparison instruction. Differential Revision: https://reviews.llvm.org/D57078 llvm-svn: 352174	2019-01-25 07:24:59 +00:00
Craig Topper	6fd9af587a	[X86] Add non-masked versions of vpconflict intrinsics so we can use a select in the header file in clang. I'll remove and autoupgrade the old intrinsics in a future commit. llvm-svn: 352172	2019-01-25 07:08:07 +00:00
Alex Bradbury	456d3798d6	[RISCV] Custom-legalise i32 SDIV/UDIV/UREM on RV64M Follow the same custom legalisation strategy as used in D57085 for variable-length shifts (see that patch summary for more discussion). Although we may lose out on some late-stage DAG combines, I think this custom legalisation strategy is ultimately easier to reason about. There are some codegen changes in rv64m-exhaustive-w-insts.ll but they are all neutral in terms of the number of instructions. Differential Revision: https://reviews.llvm.org/D57096 llvm-svn: 352171	2019-01-25 05:11:34 +00:00
Alex Bradbury	299d690a50	[RISCV] Custom-legalise 32-bit variable shifts on RV64 The previous DAG combiner-based approach had an issue with infinite loops between the target-dependent and target-independent combiner logic (see PR40333). Although this was worked around in rL351806, the combiner-based approach is still potentially brittle and can fail to select the 32-bit shift variant when profitable to do so, as demonstrated in the pr40333.ll test case. This patch instead introduces target-specific SelectionDAG nodes for SHLW/SRLW/SRAW and custom-lowers variable i32 shifts to them. pr40333.ll is a good example of how this approach can improve codegen. This adds DAG combine that does SimplifyDemandedBits on the operands (only lower 32-bits of first operand and lower 5 bits of second operand are read). This seems better than implementing SimplifyDemandedBitsForTargetNode as there is no guarantee that would be called (and it's not for e.g. the anyext return test cases). Also implements ComputeNumSignBitsForTargetNode. There are codegen changes in atomic-rmw.ll and atomic-cmpxchg.ll but the new instruction sequences are semantically equivalent. Differential Revision: https://reviews.llvm.org/D57085 llvm-svn: 352169	2019-01-25 05:04:00 +00:00
Matt Arsenault	3b9a82ff2c	AMDGPU/GlobalISel: Remove leftover setAction Also move G_GEP actions together. llvm-svn: 352168	2019-01-25 04:54:00 +00:00
Matt Arsenault	3e08b772b3	AMDGPU/GlobalISel: Scalarize add/sub llvm-svn: 352167	2019-01-25 04:53:57 +00:00
Matt Arsenault	e6cebd0d69	GlobalISel: fewerElementsVector for more cast types llvm-svn: 352166	2019-01-25 04:37:33 +00:00
Matt Arsenault	95fd95cfe0	GlobalISel: fewerElementsVector for a few more trivial ops llvm-svn: 352165	2019-01-25 04:03:38 +00:00
Matt Arsenault	5d622fbcc1	AMDGPU/GlobalISel: Legalize smulh/umulh and scalarize mul llvm-svn: 352162	2019-01-25 03:23:04 +00:00
Matt Arsenault	1b1e685f10	GlobalISel: Support fewerElementsVector for icmp/fcmp Also legalize 64-bit compares for AMDGPU llvm-svn: 352157	2019-01-25 02:59:34 +00:00
Matt Arsenault	ca676343a9	GlobalISel: Implement fewerElementsVector for extensions llvm-svn: 352155	2019-01-25 02:36:32 +00:00
Matt Arsenault	990f507704	GlobalISel: Add convenience mutations to scalarize llvm-svn: 352143	2019-01-25 00:51:00 +00:00
Benjamin Kramer	653020d3cc	[GlobalISel][AArch64] Avoid unused variable warning for variable only used in assert llvm-svn: 352133	2019-01-24 23:45:07 +00:00
Nemanja Ivanovic	b9b75de0ae	[PowerPC] Exploit store instructions that store a single vector element This patch exploits the instructions that store a single element from a vector to perform a (store (extract_elt)). We already have code that does this with ISA 3.0 instructions that were added to handle i8/i16 types. However, we had never exploited the existing ones that handle f32/f64/i32/i64 types. Differential revision: https://reviews.llvm.org/D56175 llvm-svn: 352131	2019-01-24 23:44:28 +00:00
Benjamin Kramer	1411ecf08b	[GlobalISel][AArch64] Avoid unused function warnings in Release builds llvm-svn: 352129	2019-01-24 23:39:47 +00:00
Sanjay Patel	4c304b2923	[x86] move half-size shuffle mask creation to helper; NFC As noted in D57156, we want to check at least part of this pattern earlier (in combining), so this will allow the code to be shared instead of duplicated. llvm-svn: 352127	2019-01-24 23:12:36 +00:00
Jessica Paquette	76c40f827d	Suppress unused capture warning in CheckCopy Werror bots didn't like the lambda + assert thing in my previous commit. Capture everything to suppress the error. Example failure here: http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/29393 llvm-svn: 352124	2019-01-24 22:51:31 +00:00
Matt Arsenault	baa5d2e69c	RegBankSelect: Support some more complex part mappings llvm-svn: 352123	2019-01-24 22:47:04 +00:00
Jessica Paquette	245047dfe8	[GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceil This patch adds support for vector @llvm.ceil intrinsics when full 16 bit floating point support isn't available. To do this, this patch... - Implements basic isel for G_UNMERGE_VALUES - Teaches the legalizer about 16 bit floats - Teaches AArch64RegisterBankInfo to respect floating point registers on G_BUILD_VECTOR and G_UNMERGE_VALUES - Teaches selectCopy about 16-bit floating point vectors It also adds - A legalizer test for the 16-bit vector ceil which verifies that we create a G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported - An instruction selection test which makes sure we lower to G_FCEIL when full fp16 is supported - A test for selecting G_UNMERGE_VALUES And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types work as expected. https://reviews.llvm.org/D56682 llvm-svn: 352113	2019-01-24 22:00:41 +00:00
Sanjay Patel	e524639d72	[x86] rename VectorShuffle -> Shuffle; NFC This wasn't consistent within the file, so made it harder to search. Standardize on the shorter name to save some typing. llvm-svn: 352077	2019-01-24 18:52:12 +00:00
James Y Knight	2c36240a82	Fix emission of _fltused for MSVC. It should be emitted when any floating-point operations (including calls) are present in the object, not just when calls to printf/scanf with floating point args are made. The difference caused by this is very subtle: in static (/MT) builds, on x86-32, in a program that uses floating point but doesn't print it, the default x87 rounding mode may not be set properly upon initialization. This commit also removes the walk of the types pointed to by pointer arguments in calls. (To assist in opaque pointer types migration -- eventually the pointee type won't be available.) That latter implies that it will no longer consider a call like `scanf("%f", &floatvar)` as sufficient to emit _fltused on its own. And without _fltused, `scanf("%f")` will abort with error R6002. This new behavior is unlikely to bite anyone in practice (you'd have to read a float, and do nothing with it!), and also, is consistent with MSVC. Differential Revision: https://reviews.llvm.org/D56548 llvm-svn: 352076	2019-01-24 18:34:00 +00:00
Sanjay Patel	e5a0bcf7b8	[x86] add low/high undef half shuffle mask helpers; NFC This is the most common usage for isUndefInRange, so make the code slightly less duplicated and more readable. llvm-svn: 352063	2019-01-24 17:05:02 +00:00
Nirav Dave	c5cb2bed58	[X86] Add missing isReg() guards in FixupSetCCs pass. llvm-svn: 352051	2019-01-24 15:04:17 +00:00
Simon Pilgrim	47ca8606ba	[TTI] Add generic SADDO/SSUBO costs Added x86 scalar sadd_with_overflow/ssub_with_overflow costs. llvm-svn: 352045	2019-01-24 13:36:45 +00:00
Simon Pilgrim	2d1964b90f	[TTI] Add generic UADDO/USUBO costs Added x86 scalar uadd_with_overflow/usub_with_overflow costs. Differential Revision: https://reviews.llvm.org/D56907 llvm-svn: 352043	2019-01-24 12:10:20 +00:00
Petar Avramovic	79df859685	[MIPS GlobalISel] Select zero extending and sign extending load Select zero extending and sign extending load for MIPS32. Use size from MachineMemOperand to determine number of bytes to load. Differential Revision: https://reviews.llvm.org/D57099 llvm-svn: 352038	2019-01-24 10:27:21 +00:00
Petar Avramovic	b5a939d246	[MIPS GlobalISel] Combine extending loads Use CombinerHelper to combine extending load instructions. G_LOAD combined with G_ZEXT, G_SEXT or G_ANYEXT gives G_ZEXTLOAD, G_SEXTLOAD or G_LOAD with same type as def of extending instruction respectively. Similarly G_ZEXTLOAD combined with G_ZEXT gives G_ZEXTLOAD and G_SEXTLOAD combined with G_SEXT gives G_SEXTLOAD with same type as def of extending instruction. Differential Revision: https://reviews.llvm.org/D56914 llvm-svn: 352037	2019-01-24 10:09:52 +00:00
Simon Atanasyan	b6d3c50a36	Reapply: [mips] Handle MipsMCExpr sub-expression for the MEK_DTPREL tag This reapplies commit r351987 with a fix for the failed test. Now the test accepts both the DW_OP_GNU_push_tls_address and DW_OP_form_tls_address opcodes. Original commit message: ``` This is a fix for a regression introduced by the rL348194 commit. In that change a new type (MEK_DTPREL) of MipsMCExpr expression was added, but in some places of the code this type of expression is considered unexpected. This change fixes the bug. The MEK_DTPREL type of expression is used for marking TLS DIEExpr only and contains a regular sub-expression. Where we need to handle the expression, we retrieve the sub-expression and handle it in a common way. ``` llvm-svn: 352034	2019-01-24 09:13:14 +00:00
Jonas Paulsson	5916dea338	[SystemZ] Remember to reset the NoPHIs property on MF in createPHIsForSelects() After creating new PHI instructions during isel pseudo expansion, the NoPHIs property of MF should be reset in case it was previously set. Review: Ulrich Weigand llvm-svn: 352030	2019-01-24 07:54:41 +00:00
Craig Topper	e79b779fbb	[X86] Add test cases for opportunities to fold a truncate and a masked store into a truncating masked store. llvm-svn: 352027	2019-01-24 06:15:03 +00:00
Ana Pazos	5c0521ac52	Revert "[RISCV] Set isAsCheapAsAMove for ADDI, ORI, XORI, LUI" This reverts commit ccfb060ecb5d7e18ea729455660484d576bde2cc. Some tests need to be fixed before reapplying this commit. llvm-svn: 352014	2019-01-24 03:00:26 +00:00
Ana Pazos	c54abc520c	[RISCV] Set isAsCheapAsAMove for ADDI, ORI, XORI, LUI Summary: Affected instructions: PseudoLI simplest form (ADDI with X0) ALU operations with immediate (they do not set status flag - ADDI, ORI, XORI) Reviewers: asb Reviewed By: asb Subscribers: shiva0217, rkruppe, kito-cheng, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei Differential Revision: https://reviews.llvm.org/D56526 llvm-svn: 352010	2019-01-24 02:41:40 +00:00
Ana Pazos	29ace0e62c	[RISCV] Set isReMaterializable for ORI, XORI Reviewers: asb Reviewed By: asb Subscribers: asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei Differential Revision: https://reviews.llvm.org/D57069 llvm-svn: 352008	2019-01-24 02:31:23 +00:00
Amara Emerson	addb7ab2ae	Revert "[mips] Handle MipsMCExpr sub-expression for the MEK_DTPREL tag" This reverts commit r351987 as it broke some bots. llvm-svn: 351998	2019-01-24 00:24:59 +00:00
Simon Atanasyan	812f1c55b1	[mips] Handle MipsMCExpr sub-expression for the MEK_DTPREL tag This is a fix for a regression introduced by the rL348194 commit. In that change new type (MEK_DTPREL) of MipsMCExpr expression was added, but in some places of the code this type of expression considered as unexpected. This change fixes the bug. The MEK_DTPREL type of expression is used for marking TLS DIEExpr only and contains a regular sub-expression. Where we need to handle the expression, we retrieve the sub-expression and handle it in a common way. llvm-svn: 351987	2019-01-23 22:02:53 +00:00
Reid Kleckner	f9ebacfd29	Revert r351938 "[ARM] Alter the register allocation order for minsize on Thumb2" This change caused fatal backend errors when compiling a file in libvpx for Android. llvm-svn: 351979	2019-01-23 21:10:48 +00:00
Alexey Bataev	897129dc3f	[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target. Enable full support for the debug info. Differential revision: https://reviews.llvm.org/D46189 llvm-svn: 351974	2019-01-23 18:59:54 +00:00
Alexey Bataev	25624e2e5b	Revert "[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target." This reverts commit r351972. Some pieces of the patch were not applied correctly. llvm-svn: 351973	2019-01-23 18:48:36 +00:00
Alexey Bataev	fe0b356063	[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target. Enable full support for the debug info. Recommit to fix the emission of the not required closing brace. Differential revision: https://reviews.llvm.org/D46189 llvm-svn: 351972	2019-01-23 18:28:59 +00:00
Haojian Wu	15a77418a9	Revert "[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target." This reverts commit r351846. This patch may generate illegal assembly code, see ``` $ ./bin/clang -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-grtev4-linux-gnu -S -disable-free -disable-llvm-verifier -discard-value-names -main-file-name new.cc -mrelocation-model pic -pic-level 2 -mthread-model posix -fmerge-all-constants -mdisable-fp-elim -relaxed-aliasing -no-integrated-as -mpie-copy-relocations -munwind-tables -fcuda-is-device -target-feature +ptx60 -target-cpu sm_35 -dwarf-column-info -debug-info-kind=line-directives-only -dwarf-version=2 -debugger-tuning=gdb -o empty.s -x cuda empty.cc $ cat empty.s // // Generated by LLVM NVPTX Back-End // .version 6.0 .target sm_35 .address_size 64 } ``` llvm-svn: 351966	2019-01-23 16:39:57 +00:00
Andrea Di Biagio	d768d35515	[MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit. This patch adds a new ReadAdvance definition named ReadInt2Fpu. ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by data transfers from the integer unit to the floating point unit. ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all x86 models excluding BtVer2. That means this patch is a functional change for the Jaguar cpu model only. Tablegen definitions for instructions (V)PINSR* have been updated to account for the new ReadInt2Fpu. That read is mapped to the GPR input operand. On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch, that extra delay was added to the opcode latency. In practice, the insert opcode only executes for 1cy. Most of the actual latency is actually contributed by the so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR* latency is defined by expression f+1, where f is defined as a forwarding delay from the integer unit to the fpu. When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC (only when flag -print-schedule is specified), we now need to account for any extra forwarding delays. We do this by checking if scheduling classes declare any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td: "A negative advance effectively increases latency, which may be used for cross-domain stalls". When computing the instruction latency for the purpose of our scheduling tests, we now add any extra delay to the formula. This avoids regressing existing codegen and mca schedule tests. It comes with the cost of an extra (but very simple) hook in MCSchedModel. Differential Revision: https://reviews.llvm.org/D57056 llvm-svn: 351965	2019-01-23 16:35:07 +00:00
Krzysztof Parzyszek	036715408a	[Hexagon] Remove incorrect bit negation llvm-svn: 351956	2019-01-23 15:36:33 +00:00
Benjamin Kramer	4ebed81fc4	[AArch64] Fix out of bounds strlen CFIInst is not zero-terminated. This is one of more annoying functional differences between StringRef and ArrayRef. Found by asan. llvm-svn: 351955	2019-01-23 14:51:21 +00:00
Tim Renouf	f64f8efe13	[AMDGPU] With XNACK, cannot clause a load with result coalesced with operand Summary: With XNACK, an smem load whose result is coalesced with an operand (thus it overwrites its own operand) cannot appear in a clause, because some other instruction might XNACK and restart the whole clause. The clause breaker already realized that an smem that overwrites an operand cannot appear in a clause, and broke the clause. The problem that this commit fixes is that the SIFormMemoryClauses optimization formed a bundle with early clobber, which caused the earlier code that set up the coalesced operand to be removed as dead. Differential Revision: https://reviews.llvm.org/D57008 Change-Id: I703c4d5b0bf7d6060222bec491f45c18bb3c0016 llvm-svn: 351950	2019-01-23 13:38:06 +00:00
David Green	6a858a9425	[ARM] Alter the register allocation order for minsize on Thumb2 Currently in Arm code, we allocate LR first, under the assumption that it needs to be saved anyway. Unfortunately this has the disadvantage that it will require any instructions using it to be the longer thumb2 instructions, not the shorter thumb1 ones. This switches the order when we are optimising for minsize, returning to the default order so that more lower registers can be used. It can end up requiring more pushed registers, but on average produces smaller code. Differential Revision: https://reviews.llvm.org/D56008 llvm-svn: 351938	2019-01-23 10:18:30 +00:00
Sam Parker	31bef63bb4	[ARM][CGP] Check trunc type before replacing In the last stage of type promotion, we replace any zext that uses a new trunc with the operand of the trunc. This was okay when we only allowed one type to be optimised, but now it's the case that the trunc may be needed to produce a narrower type than the one we were optimising for. So we need to check this before doing the replacement. Differential Revision: https://reviews.llvm.org/D57041 llvm-svn: 351935	2019-01-23 09:18:44 +00:00
Kristof Beyls	3ff5dfd735	[SLH] AArch64: correctly pick temporary register to mask SP As part of speculation hardening, the stack pointer gets masked with the taint register (X16) before a function call or before a function return. Since there are no instructions that can directly mask writing to the stack pointer, the stack pointer must first be transferred to another register, where it can be masked, before that value is transferred back to the stack pointer. Before, that temporary register was always picked to be x17, since the ABI allows clobbering x17 on any function call, resulting in the following instruction pattern being inserted before function calls and returns/tail calls: mov x17, sp and x17, x17, x16 mov sp, x17 However, x17 can be live in those locations, for example when the call is an indirect call, using x17 as the target address (blr x17). To fix this, this patch looks for an available register just before the call or terminator instruction and uses that. In the rare case when no register turns out to be available (this situation is only encountered twice across the whole test-suite), just insert a full speculation barrier at the start of the basic block where this occurs. Differential Revision: https://reviews.llvm.org/D56717 llvm-svn: 351930	2019-01-23 08:18:39 +00:00
Jonas Paulsson	961c47ec98	[SystemZ] Handle DBG_VALUE instructions in two places in backend. Two backend optimizations failed to handle cases when compiled with -g, due to failing to consider DBG_VALUE instructions. This was in SystemZTargetLowering::emitSelect() and SystemZElimCompare::getRegReferences(). This patch makes sure that DBG_VALUEs are recognized so that they do not affect these optimizations. Tests for branch-on-count, load-and-trap and consecutive selects. Review: Ulrich Weigand https://reviews.llvm.org/D57048 llvm-svn: 351928	2019-01-23 07:42:26 +00:00
Peter Collingbourne	73078ecd38	hwasan: Move memory access checks into small outlined functions on aarch64. Each hwasan check requires emitting a small piece of code like this: https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses The problem with this is that these code blocks typically bloat code size significantly. An obvious solution is to outline these blocks of code. In fact, this has already been implemented under the -hwasan-instrument-with-calls flag. However, as currently implemented this has a number of problems: - The functions use the same calling convention as regular C functions. This means that the backend must spill all temporary registers as required by the platform's C calling convention, even though the check only needs two registers on the hot path. - The functions take the address to be checked in a fixed register, which increases register pressure. Both of these factors can diminish the code size effect and increase the performance hit of -hwasan-instrument-with-calls. The solution that this patch implements is to involve the aarch64 backend in outlining the checks. An intrinsic and pseudo-instruction are created to represent a hwasan check. The pseudo-instruction is register allocated like any other instruction, and we allow the register allocator to select almost any register for the address to check. A particular combination of (register selection, type of check) triggers the creation in the backend of a function to handle the check for specifically that pair. The resulting functions are deduplicated by the linker. The pseudo-instruction (really the function) is specified to preserve all registers except for the registers that the AAPCS specifies may be clobbered by a call. To measure the code size and performance effect of this change, I took a number of measurements using Chromium for Android on aarch64, comparing a browser with inlined checks (the baseline) against a browser with outlined checks. 
Code size: Size of .text decreases from 243897420 to 171619972 bytes, or a 30% decrease. Performance: Using Chromium's blink_perf.layout microbenchmarks I measured a median performance regression of 6.24%. The fact that a perf/size tradeoff is evident here suggests that we might want to make the new behaviour conditional on -Os/-Oz. But for now I've enabled it unconditionally, my reasoning being that hwasan users typically expect a relatively large perf hit, and ~6% isn't really adding much. We may want to revisit this decision in the future, though. I also tried experimenting with varying the number of registers selectable by the hwasan check pseudo-instruction (which would result in fewer variants being created), on the hypothesis that creating fewer variants of the function would expose another perf/size tradeoff by reducing icache pressure from the check functions at the cost of register pressure. Although I did observe a code size increase with fewer registers, I did not observe a strong correlation between the number of registers and the performance of the resulting browser on the microbenchmarks, so I conclude that we might as well use ~all registers to get the maximum code size improvement. My results are below: Regs | .text size | Perf hit -----+------------+--------- ~all | 171619972 | 6.24% 16 | 171765192 | 7.03% 8 | 172917788 | 5.82% 4 | 177054016 | 6.89% Differential Revision: https://reviews.llvm.org/D56954 llvm-svn: 351920	2019-01-23 02:20:10 +00:00
Ana Pazos	5f51e09c7b	Fixed isReMaterializable setting for LUI instruction. llvm-svn: 351895	2019-01-22 22:59:47 +00:00
Matt Arsenault	4c5e8f51e7	AMDGPU/GlobalISel: Start selectively legalizing 16-bit operations It might be a bit nicer to use the fancy .legalIf and co. predicates, but this was requiring more boilerplate and disables the coverage assertions. llvm-svn: 351886	2019-01-22 22:00:19 +00:00
Matt Arsenault	736cfa9ffb	AMDGPU/GlobalISel: Handle legality/regbanks for 32/64-bit shifts llvm-svn: 351884	2019-01-22 21:51:38 +00:00
Matt Arsenault	30989e492b	GlobalISel: Allow shift amount to be a different type For AMDGPU the shift amount is never 64-bit, and this needs to use a 32-bit shift. X86 uses i8, but seemed to be hacking around this before. llvm-svn: 351882	2019-01-22 21:42:11 +00:00
Matt Arsenault	6378629609	GlobalISel: Implement widen for extract_vector_elt elt type llvm-svn: 351871	2019-01-22 20:38:15 +00:00
Matt Arsenault	aebb2ee036	GlobalISel: Implement fewerElementsVector for basic FP ops llvm-svn: 351866	2019-01-22 20:14:29 +00:00
Matt Arsenault	41a8bee93b	AMDGPU/GlobalISel: Remove vectors from legal constant types llvm-svn: 351859	2019-01-22 19:04:51 +00:00
Matt Arsenault	6614f852b6	GlobalISel: Support narrowing zextload/sextload llvm-svn: 351856	2019-01-22 19:02:10 +00:00
Matt Arsenault	a5840c3c39	Codegen support for atomicrmw fadd/fsub llvm-svn: 351851	2019-01-22 18:36:06 +00:00
Matt Arsenault	39508331ef	Reapply "IR: Add fp operations to atomicrmw" This reapplies commits r351778 and r351782 with RISCV test fixes. llvm-svn: 351850	2019-01-22 18:18:02 +00:00
Alexey Bataev	4e9db1beff	[DEBUGINFO, NVPTX] Enable support for the debug info on NVPTX target. Summary: Enable full support for the debug info. Reviewers: echristo Subscribers: jholewinski, aprantl, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D46189 llvm-svn: 351846	2019-01-22 17:43:37 +00:00
Alexey Bataev	9d5974a9fc	[DEBUG_INFO, NVPTX] Fix relocation info. Summary: Initial function labels must follow the debug location for the correct relocation info generation. Reviewers: tra, jlebar, echristo Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D45784 llvm-svn: 351843	2019-01-22 17:24:16 +00:00
Alex Bradbury	0a9c9a8daa	[RISCV][NFC] Change naming scheme for RISC-V specific DAG nodes Previously we had names like 'Call' or 'Tail'. This potentially clashes with the naming scheme used elsewhere in RISCVInstrInfo.td. Many other backends would use names like AArch64call or PPCtail. I prefer the SystemZ approach, which uses prefixed all-lowercase names. This matches the naming scheme used for target-independent SelectionDAG nodes. llvm-svn: 351823	2019-01-22 14:05:11 +00:00
Simon Pilgrim	933673d878	[X86][SSE] Canonicalize OR(AND(X,C),AND(Y,~C)) -> OR(AND(X,C),ANDNP(C,Y)) For constant bit select patterns, replace one AND with a ANDNP, allowing us to reuse the constant mask. Only do this if the mask has multiple uses (to avoid losing load folding) or if we have XOP as its VPCMOV can handle most folding commutations. This also requires computeKnownBitsForTargetNode support for X86ISD::ANDNP and X86ISD::FOR to prevent regressions in fabs/fcopysign patterns. Differential Revision: https://reviews.llvm.org/D55935 llvm-svn: 351819	2019-01-22 13:44:49 +00:00
Simon Pilgrim	aa6a4339ac	[X86][BtVer2] SSE2 vector shifts has local forwarding disabled Similar to horizontal ops on D56777, the sse2 (but not mmx) bit shift ops has local forwarding disabled, adding +1cy to the use latency for the result. Differential Revision: https://reviews.llvm.org/D57026 llvm-svn: 351817	2019-01-22 13:27:18 +00:00
Simon Pilgrim	9e2c2cfcd9	Fix "comparison of unsigned expression >= 0 is always true" warning. NFCI. llvm-svn: 351816	2019-01-22 13:18:26 +00:00
Simon Pilgrim	2c69f90171	[X86][BtVer2] X86ISD::VPERMILPV has local forwarding disabled Similar to horizontal ops on D56777, the vpermilpd/vpermilps variable mask ops has local forwarding disabled, adding +1cy to the use latency for the result. Differential Revision: https://reviews.llvm.org/D57022 llvm-svn: 351815	2019-01-22 13:13:57 +00:00
Simon Pilgrim	ee900efb30	[CostModel][X86] Add ICMP Predicate specific costs First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly. Differential Revision: https://reviews.llvm.org/D57013 llvm-svn: 351810	2019-01-22 12:29:38 +00:00
Simon Pilgrim	180fcff5a7	[X86][SSE] Add selective commutation support for insertps (PR40340) When we are inserting 1 "inline" element, and zeroing 2 of the other elements then we can safely commute the insertps source inputs to improve memory folding. Differential Revision: https://reviews.llvm.org/D56843 llvm-svn: 351807	2019-01-22 12:17:48 +00:00
Alex Bradbury	cd26560e46	[RISCV] Quick fix for PR40333 Avoid the infinite loop caused by the target DAG combine converting ANYEXT to SIGNEXT and the target-independent DAG combine logic converting back to ANYEXT. Do this by not adding the new node to the worklist. Committing directly as this definitely doesn't make the problem any worse, and I intend to follow-up with a patch that avoids this custom combiner logic altogether and just lowers the i32 operations to a target-specific SelectionDAG node. This should be easier to reason about and improve codegen quality in some cases (though may miss out on some later DAG combines). llvm-svn: 351806	2019-01-22 12:11:53 +00:00
Simon Pilgrim	3adf50b2c0	[X86] HADDPS/HADDPD scalar lowering was added at rL350421 llvm-svn: 351797	2019-01-22 10:49:41 +00:00
Chandler Carruth	285fe716c5	Revert r351778: IR: Add fp operations to atomicrmw This broke the RISCV build, and even with that fixed, one of the RISCV tests behaves surprisingly differently with asserts than without, leaving no clear test pattern to use. Generally it seems bad for the IR to differ substantially due to asserts (as in, an alloca is used with asserts that isn't needed without!) and nothing I did simply would fix it so I'm reverting back to green. This also required reverting the RISCV build fix in r351782. llvm-svn: 351796	2019-01-22 10:29:58 +00:00
Alex Bradbury	1b9cd446f7	[RISCV][NFC] Add break to case statement in RISCVDAGToDAGISel::Select The break isn't strictly needed yet as there is no subsequent entry in the case. But adding to prevent mistakes further down the road. llvm-svn: 351785	2019-01-22 07:22:00 +00:00
Alex Bradbury	b96b755c4d	[RISCV] Fix build after r351778 Also add a comment to explain the expansion strategy for atomicrmw {fadd,fsub}. llvm-svn: 351782	2019-01-22 05:06:57 +00:00
Matt Arsenault	bfdba5e4fc	IR: Add fp operations to atomicrmw Add just fadd/fsub for now. llvm-svn: 351778	2019-01-22 03:32:36 +00:00
Eli Friedman	1eaa04d682	[ARM] Combine ands+lsls to lsls+lsrs for Thumb1. This patch may seem familiar... but my previous patch handled the equivalent lsls+and, not this case. Usually instcombine puts the "and" after the shift, so this case doesn't come up. However, if the shift comes out of a GEP, it won't get canonicalized by instcombine, and DAGCombine doesn't have an equivalent transform. This also modifies isDesirableToCommuteWithShift to suppress DAGCombine transforms which would make the overall code worse. I'm not really happy adding a bunch of code to handle this, but it would probably be tricky to substantially improve the behavior of DAGCombine here. Differential Revision: https://reviews.llvm.org/D56032 llvm-svn: 351776	2019-01-22 01:51:37 +00:00
Eli Friedman	23e60c7893	[AArch64] Add patterns for zext/sext of shift amount. Not sure this is the best fix, but it saves an instruction for certain constructs involving variable shifts. Differential Revision: https://reviews.llvm.org/D55572 llvm-svn: 351768	2019-01-22 00:21:35 +00:00
Matt Arsenault	fb67164ebc	AMDGPU/GlobalISel: Legalize more fp<->int conversions llvm-svn: 351767	2019-01-22 00:20:17 +00:00
Craig Topper	bcbdf61078	[X86] Use X86ISD::VFPROUND instead of ISD::FP_ROUND for 256 and 512 bit cvtpd2ps intrinsics. Summary: Use X86ISD::VFPROUND in the instruction isel patterns. Add new patterns for ISD::FP_ROUND to maintain support for fptrunc in IR. In the process I found a couple duplicate isel patterns which I also deleted in this patch. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56991 llvm-svn: 351762	2019-01-21 20:14:09 +00:00
Craig Topper	c2087d8f3f	[X86] Change avx512 COMPRESS and EXPAND lowering to use a single masked node instead of expand/compress+select. Summary: For compress, a select node doesn't semantically reflect the behavior of the instruction. The mask would have holes in it, but the resulting write is to contiguous elements at the bottom of the vector. Furthermore, as far as the compressing and expanding is concerned the behavior is dependent on the mask. You can't just have an expand/compress node that only reads the input vector. That node would have no meaning by itself. This all only works because we pattern match the compress/expand+select back to the instruction. But conceivably an optimization of the select could break the pattern and leave something meaningless. This patch modifies the expand and compress node to take the mask and passthru as additional inputs and gets rid of the select altogether. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57002 llvm-svn: 351761	2019-01-21 20:02:28 +00:00
Stanislav Mekhanoshin	f92ed6966e	[AMDGPU] Fixed hazard recognizer to walk predecessors Fixes two problems with GCNHazardRecognizer: 1. It only scans up to 5 instructions emitted earlier. 2. It does not take control flow into account. An earlier instruction from the previous basic block is not necessarily a predecessor. At the same time a real predecessor block is not scanned. The patch provides a way to distinguish between scheduler and hazard recognizer mode. It is OK to work with emitted instructions in the scheduler because we do not really know what will be emitted later and its order. However, when pass works as a hazard recognizer the schedule is already finalized, and we have full access to the instructions for the whole function, so we can properly traverse predecessors and their instructions. Differential Revision: https://reviews.llvm.org/D56923 llvm-svn: 351759	2019-01-21 19:11:26 +00:00
Simon Pilgrim	9b73ae96c5	[X86][BtVer2] Update latency of mmx horizontal operations D56777 added +1cy local forwarding penalty for horizontal operations, but this penalty only affects sse2/xmm variants, the mmx variants don't suffer the penalty. Confirmed with @andreadb llvm-svn: 351755	2019-01-21 18:04:25 +00:00
Andrea Di Biagio	b68dd05c14	[X86][BtVer2] Update the WriteLoad latency. r327630 introduced new write definitions for float/vector loads. Before that revision, WriteLoad was used by both integer/float (scalar/vector) load. So, WriteLoad had to conservatively declare a latency of 5cy. That is because the load-to-use latency for float/vector load is 5cy. Now that we have dedicated writes for float/vector loads, there is no reason why we should keep the latency of WriteLoad at 5cy. At the moment, WriteLoad is used by scalar integer loads only; we can assume an optimistic 3cy latency for them. This patch changes that latency from 5cy to 3cy, and regenerates the affected scheduling/mca tests. Differential Revision: https://reviews.llvm.org/D56922 llvm-svn: 351742	2019-01-21 12:04:10 +00:00
Craig Topper	f608dc1f57	[X86] Remove and autoupgrade vpmovqd/vpmovwb intrinsics using trunc+select. llvm-svn: 351729	2019-01-21 08:16:59 +00:00
Kito Cheng	5e8798f987	[RISCV] Add R_RISCV_RELAX relocation to all possible relax candidates. Summary: Add R_RISCV_RELAX relocation to all possible relax candidates and update corresponding testcase. Reviewers: asb, apazos Differential Revision: https://reviews.llvm.org/D46677 llvm-svn: 351723	2019-01-21 05:27:09 +00:00
Dylan McKay	5c23410fdf	[AVR] Insert unconditional branch when inserting MBBs between blocks with fallthrough This updates the AVR Select8/Select16 expansion code so that, when inserting the two basic blocks for true and false conditions, any existing fallthrough on the previous block is preserved. Prior to this patch, if the block before the Select pseudo fell through to the subsequent block, two new basic blocks would be inserted at the prior fallthrough point, changing the fallthrough destination. The predecessor or successor lists were not updated, causing the BranchFolding pass at -O1 and above to rearrange basic blocks, causing an infinite loop. Not to mention the unconditional fallthrough to the true block is incorrect in and of itself. This patch modifies the Select8/16 expansion so that, if inserting true and false basic blocks at a fallthrough point, the implicit branch is preserved by means of an explicit, unconditional branch to the previous fallthrough destination. Thanks to Carl Peto for reporting this bug. This fixes avr-rust bug https://github.com/avr-rust/rust/issues/123. llvm-svn: 351721	2019-01-21 04:32:02 +00:00
Dylan McKay	f15cc113a5	[AVR] Enable emission of debug information Prior to this, the code was missing AVR-specific relocation logic in RelocVisitor.h. This patch teaches RelocVisitor about R_AVR_16 and R_AVR_32. Debug information is emitted in the final object file, and understood by 'avr-readelf --debug-dump' from AVR-GCC. llvm-dwarfdump is yet to understand how to dump AVR DWARF symbols. llvm-svn: 351720	2019-01-21 04:27:08 +00:00
Dylan McKay	ce0ab06353	Revert "[AVR] Insert unconditional branch when inserting MBBs between blocks with fallthrough" This reverts commit r351718. Carl pointed out that the unit test could be improved. This patch will be recommitted once the test is made more resilient. llvm-svn: 351719	2019-01-21 02:46:13 +00:00

50751 Commits