llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	ede97c9548	[X86] Remove TB_ALIGN_16 from VEXTRACTF128/VEXTRACTI128 in the memory folding table. llvm-svn: 334511	2018-06-12 15:48:03 +00:00
Krzysztof Parzyszek	bea23d065e	[Hexagon] Make floating point operations expensive for vectorization llvm-svn: 334508	2018-06-12 15:12:50 +00:00
Sanjay Patel	c3466d2568	[x86] move shrunkblend transform to helper function; NFCI We should be able to obsolete D48043 by easing the constraints on this existing code. llvm-svn: 334504	2018-06-12 14:21:51 +00:00
Krzysztof Parzyszek	3d671248ab	[SelectionDAG] Provide default expansion for rotates Implement default legalization of rotates: either in terms of the rotation in the opposite direction (if legal), or in terms of shifts and ors. Implement generating of rotate instructions for Hexagon. Hexagon only supports rotates by an immediate value, so implement custom lowering of ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion. Differential Revision: https://reviews.llvm.org/D47725 llvm-svn: 334497	2018-06-12 12:49:36 +00:00
Simon Dardis	74fb5e6789	[mips] Guard some floating point instructions correctly Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D47636 llvm-svn: 334491	2018-06-12 10:28:06 +00:00
Aleksandar Beserminji	8acdc10220	[mips] Extend LONG_BRANCH_LUi/ADDiu with extra parameter Extend LONG_BRANCH_LUi and LONG_BRANCH_ADDiu pseudo instructions with additional flag, so instead of always lowering to lui %hi(...), addiu %lo(...) or addiu %hi(...), now they can lower to either %lo, %hi, %higher or %highest depending on the added flag. Differential Revision: https://reviews.llvm.org/D47941 llvm-svn: 334490	2018-06-12 10:23:49 +00:00
Luke Geeson	dc82aa44e6	[AArch64] Audit on rL333879 to fix FP16 64bit bitpatterns llvm-svn: 334488	2018-06-12 09:35:20 +00:00
Craig Topper	88c230265b	[X86] Add NotMemoryFoldable to the VPCOMPRESS instructions. llvm-svn: 334481	2018-06-12 07:32:19 +00:00
Craig Topper	5799e4df75	[X86] Add NotMemoryFoldable to more instructions. These include PUSH/POP instructions that don't match the manual table. This also includes CMPXCHG which we never emit in non-locked form. llvm-svn: 334479	2018-06-12 07:32:17 +00:00
Craig Topper	66572df76e	[X86] Add NotMemoryFoldable to a bunch of instructions to suppress them from the autogenerated load folding table. Most of these are system instructions or other instructions we don't use in CodeGen. No point wasting space for them in the table. Removing them from the autogenerated table makes it easier to review the manual table. A few are real opcode collisions where the memory and register forms are completely different instructions. llvm-svn: 334474	2018-06-12 04:34:59 +00:00
Craig Topper	957b738432	[X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc. We were missing packed isel folding patterns for all of sse41, avx, and avx512. For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns. Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch. Some of this was spotted in the review for D47993. This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns. llvm-svn: 334460	2018-06-12 00:48:57 +00:00
Mark Searles	987f292c56	[AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"' The use iterator, used within findMaskOperands(), can return anything which is not a def. isUse() requires a register, so check isReg() before calling isUse(). Differential Revision: https://reviews.llvm.org/D48047 llvm-svn: 334459	2018-06-12 00:41:26 +00:00
George Burgess IV	c72204d5b5	Simplify; NFC Not shown in the diff: AQ is a `vector<SUnit >`, and SU is a `SUnit ` llvm-svn: 334451	2018-06-11 22:58:32 +00:00
Konstantin Zhuravlyov	3e5d66ac66	AMDGPU: Add 64-bit relative variant kind Differential Revision: https://reviews.llvm.org/D47601 llvm-svn: 334443	2018-06-11 21:37:57 +00:00
Craig Topper	3efdb7ce19	[X86] Push some variable declarations down into the individual switch cases that need them. NFC All of the cases are already wrapped in curly braces so declaring a variable there isn't an issue. And the variables aren't assigned or used in the larger scope. llvm-svn: 334436	2018-06-11 20:50:58 +00:00
Craig Topper	ceed99baf0	[X86] Reorder some type constraints to force things to be vectors and integer/fp before forcing them to be the same size. This may be needed by another patch that I'm working on. It should have no effect on any of the generated outputs. llvm-svn: 334430	2018-06-11 19:20:15 +00:00
Krzysztof Parzyszek	dd9415d550	[Hexagon] Late predicate producers cannot be used as dot-new sources llvm-svn: 334426	2018-06-11 18:45:52 +00:00
Simon Pilgrim	14ee66ef37	[X86][AVX512] Tag AVX5124FMAPS/AVX5124VNNIW with missing scheduler classes Necessary for D46276 as even though btver2 doesn't use these instructions, its now flagged as complete so complains if ANY instruction isn't tagged..... UnsupportedFeatures wouldn't help here as these instructions don't appear to have a feature predicate (like a lot of AVX512). llvm-svn: 334423	2018-06-11 17:28:00 +00:00
Stanislav Mekhanoshin	7ba3fc730c	[AMDGPU] Do not consider indirect acces through phi for wave limiter Rational: if there is indirect access that is usually an issue because load is not ready by the use. However, if use is inside a loop and load is outside that is potentially an issue for a first iteration only. Differential Revision: https://reviews.llvm.org/D47740 llvm-svn: 334420	2018-06-11 16:50:49 +00:00
Aleksandar Beserminji	62cf9d21ab	[mips] Fix spill slot for mips3, n64 abi When program is compiled for mips3 with n64 abi, wrong register class is used for creating an emergency spill slot. This patch fixes the correct register class to be chosen. This patch resolves PR35859. Thanks to John Baldwin for reporting the issue! Differential Revision: https://reviews.llvm.org/D47938 llvm-svn: 334419	2018-06-11 16:50:28 +00:00
Dylan McKay	d011869c82	[AVR] Set trackLivenessAfterRegAlloc This sets trackLivenessAfterRegAlloc on AVRRegisterInfo. Most existing targets set this flag. Without it, specific IR inputs cause LLVM to fail with: Assertion failed: (getParent()->getProperties().hasProperty( MachineFunctionProperties::Property::TracksLiveness) && "Liveness information is accurate"), function livein_begin file MachineBasicBlock.cpp, line 1354. With this commit, this no longer happens. Patch by Peter Nimmervoll. llvm-svn: 334409	2018-06-11 14:46:48 +00:00
Clement Courbet	7db69cc08a	[X86] Fix skylake server scheduling info. Summary: This fixes most of the scheduling info for SKX vector operations. I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM. The before/after llvm-exegesis analysis are in the phabricator diff. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47721 llvm-svn: 334407	2018-06-11 14:37:53 +00:00
Clement Courbet	f4f6899cdf	[ExynosM1][Sched] Fix resource usage in scheduling model. This is part of https://reviews.llvm.org/D46356. llvm-svn: 334391	2018-06-11 07:33:08 +00:00
Clement Courbet	c48435bfe5	[X86] Explicitly mark unsupported classes in scheduling models. Summary: In preparation for D47721. HSW and SNB still define unsupported classes as they are used by KNL and generic models respectively. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47763 llvm-svn: 334389	2018-06-11 07:00:08 +00:00
Craig Topper	0e25c8239a	[X86] Remove masking from dbpsadbw intrinsics, use select in IR instead. llvm-svn: 334384	2018-06-11 06:18:22 +00:00
Daniel Cederman	33f67a256b	[Sparc] Add support for 13-bit PIC Summary: When compiling with -fpic, in contrast to -fPIC, use only the immediate field to index into the GOT. This saves space if the GOT is known to be small. The linker will warn if the GOT is too large for this method. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: brad, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D47136 llvm-svn: 334383	2018-06-11 05:50:08 +00:00
Craig Topper	e71ad1f6d0	[X86] Remove and autoupgrade the expandload and compressstore intrinsics. We use the target independent intrinsics now. llvm-svn: 334381	2018-06-11 01:25:22 +00:00
Craig Topper	860562c915	[X86] Miscellaneous fixes to get the load folding table generator to work again. llvm-svn: 334377	2018-06-10 21:48:24 +00:00
Ivan A. Kosarev	847daa11f8	[NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47447 llvm-svn: 334361	2018-06-10 09:27:27 +00:00
Craig Topper	98a79934af	[X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead. llvm-svn: 334358	2018-06-10 06:01:36 +00:00
Gabor Buella	5aa26980c4	[X86] NFC Use member initialization in X86Subtarget The separate initializeEnvironment function was sort of useless since r217071. ARM did this move already with r273556. llvm-svn: 334345	2018-06-09 09:19:40 +00:00
Eli Friedman	864df22307	[ARM] Allow CMPZ transforms even if the input has multiple uses. It looks like this got left in by accident in r289794; I can't think of any reason this check would be necessary. (Maybe it was meant to be a check that the AND has one use? But we check that a few lines earlier.) Differential Revision: https://reviews.llvm.org/D47921 llvm-svn: 334322	2018-06-08 21:16:56 +00:00
Simon Pilgrim	5c32989c91	[X86][SSE] Support v8i16/v16i16 rotations Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together. Differential Revision: https://reviews.llvm.org/D47822 llvm-svn: 334309	2018-06-08 17:58:42 +00:00
Simon Pilgrim	89deac6694	[X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that should match the dependency-breaking 'zero-idiom' As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits). llvm-svn: 334303	2018-06-08 17:00:45 +00:00
Daniil Fukalov	c9a098b314	[AMDGPU] Inline asm - added i16, half and i128 types support AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable Differential Revision: https://reviews.llvm.org/D44920 llvm-svn: 334301	2018-06-08 16:29:04 +00:00
Simon Pilgrim	a6afa310c9	[X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplication Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code. This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting). I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking. llvm-svn: 334289	2018-06-08 13:59:11 +00:00
Simon Dardis	1d6254f7e9	[mips] Correct the predicates for a number of codegen only instructions Reviewers: smaksimovic, atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D47638 llvm-svn: 334280	2018-06-08 10:55:34 +00:00
Alex Bradbury	ed53ca73ec	[RISCV] Implement MC layer support for the fence.tso instruction The instruction makes use of a previously ignored field in the fence instruction. It is introduced in the version 2.3 draft of the RISC-V specification after much work by the Memory Model Task Group. As clarified here <https://github.com/riscv/riscv-isa-manual/issues/186>, the fence.tso assembler mnemonic does not have operands. llvm-svn: 334278	2018-06-08 10:39:05 +00:00
Simon Pilgrim	ad45efc445	[X86][SSE] Consistently prefer lowering to PACKUS over PACKSS We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS. PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines. The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets. llvm-svn: 334276	2018-06-08 10:29:00 +00:00
Matt Arsenault	6fc3759811	AMDGPU: Error on LDS global address in functions These won't work as expected now, so error on them to avoid wasting time debugging this in the future. llvm-svn: 334269	2018-06-08 08:05:54 +00:00
Hiroshi Inoue	863fb7a4b8	[NFC] fix formatting llvm-svn: 334263	2018-06-08 04:00:54 +00:00
Craig Topper	2ed4328281	[X86] Improve some shuffle decoding code to remove a conditional from a loop and reduce the number of temporary variables. NFCI The NumControlBits variable was definitely sketchy. I think that only worked because the expected value was 1 or 2 and the number of lanes was 2 or 4. Had their been 8 lanes the number of bits should have been 3 not 4 as the previous code would have given. llvm-svn: 334258	2018-06-08 01:09:31 +00:00
Tony Tye	6db1f5da4f	[AMDGPU] Simplify memory legalizer (add missing virtual descructor) Differential Revision: https://reviews.llvm.org/D47504 llvm-svn: 334257	2018-06-08 01:00:11 +00:00
Tony Tye	a5a7c331e7	[AMDGPU] Simplify memory legalizer - Make code easier to maintain. - Avoid generating waitcnts for VMEM if the address sppace does not involve VMEM. - Add support to generate waitcnts for LDS and GDS memory. Differential Revision: https://reviews.llvm.org/D47504 llvm-svn: 334241	2018-06-07 22:28:32 +00:00
Simon Pilgrim	51ceef775b	[X86][SSE] Updated comment - combineVectorSignBitsTruncation handles PACKSS and PACKUS. NFCI. llvm-svn: 334204	2018-06-07 16:08:40 +00:00
Alex Bradbury	6a4b5441e4	[RISCV] AsmParser support for the li pseudo instruction The implementation follows the MIPS backend and expands the pseudo instruction directly during asm parsing. As the result, only real MC instructions are emitted to the MCStreamer. The actual expansion to real instructions is similar to the expansion performed by the GNU Assembler. This patch supersedes D41949. Differential Revision: https://reviews.llvm.org/D46118 Patch by Mario Werner. llvm-svn: 334203	2018-06-07 15:35:47 +00:00
Alex Bradbury	6cfb31c7c1	[AVR] Fix build after r334078 r334078 added MCSubtargetInfo to fixupNeedsRelaxation and applyFixup. This patch makes the necessary adjustment for the AVR target. llvm-svn: 334202	2018-06-07 15:29:09 +00:00
Simon Pilgrim	51ff15f472	[X86][SSE] Simplify combineVectorTruncationWithPACKUS. NFCI. Move code only used by combineVectorTruncationWithPACKUS out of combineVectorTruncation. llvm-svn: 334201	2018-06-07 14:53:32 +00:00
Hiroshi Inoue	01ef4c2c64	[PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector BitPermutationSelector sets Repl32 flag for bit groups which can be (potentially) benefit from 32-bit rotate-and-mask instructions with bit replication, i.e. rlwinm/rlwimi copies lower 32 bits into upper 32 bits on 64-bit PowerPC before rotation. However, enforcing 32-bit instruction sometimes results in redundant generated code. For example, the following simple code is compiled into rotldi + rlwimi while it can be compiled into only rldimi instruction if Repl32 flag is not set on the bit group for (a & 0xFFFFFFFF). uint64_t func(uint64_t a, uint64_t b) { return (a & 0xFFFFFFFF) \| (b << 32) ; } To avoid such problem, this patch checks the potential benefit of Repl32 flag before setting it. If a bit group does not require rotation (i.e. RLAmt == 0) and won't be merged into another group, we do not benefit from Repl32 flag on this group. Differential Revision: https://reviews.llvm.org/D47867 llvm-svn: 334195	2018-06-07 13:21:14 +00:00
Petar Jovanovic	241f286bd7	[Mips] Silencing warnings in instruction info (NFC) isORCopyInst and isReadOrWriteToDSPReg functions were producing warning that some statements my fall through. Patch by Nikola Prica. Differential Revision: https://reviews.llvm.org/D47876 llvm-svn: 334194	2018-06-07 13:06:06 +00:00

1 2 3 4 5 ...

47879 Commits