llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	4fe321d1ce	[X86] Add SHUF128 to target shuffle decoding. Differential Revision: https://reviews.llvm.org/D48954 llvm-svn: 336376	2018-07-05 17:10:17 +00:00
Simon Pilgrim	8c3765dc6b	[CostModel][X86] Add UDIV/UREM by pow2 costs Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc. llvm-svn: 336371	2018-07-05 16:56:28 +00:00
Craig Topper	350c5f1881	[X86] Remove X86 specific scalar FMA intrinsics and upgrade to target independent FMA and extractelement/insertelement. llvm-svn: 336315	2018-07-05 06:52:55 +00:00
Craig Topper	2db909cfae	[X86] Remove some isel patterns for X86ISD::SELECTS that specifically looked for the v1i1 mask to have come from a scalar_to_vector from GR8. We have patterns for SELECTS that top at v1i1 and we have a pattern for (v1i1 (scalar_to_vector GR8)). The patterns being removed here do the same thing as the two other patterns combined so there is no need for them. llvm-svn: 336305	2018-07-05 03:01:29 +00:00
Craig Topper	95eb88abfe	[X86] Add support for combining FMSUB/FNMADD/FNMSUB ISD nodes with an fneg input. Previously we could only negate the FMADD opcodes. This used to be mostly ok when we lowered FMA intrinsics during lowering. But with the move to llvm.fma from target specific intrinsics, we can combine (fneg (fma)) to (fmsub) earlier. So if we start with (fneg (fma (fneg))) we would get stuck at (fmsub (fneg)). This patch fixes that so we can also combine things like (fmsub (fneg)). llvm-svn: 336304	2018-07-05 02:52:56 +00:00
Craig Topper	e4b9257b69	[X86] Remove some of the packed FMA3 intrinsics since we no longer use them in clang. There's a regression in here due to inability to combine fneg inputs of X86ISD::FMSUB/FNMSUB/FNMADD nodes. More removals to come, but I wanted to stop and fix the regression that showed up in this first. llvm-svn: 336303	2018-07-05 02:52:54 +00:00
Yvan Roux	eaececf5e0	[MachineOutliner] Fix typo in getOutliningCandidateInfo function name getOutlininingCandidateInfo -> getOutliningCandidateInfo Differential Revision: https://reviews.llvm.org/D48867 llvm-svn: 336285	2018-07-04 15:37:08 +00:00
Simon Pilgrim	c3e1617bf9	[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values (REAPPLIED) We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case. Reapplied with a fixed (extra null tests) version of rL336113 after reversion in rL336189 - extra test case added at rL336247. llvm-svn: 336250	2018-07-04 09:12:48 +00:00
Fangrui Song	78ab286aa0	[X86][AsmParser] Fix inconsistent declaration parameter name in r336218 llvm-svn: 336232	2018-07-03 21:40:03 +00:00
Craig Topper	e317533dcf	[X86] Remove repeated 'the' from multiple comments that have been copy and pasted. NFC llvm-svn: 336226	2018-07-03 20:39:55 +00:00
Craig Topper	adc51ae425	[X86][AsmParser] Rework the in/out (%dx) hack one more time. This patch adds a new token type specifically for (%dx). We will now always create this token when we parse (%dx). After all operands have been parsed, if the mnemonic is in/out we'll morph this token to a regular register token. Otherwise we keep it as the special DX token which won't match any instructions. This removes the need for passing Mnemonic through the parsing functions. It also seems closer to gas where when it's used on the wrong instruction it just gets diagnosed as an invalid operand rather than a bad memory address. llvm-svn: 336218	2018-07-03 18:07:30 +00:00
Craig Topper	bc598f0d61	[X86][AsmParser] Don't consider %eip as a valid register outside of 32-bit mode. This might make the error message added in r335668 unneeded, but I'm not sure yet. The check for RIP is technically unnecessary since RIP is in GR64, but that fact is kind of surprising so be explicit. llvm-svn: 336217	2018-07-03 17:40:51 +00:00
Benjamin Kramer	fd171f2f89	Revert "[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values" This reverts commit r336113. It causes crashes. llvm-svn: 336189	2018-07-03 11:15:17 +00:00
Krzysztof Parzyszek	fd97494984	[X86] Add phony registers for high halves of regs with low halves Add registers still missing after r328016 (D43353): - for bits 15-8 of SI, DI, BP, SP (H), and R8-R15 (BH), - for bits 31-16 of R8-R15 (*WH). Thanks to Craig Topper for pointing it out. llvm-svn: 336134	2018-07-02 19:05:09 +00:00
Craig Topper	56440b9745	[X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned. Similarly, don't fold fp128 loads into SSE instructions if the load isn't aligned. Unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions. Should fix PR38001 llvm-svn: 336121	2018-07-02 17:01:54 +00:00
Simon Pilgrim	2bc8e079f2	[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case. llvm-svn: 336113	2018-07-02 15:14:07 +00:00
Alex Bradbury	c48908781d	[X86] Use addAliasForDirective to support the .word directive (reland) The X86 asm parser currently has custom parsing logic for .word. Rather than use this custom logic, we can just use addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue. See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon (rL332607) backends. Differential Revision: https://reviews.llvm.org/D47004 This is a fixed reland of rL336100. This should have been caught in pre-commit testing so apologies for the noise. llvm-svn: 336104	2018-07-02 13:49:52 +00:00
Alex Bradbury	c000e4dcb5	Revert r336100 This was a bad change. .word == 2byte on x86. llvm-svn: 336103	2018-07-02 13:43:45 +00:00
Alex Bradbury	42485ec9ca	[X86] Use addAliasForDirective to support the .word directive The X86 asm parser currently has custom parsing logic for .word. Rather than use this custom logic, we can just use addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue. See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon (rL332607) backends. Differential Revision: https://reviews.llvm.org/D47004 llvm-svn: 336100	2018-07-02 13:37:15 +00:00
Simon Pilgrim	e389434a8a	[X86][BtVer2] Added Jaguar FPU Pipe0/1 uop counters to permit basic llvm-exegesis uop testing We don't have PMCs to cover many of the Jaguar resources but we can at least monitor the FPU issue pipes which give an indication of the fpu uop count, just not the execution resources. llvm-svn: 336089	2018-07-02 09:15:01 +00:00
Craig Topper	e06dabd3ca	[X86] Put some cases in switch statements back on one line to be more compact and make it easier to see the similarities. NFC It looks like someone ran clang-format over this entire file which reformatted these switches into a multiline form. But I think the single line form is more useful here. llvm-svn: 336077	2018-07-02 06:42:42 +00:00
Craig Topper	0661f67296	[X86] Remove FMA3Info DenseMap. Break into sorted tables that we can binary search. I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier. Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but it's probably not worth it right now. llvm-svn: 336075	2018-07-02 06:23:39 +00:00
Craig Topper	c004aa6c5f	[X86] Remove the places that return nullptr from X86InstrInfo::commuteInstructionImpl. findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl. llvm-svn: 336070	2018-07-01 23:27:41 +00:00
Craig Topper	4d8ec92fb0	[X86][Disassembler] Remove TYPE_BNDR from translateImmediate. I've checked the disassembler tables and this shouldn't be reachable. Which is good since if it was reachable there should have been a 'return' after the addOperand line. llvm-svn: 336066	2018-07-01 17:50:29 +00:00
Craig Topper	a2d30b3134	[X86] Remove unnecessary include. NFC Leftover from when the pass contained a DenseMap before it switched to binary search. llvm-svn: 336057	2018-07-01 05:54:22 +00:00
Craig Topper	4e78213ae4	[X86] Move the memory unfolding table creation into its own class and make it a ManagedStatic. Also move the static folding tables, their search functions and the new class into new cpp/h files. The unfolding table is effectively static data. It's just a different ordering and a subset of the static folding tables. By putting it in a separate ManagedStatic we ensure we only have one copy instead of one per X86InstrInfo object. This way also makes it only get initialized when really needed. llvm-svn: 336056	2018-07-01 05:47:49 +00:00
Craig Topper	84199deb17	[X86] Move the X86InstrFMA3Info class into the cpp file. Expose only a getFMA3Group free function. NFCI The class only exists to hold a DenseMap and is only created as a ManagedStatic. It used to expose a single static method that outside code was expected to use. This patch moves that static function out of the class and moves its implementation into the cpp file. It can now access the ManagedStatic directly by name without the need for the other static method that accessed the ManagedStatic. llvm-svn: 336055	2018-06-30 22:38:42 +00:00
Craig Topper	731740744f	[X86] Remove the AsmName from the HAX,HDX,HCX,HBX,HSI,HDI,HBP,HSP,HIP artificial registers so they can't be parsed by the assembly parser. There are no instructions that use them so they weren't causing any bad matches. But they weren't being diagnosed as "invalid register name" if they were used and would instead trigger some form of invalid operand. llvm-svn: 336054	2018-06-30 22:38:41 +00:00
Craig Topper	1b7b9b8596	[X86] Use MVT::i8 for scalar shift amounts since that is what they ultimately need to legalize to. I believe all of these are constants so legalizing them should be pretty trivial, but this saves a step. In one case it looks like we may have been creating a shift amount larger than the shift input itself. llvm-svn: 336052	2018-06-30 18:30:31 +00:00
Craig Topper	5f28d50d27	[X86] When combining load to BZHI, make sure we create the shift instruction with an i8 type. This combine runs pretty late and causes us to introduce a shift after the op legalization phase has run. We need to be sure we create the shift with the proper type for the shift amount. If we don't do this, we will still re-legalize the operation properly, but we won't get a chance to fully optimize the truncate that gets inserted. So this patch adds the necessary truncate when the shift is created. I've also narrowed the subtract that gets created to always be an i32 type. The truncate would have triggered SimplifyDemandedBits to optimize it anyway. But using a more appropriate VT here is free and saves an optimization step. llvm-svn: 336051	2018-06-30 17:49:42 +00:00
Craig Topper	59f2f38fe0	[X86] Remove masking from avx512 rotate intrinsics. Use select in IR instead. llvm-svn: 336035	2018-06-30 01:32:04 +00:00
Craig Topper	87b107dd69	[X86] Limit the number of target specific nodes emitted in LowerShiftParts The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And it's just better to limit the number of places we emit target specific nodes. The changed test cases still aren't optimal. Differential Revision: https://reviews.llvm.org/D48619 llvm-svn: 335998	2018-06-29 17:24:07 +00:00
Craig Topper	7c96f051d2	[X86] Use a std::vector for the memory unfolding table. Previously we used a DenseMap which is costly to set up due to multiple full table rehashes as the size increases and causes the table to be reallocated. This patch changes the table to a vector of structs. We now walk the reg->mem tables and push new entries in the mem->reg table for each row not marked TB_NO_REVERSE. Once all the table entries have been created, we sort the vector. Then we can use a binary search for lookups. Differential Revision: https://reviews.llvm.org/D48585 llvm-svn: 335994	2018-06-29 17:11:26 +00:00
Simon Pilgrim	aab8660e23	[X86][SSE] Support v16i8/v32i8 vector rotations This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op. Unfortunately I haven't found a decent way to share much of this code with the shift equivalent. Differential Revision: https://reviews.llvm.org/D48655 llvm-svn: 335957	2018-06-29 09:36:39 +00:00
Craig Topper	875e9f8fa4	[X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in IR instead. While there improve the coverage of the intrinsic testing and add fast-isel tests. llvm-svn: 335944	2018-06-29 05:43:26 +00:00
Craig Topper	90317d1d94	[X86] Suppress load folding into and/or/xor if it will prevent matching btr/bts/btc. This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy. Differential Revision: https://reviews.llvm.org/D48706 llvm-svn: 335895	2018-06-28 17:58:01 +00:00
Jonas Devlieghere	b757fc3878	Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models"" Reverting because this is causing failures in the LLDB test suite on GreenDragon. LLVM ERROR: unsupported relocation with subtraction expression, symbol '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction expression llvm-svn: 335894	2018-06-28 17:56:43 +00:00
Jessica Paquette	dafa198c96	[MachineOutliner] Define MachineOutliner support in TargetOptions Targets should be able to define whether or not they support the outliner without the outliner being added to the pass pipeline. Before this, the outliner pass would be added, and ask the target whether or not it supports the outliner. After this, it's possible to query the target in TargetPassConfig, before the outliner pass is created. This ensures that passing -enable-machine-outliner will not modify the pass pipeline of any target that does not support it. https://reviews.llvm.org/D48683 llvm-svn: 335887	2018-06-28 17:45:43 +00:00
Matthias Braun	da5e7e11d1	SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O) Add NoTrapAfterNoreturn target option which skips emission of traps behind noreturn calls even if TrapUnreachable is enabled. Enable the feature on Mach-O to save code size; comments suggest it is not possible to enable it for the other users of TrapUnreachable. rdar://41530228 Differential Revision: https://reviews.llvm.org/D48674 llvm-svn: 335877	2018-06-28 17:00:45 +00:00
Hans Wennborg	a257376003	s/TablesChecked/TableChecked/ after r335823 llvm-svn: 335831	2018-06-28 10:24:38 +00:00
Benjamin Kramer	f9613b2995	Unify sorted asserts to use the existing atomic pattern These are all benign races and only visible in !NDEBUG. tsan complains about it, but a simple atomic bool is sufficient to make it happy. llvm-svn: 335823	2018-06-28 10:03:45 +00:00
Craig Topper	ec5d568ac1	[X86] Use PatFrag with hardcoded numbers for FROUND_NO_EXC/FROUND_CURRENT instead of ImmLeafs with predicates where one of the two numbers was hardcoded. This is more efficient for the isel table generator since we can use CheckChildInteger instead of MoveChild, CheckPredicate, MoveParent. This reduced the table size by 1-2K. I wish there was a way to share the values with X86BaseInfo.h and still use a PatFrag like this. These numbers are fixed by the X86 intrinsic spec going back many years and we should never need to change them. So we shouldn't waste table bytes to support sharing. llvm-svn: 335806	2018-06-28 01:45:44 +00:00
Craig Topper	ab70f58891	[X86] Change how we prefer shift by immediate over folding a load into a shift. BMI2 added new shift by register instructions that have the ability to fold a load. Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load. We used to enforce this priority by artificially lowering the complexity of the load pattern. This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky. llvm-svn: 335804	2018-06-28 00:47:41 +00:00
Benjamin Kramer	e214f046af	[X86] Make folding table checking threadsafe This is a benign race, but tsan likes to complain about it. Just make it happy. llvm-svn: 335788	2018-06-27 21:01:53 +00:00
Craig Topper	880e34ed45	[X86] In X86DAGToDAGISel::PreprocessISelDAG, make sure we don't access N after we delete it. If we turn X86ISD::AND into ISD::AND, we delete N. But we were continuing onto the next block of code even though N no longer existed. Just happened to notice it. I assume asan didn't notice it because we explicitly unpoison deleted nodes and give them a DELETE_NODE opcode. llvm-svn: 335787	2018-06-27 20:58:46 +00:00
Fangrui Song	b0d57a535b	[X86] Fix unmatched parenthesis in r335768 llvm-svn: 335769	2018-06-27 19:12:07 +00:00
Craig Topper	6bea2c7f9b	[X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the SIB byte is present, but doesn't encode an index register and there was another shorter encoding that would achieve the same result. The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte. This should match the behavior of objdump from binutils. llvm-svn: 335768	2018-06-27 19:03:36 +00:00
Craig Topper	812fcb35e7	[X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1 (or rotating a -1) and doing a binop. These instructions also ignore the upper bits of their index input so we can also remove an and if one is present on the index. Fixes PR37938. llvm-svn: 335754	2018-06-27 16:47:39 +00:00
Craig Topper	31cbe75b3b	[X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744	2018-06-27 15:57:53 +00:00
Craig Topper	33aba0eb4c	[X86] Don't store register and memory FMA3 opcodes in the same X86InstrFMA3Group. Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group. The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed. llvm-svn: 335694	2018-06-27 00:42:24 +00:00


17374 Commits