Commit Graph

17374 Commits

Author SHA1 Message Date
Craig Topper 4fe321d1ce [X86] Add SHUF128 to target shuffle decoding.
Differential Revision: https://reviews.llvm.org/D48954

llvm-svn: 336376
2018-07-05 17:10:17 +00:00
Simon Pilgrim 8c3765dc6b [CostModel][X86] Add UDIV/UREM by pow2 costs
Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc.

llvm-svn: 336371
2018-07-05 16:56:28 +00:00
Craig Topper 350c5f1881 [X86] Remove X86 specific scalar FMA intrinsics and upgrade to tart independent FMA and extractelement/insertelement.
llvm-svn: 336315
2018-07-05 06:52:55 +00:00
Craig Topper 2db909cfae [X86] Remove some isel patterns for X86ISD::SELECTS that specifically looked for the v1i1 mask to have come from a scalar_to_vector from GR8.
We have patterns for SELECTS that top at v1i1 and we have a pattern for (v1i1 (scalar_to_vector GR8)). The patterns being removed here do the same thing as the two other patterns combined so there is no need for them.

llvm-svn: 336305
2018-07-05 03:01:29 +00:00
Craig Topper 95eb88abfe [X86] Add support for combining FMSUB/FNMADD/FNMSUB ISD nodes with an fneg input.
Previously we could only negate the FMADD opcodes. This used to be mostly ok when we lowered FMA intrinsics during lowering. But with the move to llvm.fma from target specific intrinsics, we can combine (fneg (fma)) to (fmsub) earlier. So if we start with (fneg (fma (fneg))) we would get stuck at (fmsub (fneg)).

This patch fixes that so we can also combine things like (fmsub (fneg)).

llvm-svn: 336304
2018-07-05 02:52:56 +00:00
Craig Topper e4b9257b69 [X86] Remove some of the packed FMA3 intrinsics since we no longer use them in clang.
There's a regression in here due to inability to combine fneg inputs of X86ISD::FMSUB/FNMSUB/FNMADD nodes.

More removals to come, but I wanted to stop and fix the regression that showed up in this first.

llvm-svn: 336303
2018-07-05 02:52:54 +00:00
Yvan Roux eaececf5e0 [MachineOutliner] Fix typo in getOutliningCandidateInfo function name
getOutlininingCandidateInfo -> getOutliningCandidateInfo

Differential Revision: https://reviews.llvm.org/D48867

llvm-svn: 336285
2018-07-04 15:37:08 +00:00
Simon Pilgrim c3e1617bf9 [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values (REAPPLIED)
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.

Reapplied with a fixed (extra null tests) version of rL336113 after reversion in rL336189 - extra test case added at rL336247.

llvm-svn: 336250
2018-07-04 09:12:48 +00:00
Fangrui Song 78ab286aa0 [X86][AsmParser] Fix inconsistent declaration parameter name in r336218
llvm-svn: 336232
2018-07-03 21:40:03 +00:00
Craig Topper e317533dcf [X86] Remove repeated 'the' from multiple comments that have been copy and pasted. NFC
llvm-svn: 336226
2018-07-03 20:39:55 +00:00
Craig Topper adc51ae425 [X86][AsmParser] Rework the in/out (%dx) hack one more time.
This patch adds a new token type specifically for (%dx). We will now always create this token when we parse (%dx). After all operands have been parsed, if the mnemonic is in/out we'll morph this token to a regular register token. Otherwise we keep it as the special DX token which won't match any instructions.

This removes the need for passing Mnemonic through the parsing functions. It also seems closer to gas where when its used on the wrong instruction it just gets diagnosed as an invalid operand rather than a bad memory address.

llvm-svn: 336218
2018-07-03 18:07:30 +00:00
Craig Topper bc598f0d61 [X86][AsmParser] Don't consider %eip as a valid register outside of 32-bit mode.
This might make the error message added in r335668 unneeded, but I'm not sure yet.

The check for RIP is technically unnecessary since RIP is in GR64, but that fact is kind of surprising so be explicit.

llvm-svn: 336217
2018-07-03 17:40:51 +00:00
Benjamin Kramer fd171f2f89 Revert "[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values"
This reverts commit r336113. It causes crashes.

llvm-svn: 336189
2018-07-03 11:15:17 +00:00
Krzysztof Parzyszek fd97494984 [X86] Add phony registers for high halves of regs with low halves
Add registers still missing after r328016 (D43353):
- for bits 15-8  of SI, DI, BP, SP (*H), and R8-R15 (*BH),
- for bits 31-16 of R8-R15 (*WH).

Thanks to Craig Topper for pointing it out.

llvm-svn: 336134
2018-07-02 19:05:09 +00:00
Craig Topper 56440b9745 [X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned.
Similarily, don't fold fp128 loads into SSE instructions if the load isn't aligned. Unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.

Should fix PR38001

llvm-svn: 336121
2018-07-02 17:01:54 +00:00
Simon Pilgrim 2bc8e079f2 [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values
We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.

llvm-svn: 336113
2018-07-02 15:14:07 +00:00
Alex Bradbury c48908781d [X86] Use addAliasForDirective to support the .word directive (reland)
The X86 asm parser currently has custom parsing logic for .word. Rather than
use this custom logic, we can just use addAliasForDirective to enable the
reuse of AsmParser::parseDirectiveValue.

See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon
(rL332607) backends.

Differential Revision: https://reviews.llvm.org/D47004

This is a fixed reland of rL336100. This should have been caught in 
pre-commit testing so apologies for the noise.

llvm-svn: 336104
2018-07-02 13:49:52 +00:00
Alex Bradbury c000e4dcb5 Revert r336100
This was a bad change. .word == 2byte on x86.

llvm-svn: 336103
2018-07-02 13:43:45 +00:00
Alex Bradbury 42485ec9ca [X86] Use addAliasForDirective to support the .word directive
The X86 asm parser currently has custom parsing logic for .word. Rather than 
use this custom logic, we can just use addAliasForDirective to enable the 
reuse of AsmParser::parseDirectiveValue.

See also similar changes to Sparc (rL333078), AArch64 (rL333077), and Hexagon 
(rL332607) backends.

Differential Revision: https://reviews.llvm.org/D47004

llvm-svn: 336100
2018-07-02 13:37:15 +00:00
Simon Pilgrim e389434a8a [X86][BtVer2] Added Jaguar FPU Pipe0/1 uop counters to permit basic llvm-exegesis uop testing
We don't have PMCs to cover many of the Jaguar resources but we can at least monitor the FPU issue pipes which give an indication of the fpu uop count, just not the execution resources.

llvm-svn: 336089
2018-07-02 09:15:01 +00:00
Craig Topper e06dabd3ca [X86] Put some cases in switch statements back on one line to be more compact and make it easier to see the similarities. NFC
It looks like someone ran clang-format over this entire file which reformatted these switches into a multiline form. But I think the single line form is more useful here.

llvm-svn: 336077
2018-07-02 06:42:42 +00:00
Craig Topper 0661f67296 [X86] Remove FMA3Info DenseMap. Break into sorted tables that we can binary search.
I separated out the rounding and broadcast groups into their own tables because it made the ordering in the main table easier.

Further splitting of the tables might make it possible to directly index using bits from the TSFlags, but its probably not worth it right now.

llvm-svn: 336075
2018-07-02 06:23:39 +00:00
Craig Topper c004aa6c5f [X86] Remove the places that return nullptr from X86InstrInfo::commuteInstructionImpl.
findCommutedOpIndices does the pre-checking for whether commuting is possible. There should be no reason left to fail in commuteInstructionImpl. There was a missing pre-check that I've added there and changed the check to an assert in commuteInstructionImpl.

llvm-svn: 336070
2018-07-01 23:27:41 +00:00
Craig Topper 4d8ec92fb0 [X86][Disassembler] Remove TYPE_BNDR from translateImmediate.
I've check the disassembler tables and this shouldn't be reachable. Which is good since if it was reachable there should have been a 'return' after the addOperand line.

llvm-svn: 336066
2018-07-01 17:50:29 +00:00
Craig Topper a2d30b3134 [X86] Remove unnecessary include. NFC
Leftover from when the pass contained a DenseMap before it switched to binary search.

llvm-svn: 336057
2018-07-01 05:54:22 +00:00
Craig Topper 4e78213ae4 [X86] Move the memory unfolding table creation into its own class and make it a ManagedStatic.
Also move the static folding tables, their search functions and the new class into new cpp/h files.

The unfolding table is effectively static data. It's just a different ordering and a subset of the static folding tables.

By putting it in a separate ManagedStatic we ensure we only have one copy instead of one per X86InstrInfo object. This way also makes it only get initialized when really needed.

llvm-svn: 336056
2018-07-01 05:47:49 +00:00
Craig Topper 84199deb17 [X86] Move the X86InstrFMA3Info class into the cpp file. Expose only a getFMA3Group free function. NFCI
The class only exists to hold a DenseMap and is only created as a ManagedStatic. It used to expose a single static method that outside code was expected to use.

This patch moves that static function out of the class and moves it implementation into the cpp file. It can now access the ManagedStatic directly by name without the need for the other static method that accessed the ManagedStatic.

llvm-svn: 336055
2018-06-30 22:38:42 +00:00
Craig Topper 731740744f [X86] Remove the AsmName from the HAX,HDX,HCX,HBX,HSI,HDI,HBP,HSP,HIP artificial registers so they can't be parsed by the assembly parser.
There are no instructions that use them so they weren't causing any bad matches. But they weren't being diagnosed as "invalid register name" if they were used and would instead trigger some form of invalid operand.

llvm-svn: 336054
2018-06-30 22:38:41 +00:00
Craig Topper 1b7b9b8596 [X86] Use MVT::i8 for scalar shift amounts since that is what they ultimately need to legalize to.
I believe all of these are constants so legalizing them should be pretty trivial, but this saves a step.

In one case it looks like we may have been creating a shift amount larger than the shift input itself.

llvm-svn: 336052
2018-06-30 18:30:31 +00:00
Craig Topper 5f28d50d27 [X86] When combining load to BZHI, make sure we create the shift instruction with an i8 type.
This combine runs pretty late and causes us to introduce a shift after the op legalization phase has run. We need to be sure we create the shift with the proper type for the shift amount. If we don't do this, we will still re-legalize the operation properly, but we won't get a chance to fully optimize the truncate that gets inserted.

So this patch adds the necessary truncate when the shift is created. I've also narrowed the subtract that gets created to always be an i32 type. The truncate would have trigered SimplifyDemandedBits to optimize it anyway. But using a more appropriate VT here is free and saves an optimization step.

llvm-svn: 336051
2018-06-30 17:49:42 +00:00
Craig Topper 59f2f38fe0 [X86] Remove masking from avx512 rotate intrinsics. Use select in IR instead.
llvm-svn: 336035
2018-06-30 01:32:04 +00:00
Craig Topper 87b107dd69 [X86] Limit the number of target specific nodes emitted in LowerShiftParts
The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And its just better to limit the number of places we emit target specific nodes.

The changed test cases still aren't optimal.

Differential Revision: https://reviews.llvm.org/D48619

llvm-svn: 335998
2018-06-29 17:24:07 +00:00
Craig Topper 7c96f051d2 [X86] Use a std::vector for the memory unfolding table.
Previously we used a DenseMap which is costly to set up due to multiple full table rehashes as the size increases and causes the table to be reallocated.

This patch changes the table to a vector of structs. We now walk the reg->mem tables and push new entries in the mem->reg table for each row not marked TB_NO_REVERSE. Once all the table entries have been created, we sort the vector. Then we can use a binary search for lookups.

Differential Revision: https://reviews.llvm.org/D48585

llvm-svn: 335994
2018-06-29 17:11:26 +00:00
Simon Pilgrim aab8660e23 [X86][SSE] Support v16i8/v32i8 vector rotations
This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available. This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op.

Unfortunately I haven't found a decent way to share much of this code with the shift equivalent.

Differential Revision: https://reviews.llvm.org/D48655

llvm-svn: 335957
2018-06-29 09:36:39 +00:00
Craig Topper 875e9f8fa4 [X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in IR instead.
While there improve the coverage of the intrinsic testing and add fast-isel tests.

llvm-svn: 335944
2018-06-29 05:43:26 +00:00
Craig Topper 90317d1d94 [X86] Suppress load folding into and/or/xor if it will prevent matching btr/bts/btc.
This is a follow up to r335753. At the time I forgot about isProfitableToFold which makes this pretty easy.

Differential Revision: https://reviews.llvm.org/D48706

llvm-svn: 335895
2018-06-28 17:58:01 +00:00
Jonas Devlieghere b757fc3878 Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models""
Reverting because this is causing failures in the LLDB test suite on
GreenDragon.

  LLVM ERROR: unsupported relocation with subtraction expression, symbol
  '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction
  expression

llvm-svn: 335894
2018-06-28 17:56:43 +00:00
Jessica Paquette dafa198c96 [MachineOutliner] Define MachineOutliner support in TargetOptions
Targets should be able to define whether or not they support the outliner
without the outliner being added to the pass pipeline. Before this, the
outliner pass would be added, and ask the target whether or not it supports the
outliner.

After this, it's possible to query the target in TargetPassConfig, before the
outliner pass is created. This ensures that passing -enable-machine-outliner
will not modify the pass pipeline of any target that does not support it.

https://reviews.llvm.org/D48683

llvm-svn: 335887
2018-06-28 17:45:43 +00:00
Matthias Braun da5e7e11d1 SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.

Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.

rdar://41530228

DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
2018-06-28 17:00:45 +00:00
Hans Wennborg a257376003 s/TablesChecked/TableChecked/ after r335823
llvm-svn: 335831
2018-06-28 10:24:38 +00:00
Benjamin Kramer f9613b2995 Unify sorted asserts to use the existing atomic pattern
These are all benign races and only visible in !NDEBUG. tsan complains
about it, but a simple atomic bool is sufficient to make it happy.

llvm-svn: 335823
2018-06-28 10:03:45 +00:00
Craig Topper ec5d568ac1 [X86] Use PatFrag with hardcoded numbers for FROUND_NO_EXC/FROUND_CURRENT instead of ImmLeafs with predicates where one of the two numbers was hardcoded.
This more efficient for the isel table generator since we can use CheckChildInteger instead of MoveChild, CheckPredicate, MoveParent. This reduced the table size by 1-2K.

I wish there was a way to share the values with X86BaseInfo.h and still use a PatFrag like this. These numbers are fixed by the X86 intrinsic spec going back many years and we should never need to change them. So we shouldn't waste table bytes to support sharing.

llvm-svn: 335806
2018-06-28 01:45:44 +00:00
Craig Topper ab70f58891 [X86] Change how we prefer shift by immediate over folding a load into a shift.
BMI2 added new shift by register instructions that have the ability to fold a load.

Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load.

We used to enforce this priority by artificially lowering the complexity of the load pattern.

This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky.

llvm-svn: 335804
2018-06-28 00:47:41 +00:00
Benjamin Kramer e214f046af [X86] Make folding table checking threadsafe
This is a benign race, but tsan likes to complain about it. Just make it
happy.

llvm-svn: 335788
2018-06-27 21:01:53 +00:00
Craig Topper 880e34ed45 [X86] In X86DAGToDAGISel::PreprocessISelDAG, make sure we don't access N after we delete it.
If we turn X86ISD::AND into ISD::AND, we delete N. But we were continuing onto the next block of code even though N no longer existed.

Just happened to notice it. I assume asan didn't notice it because we explicitly unpoison deleted nodes and give them a DELETE_NODE opcode.

llvm-svn: 335787
2018-06-27 20:58:46 +00:00
Fangrui Song b0d57a535b [X86] Fix unmatched parenthesis in r335768
llvm-svn: 335769
2018-06-27 19:12:07 +00:00
Craig Topper 6bea2c7f9b [X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the SIB byte is present, but doesn't encode an index register and there was another shorter encoding that would achieve the same result.
The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte.

This should match the behavior of objdump from binutils.

llvm-svn: 335768
2018-06-27 19:03:36 +00:00
Craig Topper 812fcb35e7 [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position
If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index.

Fixes PR37938.

llvm-svn: 335754
2018-06-27 16:47:39 +00:00
Craig Topper 31cbe75b3b [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Craig Topper 33aba0eb4c [X86] Don't store register and memory FMA3 opcodes in the same X86InstrFMA3Group.
Nothing was using this relationship. By splitting them we no longer need to worry about register or memory entries being empty in a group.

The memory folding tables in X86InstrInfo.cpp can be used to access this relationship if needed.

llvm-svn: 335694
2018-06-27 00:42:24 +00:00