Commit Graph

17279 Commits

Author SHA1 Message Date
Craig Topper 0b7936737b [X86] Initialize FMA3Info directly in its constructor instead of relying on std::call_once
FMA3Info only exists as a managed static. As far as I know the ManagedStatic construction proccess is thread safe. It doesn't look like we ever access the ManagedStatic object without immediately doing a query on it that would require the map to be populated. So I don't think we're ever deferring the calculation of the tables from the construction of the object.

So I think we should be able to just populate the FMA3Info map directly in the constructor and get rid of all of the initGroupsOnce stuff.

Differential Revision: https://reviews.llvm.org/D48194

llvm-svn: 335064
2018-06-19 18:06:52 +00:00
Craig Topper 7ffa976993 [X86] Don't fold unaligned loads into SSE ROUNDPS/ROUNDPD for ceil/floor/nearbyint/rint/trunc.
Incorrect patterns were added in r334460. This changes them to check alignment properly for SSE.

llvm-svn: 335062
2018-06-19 17:51:42 +00:00
Mikhail Dvoretckii b1ce7765be [X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns
This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.

Differential Revision: https://reviews.llvm.org/D45203

llvm-svn: 335037
2018-06-19 10:37:52 +00:00
Mikhail Dvoretckii bd8ed2dbaa Test commit.
llvm-svn: 335026
2018-06-19 07:55:10 +00:00
Craig Topper c2965214ef [X86] Add the ability to force an EVEX2VEX mapping table entry from the .td files. Remove remaining manual table entries from the tablegen emitter.
This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add manual entry in the EVEX->VEX tables that doesn't check the encoding information.

Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions.

Finally, remove the manual table from the emitter.

This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap.

llvm-svn: 335018
2018-06-19 04:24:44 +00:00
Craig Topper 0a5e90cc2a [X86] Add a new VEX_WPrefix encoding to tag EVEX instruction that have VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0.
EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity.

The EVEX->VEX table generate can collapse the two versions if the VEX version uses is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1.

This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0.

This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler.

llvm-svn: 335017
2018-06-19 04:24:42 +00:00
Craig Topper 46c0b368d6 [X86] Simplify the TSFlags checking code in EvexToVexInstPass. NFCI
The code was previously checking the L2 and L flag on 3 separate lines, treating the combination as an encoding. Instead its better to think of the L2 bit as being something that can't be done with VEX and early returning. Then we just need to check the L bit.

llvm-svn: 335015
2018-06-19 03:17:46 +00:00
Craig Topper a7b7f2f4d8 [X86] Remove ReadAfterLd from avx512_shift_rmbi multiclass.
The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd?

llvm-svn: 334994
2018-06-18 23:20:57 +00:00
Eric Christopher 88bbad24c0 Tidy comment language and explanation.
llvm-svn: 334990
2018-06-18 22:21:19 +00:00
Eric Christopher b1faaf3069 Pull non-lazy stub table emission into a separate function alongside
the individual stub creation to increase readability a bit in the
non-object file format specific function.

llvm-svn: 334989
2018-06-18 22:21:18 +00:00
Eric Christopher b6d0b99f3b Add return statements to make it clear that all of these are mutually exclusive conditions.
else if would have worked just as well, but this keeps the original readability a bit more clear.

llvm-svn: 334988
2018-06-18 22:21:13 +00:00
Craig Topper 17bd84c12c [X86] Encode the EVEX2VEX exception list information in .td files instead of the emitter source.
Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.

llvm-svn: 334971
2018-06-18 18:47:07 +00:00
Simon Pilgrim 9173c97ce4 [X86][BtVer2] Flag AVX2+ scheduler classes as unsupported
Jaguar only supports up to AVX1

Differential Revision: https://reviews.llvm.org/D48274

llvm-svn: 334947
2018-06-18 14:31:14 +00:00
Clement Courbet 0d9da88d18 [X86] Fix NOOP sched overrides on BDW/HSW/SKL.
Summary: Noop certainly does not use resources.

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits, gchatelet

Differential Revision: https://reviews.llvm.org/D48028

llvm-svn: 334927
2018-06-18 06:48:22 +00:00
Craig Topper f0ab7bd196 [X86] Create X86InstrFMA3Group objects fully in a static table instead of on the heap. NFCI
Previously we heap allocated the X86InstrFMA3Group objects which were created by passing them small register/memory opcode arrays that existed as individual static tables.

Rather than a bunch of small static arrays we now have one large static table of X86InstrFMA3Group objects. Rather than storing a pointer to the opcode arrays in the X86InstrFMA3Group object, we now store have a register and memory array as part of the object. If a group doesn't have memory or register opcodes, the array entries will be 0.

This greatly simplifies the destruction of the X86InstrFMA3Info object. We no longer need to delete the X86InstrFMA3Group objects as we destruct the DenseMap. And we don't need to keep track of which ones we already deleted.

This reduces the llc binary size on my local machine by ~50k. I can only assume that's really due to the fact that we had something like 512 small static arrays that we passed to the init functions either one at a time or in pairs. So there were between 256 and 512 distinct calls to the init functions in the initOnceImpl method.

llvm-svn: 334925
2018-06-18 06:32:22 +00:00
Craig Topper 16fdde5e63 [X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions.
We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions.

Also remove the vpextrw.s EVEX alias. That's not something gas implements.

llvm-svn: 334922
2018-06-18 05:00:50 +00:00
Craig Topper 916d0cf649 [X86] Move the 'vmovq.s' and similar assembly strings for EVEX vector moves with reversed operands to InstAliases.
The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.

Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to state where only EVEX was supported.

llvm-svn: 334920
2018-06-18 01:28:05 +00:00
Craig Topper 9fe45d846e [X86] Add all the FMA instructions direclty to the load folding table instead of proxying through X86InstrFMA3Info.
These increases the size of the static tables, but is closer to what we would get if used the autogenerated table directly. This reduces the remaining large deltas between what's in the manual table and what's in the autogenerated table.

llvm-svn: 334915
2018-06-17 18:00:16 +00:00
Craig Topper b0e986f88e [X86] Pass the parent SDNode to X86DAGToDAGISel::selectScalarSSELoad to simplify the hasSingleUseFromRoot handling.
Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.

This fixed at least fma-scalar-memfold.ll to succed without the peephole pass.

llvm-svn: 334908
2018-06-17 16:29:46 +00:00
Craig Topper 29f22d7baa [X86] More additions to the load folding tables based on the autogenerated tables.
Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.

llvm-svn: 334898
2018-06-16 23:25:50 +00:00
Craig Topper c435632862 [X86] Hide POP16/32/64rmr and PUSH16/32/64rmr instructions from the assembly parser.
These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td fie. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority.

llvm-svn: 334897
2018-06-16 23:25:48 +00:00
Craig Topper 74412c7d59 [X86] Fix an inconsistency between AVX512 and AVX/SSE version on a couple instructions.
VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.

VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.

llvm-svn: 334896
2018-06-16 23:25:47 +00:00
Craig Topper d00e375310 [X86] Add more instructions to the hasUndefRegUpdate list.
Not sure any of these matter today because I don't think we ever produce them with IMPLICIT_DEF as an input. But by listing them we don't be suprised in the future.

llvm-svn: 334867
2018-06-15 22:25:04 +00:00
Tomasz Krupa bcaab53d47 [X86] Lowering sqrt intrinsics to native IR
Summary: Complementary patch to lowering sqrt intrinsics in Clang.

Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, mike.dvoretsky, llvm-commits

Differential Revision: https://reviews.llvm.org/D41599

llvm-svn: 334849
2018-06-15 18:05:24 +00:00
Craig Topper 1657b7b8d2 [X86] Prevent folding stack reloads into instructions in hasUndefRegUpdate.
An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too.

llvm-svn: 334848
2018-06-15 17:56:17 +00:00
Craig Topper c8a763ed84 Revert r334802 "[X86] Prevent folding stack reloads with instructions that have an undefined register update."
There's a typo causing the build to fail.

llvm-svn: 334803
2018-06-15 06:15:26 +00:00
Craig Topper 5ec210cc27 [X86] Prevent folding stack reloads with instructions that have an undefined register update.
We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.

llvm-svn: 334802
2018-06-15 06:11:36 +00:00
Craig Topper 3c4cc01226 [X86] Add more instructions to the memory folding tables using the autogenerated table as a guide.
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.

There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.

llvm-svn: 334800
2018-06-15 05:49:19 +00:00
Craig Topper f43807dd89 [X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency.
llvm-svn: 334785
2018-06-15 04:42:54 +00:00
Sanjay Patel f85ca6abee [x86] be more selective about converting 'and' to shuffle (PR37749)
isVectorClearMaskLegal() is the TLI hook used by the generic
DAGCombiner::XformToShuffleWithZero().

We've grown to accomodate/expect this transform to shuffle
(disabling it more generally results in many regressions).
So I'm narrowly excluding the 256-bit types that clearly 
are not worthwhile for AVX1. 

I think in most cases we are able to recover by converting 
the shuffle back into 'and' ops, but the cases in:
https://bugs.llvm.org/show_bug.cgi?id=37749
...show that there are cracks.

llvm-svn: 334759
2018-06-14 19:55:02 +00:00
Craig Topper bfa94d5086 [X86] Fix stale comment in folding tables.
llvm-svn: 334758
2018-06-14 19:28:31 +00:00
Craig Topper 3ffeb41f6b [X86] Add more vector instructions to the memory folding table using the autogenerated table as a guide.
The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.

llvm-svn: 334728
2018-06-14 15:40:31 +00:00
Craig Topper 82fa048371 [X86] Remove '128' from the internal name of some scalar FP instructions to be consistent with other scalar instructions.
llvm-svn: 334727
2018-06-14 15:40:30 +00:00
Craig Topper b0742bf30d [X86] Disable load unfolding for a bunch of instruction where unfolding would increase the size of the load.
Found by an audit of the manual table vs the autogenerated table.

llvm-svn: 334726
2018-06-14 15:40:29 +00:00
Craig Topper 9f829f76e8 [X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions.
Some of these instructions are already in the manual folding table so we should have them in the auto table too.

llvm-svn: 334725
2018-06-14 15:40:27 +00:00
Craig Topper b2552e1e08 [x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551)
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.

This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll

Reviewers: RKSimon, gbedwell, spatel

Reviewed By: spatel

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D47993

llvm-svn: 334685
2018-06-14 03:16:58 +00:00
Craig Topper f7f663e0a9 [X86] Move RCPSSr_Int, RSQRTSSr_Int, SQRTSDr_Int, SQRTSSr_Int to the correct load folding table.
They were in the operand 1 folding table, but their foldable operand is operand 2.

llvm-svn: 334648
2018-06-13 20:03:42 +00:00
Sanjay Patel b983ac6fe1 [x86] eliminate even more sign-bit tests with vector select
This shortcoming was noted in D47330, and the test diffs show we already 
had other examples where we failed to fold to a SHRUNKBLEND:

/// Dynamic (non-constant condition) vector blend where only the sign bits
/// of the condition elements are used. This is used to enforce that the
/// condition mask is not valid for generic VSELECT optimizations.

This patch implements an idea from D48043 and would obsolete that patch 
because it catches more cases (notable the AVX1 case that was missed there). 
All we're doing is allowing the existing transform to fire more often by 
removing the post-legalize constraint. All of the relevant feature checks 
and other predicates are left as-is.

Differential Revision: https://reviews.llvm.org/D48078

llvm-svn: 334592
2018-06-13 12:28:32 +00:00
Craig Topper 3829d258ee [X86] Remove masking from avx512vbmi2 concat and shift by immediate intrinsics. Use select in IR instead.
llvm-svn: 334576
2018-06-13 07:19:21 +00:00
Craig Topper 55488731be [X86] Mark all instructions that have masked store semantics with NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator.
Previously we were whitelisting in instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore.

llvm-svn: 334563
2018-06-13 00:04:08 +00:00
Craig Topper 4f9cac667b [X86] Remove VPCOMPRESSB/W from the autogenerated load folding table.
llvm-svn: 334562
2018-06-13 00:04:04 +00:00
Craig Topper 3a34c3596d [X86] Remove mayLoad flag from AVX512 truncating store instructions.
llvm-svn: 334529
2018-06-12 19:59:08 +00:00
Reid Kleckner 98117a47e6 [MS][ARM64] Hoist __ImageBase handling into TargetLoweringObjectFileCOFF
All COFF targets should use @IMGREL32 relocations for symbol differences
against __ImageBase. Do the same for getSectionForConstant, so that
immediates lowered to globals get merged across TUs.

Patch by Chris January

Differential Revision: https://reviews.llvm.org/D47783

llvm-svn: 334523
2018-06-12 18:56:05 +00:00
Fangrui Song f72cdb50be [MC] [X86] Teach leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 to use R_X86_64_GOTPC32 instead of R_X86_64_PC32
Summary:
This is similar to D46319 (ARM). x86-64 psABI p40 gives an example:

  leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 # GOTPC32 reloc

GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32.

Reviewers: javed.absar, echristo

Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar

Differential Revision: https://reviews.llvm.org/D47507

llvm-svn: 334515
2018-06-12 16:20:44 +00:00
Simon Pilgrim e39fa6cbbb [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:

e.g. v4f32: <0,5,2,7> or <4,1,6,3>

This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:

e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.

Differential Revision: https://reviews.llvm.org/D47985

llvm-svn: 334513
2018-06-12 16:12:29 +00:00
Craig Topper ede97c9548 [X86] Remove TB_ALIGN_16 from VEXTRACTF128/VEXTRACTI128 in the memory folding table.
llvm-svn: 334511
2018-06-12 15:48:03 +00:00
Sanjay Patel c3466d2568 [x86] move shrunkblend transform to helper function; NFCI
We should be able to obsolete D48043 by easing the constraints
on this existing code. 

llvm-svn: 334504
2018-06-12 14:21:51 +00:00
Craig Topper 88c230265b [X86] Add NotMemoryFoldable to the VPCOMPRESS instructions.
llvm-svn: 334481
2018-06-12 07:32:19 +00:00
Craig Topper 5799e4df75 [X86] Add NotMemoryFoldable to more instructions.
These include PUSH/POP instructions that don't match the manual table. This also includes CMPXCHG which we never emit in non-locked form.

llvm-svn: 334479
2018-06-12 07:32:17 +00:00
Craig Topper 66572df76e [X86] Add NotMemoryFoldable to a bunch of instructions to suppress them from the autogenerated load folding table.
Most of these are system instructions or other instructions we don't use in CodeGen. No point wasting space for them in the table. Removing them from the autogenerated table makes it easier to review the manual table.

A few are real opcode collisions where the memory and register forms are completely different instructions.

llvm-svn: 334474
2018-06-12 04:34:59 +00:00