Author SHA1 Message Date
Craig Topper 92a4ff1294 [AVX-512] Add support for execution domain switching masked logical ops between floating point and integer domain.
This switches PS<->D and PD<->Q.

llvm-svn: 278097
2016-08-09 05:26:07 +00:00
Craig Topper 9bd6241106 [X86] Remove the Fv packed logical operation alias instructions. Replace them with patterns to the regular instructions.
This enables execution domain fixing which is why the tests changed.

llvm-svn: 278090
2016-08-09 03:06:33 +00:00
Craig Topper c09273b42b [X86] Cleanup patterns for AVX/SSE for PS operations. Always try to look for bitcasts from floating point types. If only AVX1 is supported we also need to handle integer types with floating point ops without looking for bitcasts.
Previously SSE1 had a pattern that looked for integer types without bitcasts, but that type wasn't legal with only SSE1, and SSE2 adds an identical pattern for the integer instructions.

llvm-svn: 278089
2016-08-09 03:06:28 +00:00
Craig Topper de06b51d3d [X86] Remove unnecessary bitcast from the front of AVX1Only 256-bit logical operation patterns.
llvm-svn: 278088
2016-08-09 03:06:26 +00:00
Matthias Braun 7313ca6dbf X86InstrInfo: Update liveness in classifyLea()
We need to update liveness information when we create COPYs in
classifyLea().

This fixes http://llvm.org/28301

llvm-svn: 278086
2016-08-09 01:47:26 +00:00
Sanjay Patel 06ba09af67 [x86] split combineVSelectWithAllOnesOrZeros into a helper function; NFCI
llvm-svn: 278074
2016-08-09 00:01:11 +00:00
Charles Davis e9c32c7ed3 Revert "[X86] Support the "ms-hotpatch" attribute."
This reverts commit r278048. Something changed between the last time I
built this--it takes a while on my ridiculously slow and ancient
computer--and now that broke this.

llvm-svn: 278053
2016-08-08 21:20:15 +00:00
Charles Davis 0822aa118e [X86] Support the "ms-hotpatch" attribute.
Summary:
Based on two patches by Michael Mueller.

This is a target attribute that causes a function marked with it to be
emitted as "hotpatchable". This particular mechanism was originally
devised by Microsoft for patching their binaries (which they are
constantly updating to stay ahead of crackers, script kiddies, and other
ne'er-do-wells on the Internet), but is now commonly abused by Windows
programs to hook API functions.

This mechanism is target-specific. For x86, a two-byte no-op instruction
is emitted at the function's entry point; the entry point must be
immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding.
This padding is where the patch code is written. The two-byte no-op is
then overwritten with a short jump into this code. The no-op is usually
a `movl %edi, %edi` instruction; this is used as a magic value
indicating that this is a hotpatchable function.
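The entry-point mechanics described above can be sketched in Python. This is an illustrative model only: `HOTPATCH_NOP`, `is_hotpatchable`, and `short_jmp` are hypothetical names, not LLVM APIs. It relies on two encoding facts: `movl %edi, %edi` encodes as `8B FF`, and a short `jmp` is `EB` followed by a signed 8-bit displacement measured from the end of the 2-byte jump, so the replacement jump has the same length as the no-op it overwrites.

```python
# Illustrative model of the hotpatch entry sequence; not LLVM code.
HOTPATCH_NOP = bytes([0x8B, 0xFF])  # movl %edi, %edi: 2-byte no-op and magic marker


def is_hotpatchable(entry: bytes) -> bool:
    """Return True if the function entry starts with the 2-byte magic no-op."""
    return entry[:2] == HOTPATCH_NOP


def short_jmp(entry_addr: int, target_addr: int) -> bytes:
    """Encode a `jmp rel8` placed at entry_addr that lands on target_addr.

    The displacement is relative to the end of the 2-byte instruction, so the
    target must lie within [-128, 127] bytes of entry_addr + 2.
    """
    rel = target_addr - (entry_addr + 2)
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, rel & 0xFF])


# Patch a function at 0x1000 so it jumps 5 bytes back into its padding.
entry = 0x1000
print(short_jmp(entry, entry - 5).hex())  # ebf9
```

Because both sequences are two bytes long, the short jump fits exactly in the space occupied by the no-op; the patch code in the padding must be within reach of the rel8 displacement.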

Reviewers: majnemer, sanjoy, rnk

Subscribers: dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D19908

llvm-svn: 278048
2016-08-08 21:01:39 +00:00
Nirav Dave f45fd2ba87 [X86] Improve code size on X86 segment moves
A move to a segment register from a 16-bit register is
equivalent to one from its corresponding 32-bit register. Match gas's
behavior and rewrite instructions to the shorter of equivalent forms.
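To see where the one-byte saving comes from, here is a hypothetical Python encoder for the register form of a move to a segment register (opcode `8E /r`). In 32- and 64-bit code the 16-bit form needs the `0x66` operand-size prefix while the 32-bit form does not, and the ModRM byte is identical in both. Only the eight legacy general-purpose registers and the writable segment registers are modelled; the names are illustrative, not LLVM's.

```python
# Segment register numbers as used in the ModRM reg field (cs omitted: not a valid destination).
SREG = {"es": 0, "ss": 2, "ds": 3, "fs": 4, "gs": 5}
# General-purpose register numbers; ax..di share these numbers with eax..edi.
GPR = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_mov_to_sreg(sreg: str, gpr: str, sixteen_bit_form: bool) -> bytes:
    """Encode a register-to-segment-register move (AT&T order: gpr, sreg)."""
    # ModRM: mod=11 (register direct), reg=segment register, r/m=general register.
    modrm = 0xC0 | (SREG[sreg] << 3) | GPR[gpr]
    # In 32/64-bit code the 16-bit operand size requires a 0x66 prefix.
    prefix = b"\x66" if sixteen_bit_form else b""
    return prefix + bytes([0x8E, modrm])


print(encode_mov_to_sreg("ds", "eax", True).hex())   # movw %ax, %ds  -> 668ed8
print(encode_mov_to_sreg("ds", "eax", False).hex())  # movl %eax, %ds -> 8ed8
```

Both forms load the same 16-bit selector, so preferring the unprefixed 32-bit form saves one byte per instruction.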

Reviewers: rnk, ab

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23166

llvm-svn: 278031
2016-08-08 18:01:04 +00:00
Simon Pilgrim 33fc788374 [X86][SSE] Assert if the shuffle mask indices are not -1 or within a valid input range
As discussed in post-review rL277959

llvm-svn: 277993
2016-08-08 11:07:34 +00:00
Craig Topper f44423120f [AVX-512] Improve lowering of inserting a single element into lowest element of a 512-bit vector of zeroes by using vmovq/vmovd/vmovss/vmovsd.
llvm-svn: 277965
2016-08-07 21:52:59 +00:00
Craig Topper 2c51c74d52 [AVX-512] Add 512-bit logical operations to load folding tables. Add avx512f stack folding test and move some tests from the avx512vl test.
llvm-svn: 277961
2016-08-07 17:14:09 +00:00
Craig Topper 938e7ab9e1 [AVX-512] Add EVEX encoded floating point MAX/MIN instructions to the load folding tables.
llvm-svn: 277960
2016-08-07 17:14:05 +00:00
Simon Pilgrim 21c61fba45 [X86] lowerVectorShuffle - ensure that undefined mask elements only use SM_SentinelUndef
Helps lowering and combining (which can specify SM_SentinelZero mask elements) share more shuffle matching code.

llvm-svn: 277959
2016-08-07 15:29:12 +00:00
Elena Demikhovsky dca03bebd3 AVX-512: Changed lowering of BITCAST between i1 vectors and i8/i16/i32 integer values
Optimized lowering of the BITCAST node: it can be replaced with COPY_TO_REG instead of KMOV.
This allows two opposite BITCAST operations to cancel out and avoids redundant "movs".

Differential Revision: https://reviews.llvm.org/D23247

llvm-svn: 277958
2016-08-07 13:05:58 +00:00
Craig Topper 49841c3812 [X86] Add commutable floating point max/min instructions to the load folding tables.
llvm-svn: 277949
2016-08-07 05:39:51 +00:00
Craig Topper c4d757093e [X86] Simplify a shuffle mask copy. NFC
llvm-svn: 277947
2016-08-07 05:39:46 +00:00
Simon Pilgrim bc573ca1b8 [X86][AVX2] Improve sign/zero extension on AVX2 targets
Split extensions to large vectors into 256-bit chunks - the equivalent of what we do with pre-AVX2 into 128-bit chunks

llvm-svn: 277939
2016-08-06 21:21:12 +00:00
Craig Topper 9d8676acc0 [AVX-512] Add SQRT/RCP14/RNDSCALE to hasUndefRegUpdate.
llvm-svn: 277934
2016-08-06 19:31:52 +00:00
Craig Topper 19505bc354 [AVX-512] Add AVX-512 scalar CVT instructions to hasUndefRegUpdate.
llvm-svn: 277933
2016-08-06 19:31:50 +00:00
Craig Topper f5d05fb0ce [X86] Add VRCPSSr_Int, VRSQRTSSr_Int, VSQRTSSr_Int, and VSQRTSDr_Int to hasUndefRegUpdate.
llvm-svn: 277931
2016-08-06 19:31:44 +00:00
Simon Pilgrim 7d168e19e8 [X86][SSE] Enable commutation between MOVHLPS and UNPCKHPD
Assuming SSE2 is available then we can safely commute between these, removing some unnecessary register moves and improving memory folding opportunities.

VEX encoded versions don't benefit so I haven't added support to them.
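The commutation follows from the register-form semantics. In this Python sketch an XMM register is modelled as a pair of opaque 64-bit halves `(lo, hi)` (a simplification; the function names are just the mnemonics, and operands use Intel order with the destination first). `MOVHLPS a, b` and `UNPCKHPD b, a` produce the same value.

```python
from typing import Tuple

# An XMM register modelled as (lo, hi) 64-bit halves, treated as opaque values.
Xmm = Tuple[int, int]


def movhlps(dst: Xmm, src: Xmm) -> Xmm:
    # MOVHLPS dst, src: dst[63:0] <- src[127:64]; dst[127:64] is preserved.
    return (src[1], dst[1])


def unpckhpd(a: Xmm, b: Xmm) -> Xmm:
    # UNPCKHPD a, b: result[63:0] <- a[127:64]; result[127:64] <- b[127:64].
    return (a[1], b[1])


x, y = (1, 2), (3, 4)
print(movhlps(x, y), unpckhpd(y, x))  # (4, 2) (4, 2)
```

Since the two forms are interchangeable with swapped operands, the compiler can pick whichever one lets it overwrite a register that is no longer needed.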

llvm-svn: 277930
2016-08-06 18:40:28 +00:00
Simon Pilgrim f56309f11a [X86][SSE] Add 2 input shuffle support to matchBinaryVectorShuffle
Not actually used yet...

llvm-svn: 277919
2016-08-06 11:22:39 +00:00
Benjamin Kramer b7d3311c77 Move helpers into anonymous namespaces. NFC.
llvm-svn: 277916
2016-08-06 11:13:10 +00:00
Simon Pilgrim 69b6a70834 [X86][SSE] Add initial support for 2 input target shuffle combining.
At the moment only the INSERTPS matching can actually use 2 inputs but the plumbing is now in place.

llvm-svn: 277839
2016-08-05 17:36:14 +00:00
Simon Pilgrim 24dc1e7a90 [X86][SSE] Update the target shuffle matches to use the effective mask's value type directly instead of via the input value type.
Preparation for adding 2 input support so we want to avoid unnecessary references to the input value type.

llvm-svn: 277817
2016-08-05 14:33:11 +00:00
Simon Pilgrim 7080005e67 [X86][SSE] Consistently use the target shuffle root value type for vector size calculations. NFCI.
Preparation for adding 2 input support so we want to avoid unnecessary references to the input value type.

llvm-svn: 277814
2016-08-05 13:02:53 +00:00
Simon Pilgrim 6f7b0cd530 [X86][SSE] Added target shuffle combine binary compute matching function. NFCI.
Added matchBinaryPermuteVectorShuffle and moved the blend+zero and insertps matching code into it.

llvm-svn: 277808
2016-08-05 11:16:53 +00:00
Michael Kuperstein 3ceac2bbd5 [LV, X86] Be more optimistic about vectorizing shifts.
Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.

Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model
to take this into account by making shifts by a uniform amount cheap again.

Differential Revision: https://reviews.llvm.org/D23049

llvm-svn: 277782
2016-08-04 22:48:03 +00:00
Simon Pilgrim 3dbce52c16 [X86][SSE] Rename target shuffle unary permute matching function. NFCI.
In preparation for adding a binary permute matching function.

llvm-svn: 277737
2016-08-04 17:16:50 +00:00
Simon Pilgrim c2370b810d [X86][SSE] Split off shuffle mask canonicalization from lowerVectorShuffle. NFCI.
The new function now returns true if the shuffle should be commuted.

This will allow target shuffle combines to share the code.

llvm-svn: 277728
2016-08-04 14:21:32 +00:00
Nikolai Bozhenov f679530ba1 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors, hardware SQRT is in many cases faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.
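The trade-off can be illustrated numerically in Python. This is a sketch under stated assumptions, not LLVM's lowering: the low-precision hardware RSQRT is faked by perturbing the exact reciprocal square root by a relative error of 1e-4, and `use_nr_estimate` is a hypothetical restatement of the heuristic described above. One Newton-Raphson step y' = y(1.5 - 0.5xy^2) reduces a relative error e to about 1.5e^2, and sqrt(x) is then recovered as x * y.

```python
import math


def rsqrt_estimate(x: float) -> float:
    # Stand-in for a low-precision hardware RSQRT: exact value with 1e-4 relative error.
    return (1.0 / math.sqrt(x)) * (1.0 + 1e-4)


def sqrt_newton_raphson(x: float) -> float:
    """Approximate sqrt(x) for x > 0 from an RSQRT estimate plus one NR step.

    x == 0 would need special handling: a real RSQRT returns +inf and 0 * inf is NaN.
    """
    y = rsqrt_estimate(x)
    # Newton-Raphson step for f(y) = 1/y**2 - x.
    y = y * (1.5 - 0.5 * x * y * y)
    # sqrt(x) = x * (1 / sqrt(x))
    return x * y


def use_nr_estimate(is_vector: bool, is_big_core: bool, is_skylake: bool) -> bool:
    """Hypothetical restatement of the heuristic: use NR only where it pays off."""
    if is_skylake:
        return False  # vector SQRT throughput beats NR on Skylake
    if is_big_core and not is_vector:
        return False  # scalar SQRT latency beats NR on big cores
    return True
```

The refined result is accurate to roughly 1e-8 relative error here, but it costs several dependent multiplies, which is why hardware SQRT can win on latency.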

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
Simon Pilgrim 5d5ca9c0cb [X86][SSE] Add initial costs for vector CTTZ/CTLZ
llvm-svn: 277716
2016-08-04 10:51:41 +00:00
Simon Pilgrim 8ae6dad49b [X86][SSE] Don't decide when to scalarize CTTZ/CTLZ for performance at lowering - this is what cost models are for
Improved CTTZ/CTLZ costings will be added shortly

llvm-svn: 277713
2016-08-04 10:14:39 +00:00
Dean Michael Berris 7e9abea2ae [XRay] Align entry and return sleds to 2 byte boundaries
This should ensure that we can atomically write two bytes (on top of the
retq and the one past it) and have those two bytes not straddle cache
lines.
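A quick check of the alignment argument, assuming 64-byte cache lines (any power-of-two line size of at least 2 bytes behaves the same): a 2-byte write that starts at an even address never spans two lines, while one that starts at the last byte of a line does. The helper name is hypothetical.

```python
CACHE_LINE = 64  # assumed cache-line size in bytes


def straddles_cache_line(addr: int, size: int, line: int = CACHE_LINE) -> bool:
    """True if a `size`-byte access starting at `addr` touches two cache lines."""
    return addr // line != (addr + size - 1) // line


# Scan the first 4 KiB: every 2-byte-aligned sled start stays inside one line.
print(any(straddles_cache_line(a, 2) for a in range(0, 4096, 2)))  # False
print(straddles_cache_line(63, 2))  # True: bytes 63 and 64 are in different lines
```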

We also move the label past the alignment instruction so that we can refer
to the actual first instruction, as opposed to potential padding before the
aligned instruction.

Update the tests to reflect the new order of assembly.

Reviewers: rSerge, echristo, majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23101

llvm-svn: 277701
2016-08-04 07:37:28 +00:00
Simon Pilgrim 898f030f70 [X86][SSE] Enable target shuffle combining to combine multiple shuffle inputs.
We currently only support combining target shuffles that consist of a single source input (plus elements known to be undef/zero).

This patch generalizes the recursive combining of the target shuffle to collect all the inputs, merging any duplicates along the way, into a full set of src ops and its shuffle mask.

We uncover a number of cases where we have failed to combine a unary shuffle because the input has been duplicated and separated during lowering.

This will allow us to combine to 2-input shuffles in a future patch.
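The input-merging step can be illustrated with a simplified Python model. The names are hypothetical, and LLVM's implementation works on SDValue operands and differs in detail. Each mask element indexes into the concatenation of the source operands, negative entries are sentinels for undef/zero elements, and duplicate operands are collapsed, so a shuffle that only looked binary becomes unary.

```python
from typing import List, Sequence, Tuple

SENTINEL_UNDEF = -1  # element value doesn't matter
SENTINEL_ZERO = -2   # element is known to be zero


def merge_shuffle_inputs(ops: Sequence[str], mask: Sequence[int],
                         width: int) -> Tuple[List[str], List[int]]:
    """Deduplicate shuffle source operands and remap the mask.

    Each mask element indexes into the concatenation of `ops`, where every
    operand has `width` elements. Negative entries are passed through unchanged.
    """
    unique: List[str] = []
    new_mask: List[int] = []
    for m in mask:
        if m < 0:
            new_mask.append(m)
            continue
        op = ops[m // width]
        if op not in unique:
            unique.append(op)
        new_mask.append(unique.index(op) * width + m % width)
    return unique, new_mask


# Two copies of the same input 'x' collapse into a unary shuffle.
print(merge_shuffle_inputs(["x", "x"], [0, 5, SENTINEL_ZERO, 7], 4))
# (['x'], [0, 1, -2, 3])
```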

Differential Revision: https://reviews.llvm.org/D22859

llvm-svn: 277631
2016-08-03 19:08:24 +00:00
Igor Breger c59b3a2236 [AVX512] Add aliases for vcvttss2si{l|q}, vcvttsd2si{l|q}, vcvttss2usi{l|q}, vcvttsd2usi{l|q} instructions.
Differential Revision: http://reviews.llvm.org/D23111

llvm-svn: 277586
2016-08-03 10:58:05 +00:00
Dean Michael Berris 0b8f6c8777 [XRay] Make the xray_instr_map section specification more correct
Summary:
We also add a test to show what currently happens when we create a
section per function and emit an xray_instr_map. This illustrates the
relationship (or lack thereof) between the per-function section and the
xray_instr_map section.

We also change the code generation slightly so that we don't always
create group sections, but rather only do so if the function the
table is associated with is in a group.

Also in this change:

  - Remove the "merge" flag on the xray_instr_map section.
  - Test that we're generating the right table for comdat and non-comdat functions.

Reviewers: echristo, majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23104

llvm-svn: 277580
2016-08-03 07:21:55 +00:00
Nirav Dave 8601ac11aa [MC] Fix Intel Operand assembly parsing for .set ids
Recommitting after fixing overaggressive fastpath return in parsing.

Fix intel syntax special case identifier operands that refer to a constant
(e.g. .set <ID> n) to be interpreted as immediate not memory in parsing.

Associated commit to fix clang test committed shortly.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22585

llvm-svn: 277489
2016-08-02 17:56:03 +00:00
Igor Breger f44b79d08e [AVX512] Don't use i128 masked gather/scatter/load/store. Check dataWidth more accurately.
Differential Revision: http://reviews.llvm.org/D23055

llvm-svn: 277435
2016-08-02 09:15:28 +00:00
Craig Topper 9433f975d0 [AVX-512] Mark VADDPS/PD and VMULPS/PD as commutable. This necessitated adding itineraries to all of the instructions that use the avx512_fp_binop_p class.
llvm-svn: 277422
2016-08-02 06:16:53 +00:00
Craig Topper 553535848f [AVX-512] Use SSE_MUL_ITINS_S/SSE_DIV_ITINS_S for the scalar FMUL/FDIV instructions to match SSE/AVX.
llvm-svn: 277421
2016-08-02 06:16:51 +00:00
Craig Topper 05948fb36c [AVX-512] Correct ExeDomain for many AVX-512 instructions.
llvm-svn: 277416
2016-08-02 05:11:15 +00:00
Hans Wennborg 7a3a49b18a Revert r276895 "[MC][X86] Fix Intel Operand assembly parsing for .set ids"
This caused PR28805. Adding a regression test.

llvm-svn: 277402
2016-08-01 23:00:01 +00:00
Simon Pilgrim 46f119a59f [X86] Use implicit masking of SHLD/SHRD shift double instructions
Similar to the regular shift instructions, SHLD/SHRD only use the bottom bits of the shift value
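A Python model of the 32-bit register form shows why an explicit mask on the shift amount is redundant: the hardware already reduces the count modulo 32 (for 64-bit operands the count is masked to 6 bits instead). `shld32` is a hypothetical helper that models only the result value, not the flags.

```python
MASK32 = 0xFFFFFFFF


def shld32(dst: int, src: int, count: int) -> int:
    """Model of 32-bit SHLD: shift dst left, filling from the top bits of src."""
    count &= 31  # the hardware uses only the low 5 bits of the count
    if count == 0:
        return dst & MASK32
    return ((dst << count) | (src >> (32 - count))) & MASK32


# A count of 33 behaves exactly like a count of 1, so `and $31` before SHLD is dead code.
print(hex(shld32(0x00000001, 0x80000000, 33)))  # 0x3
```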

llvm-svn: 277341
2016-08-01 12:11:43 +00:00
Craig Topper c48c029610 [AVX-512] Fix duplicate column in AVX512 execution dependency table that was preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types.
llvm-svn: 277327
2016-08-01 07:55:33 +00:00
Craig Topper 749a111f1e [AVX-512] Teach X86InstrInfo::getLargestLegalSuperClass to inflate to FR32X/FR64X if AVX512 is supported and VR128X/VR256X if VLX is supported.
Had to update a stack folding test to clobber the other 16 registers since this now made them get used instead of spilling.

llvm-svn: 277321
2016-08-01 05:31:50 +00:00
Craig Topper 3946176314 [AVX-512] Use FR32X/FR64X/VR128X/VR256X register classes in addRegisterClass if AVX512 (for FR32X/FR64X) or VLX (for VR128X/VR256X) is supported. This is a minimal requirement to be able to allocate all 32 registers.
llvm-svn: 277319
2016-08-01 04:29:13 +00:00
Craig Topper da50eec26d [X86] Move mask register handling into the main switch of getLoadStoreRegOpcode. No functional change intended.
llvm-svn: 277318
2016-08-01 04:29:11 +00:00
Craig Topper c0097dc7d0 [X86] Simplify code for determining GR or FR reg classes by querying for super classes instead of manually listing individual classes.
llvm-svn: 277306
2016-07-31 20:20:08 +00:00
Craig Topper 7afdc0fb25 [AVX512] Always use EVEX encodings for 128/256-bit move instructions in getLoadStoreRegOpcode if VLX is supported.
llvm-svn: 277305
2016-07-31 20:20:05 +00:00
Craig Topper 4c53e60360 [AVX512] Add VLX packed move instructions to the execution dependency fix pass and update tests.
llvm-svn: 277304
2016-07-31 20:20:01 +00:00
Craig Topper eb1cc981a5 [AVX512] Move FR32X/FR64X handling in getLoadStoreRegOpcode into the main switch. No functional change intended.
llvm-svn: 277303
2016-07-31 20:19:55 +00:00
Craig Topper 338ec9a0cb [AVX512] Stop treating VR512 specially in getLoadStoreRegOpcode and use the regular switch which already tried to handle it, but was unreachable. This has the added benefit of enabling aligned loads/stores if the stack is aligned.
llvm-svn: 277302
2016-07-31 20:19:53 +00:00
Craig Topper 2a6bbb8203 [AVX512] Add X86::VR512RegClassID to X86RegisterInfo::getLargestLegalSuperClass.
llvm-svn: 277301
2016-07-31 20:19:50 +00:00
Simon Pilgrim 6be48e4aa7 [X86] Improve 64-bit shifts on 32-bit targets (PR14593)
As discussed on PR14593, this patch adds support for lowering to SHLD/SHRD from the patterns generated by DAGTypeLegalizer::ExpandShiftWithKnownAmountBit.
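For reference, a 64-bit left shift expanded into 32-bit halves, with SHLD handling the bits that cross from the low word into the high word, can be sketched in Python. This is a generic branchy model assuming the amount is taken modulo 64; it is not the exact DAG pattern produced by `DAGTypeLegalizer::ExpandShiftWithKnownAmountBit`, and the helper names are hypothetical.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def shld32(dst: int, src: int, count: int) -> int:
    # 32-bit SHLD: shift dst left, shifting in the top bits of src; count is mod 32.
    count &= 31
    if count == 0:
        return dst & MASK32
    return ((dst << count) | (src >> (32 - count))) & MASK32


def shl64_on_32bit(lo: int, hi: int, amt: int) -> Tuple[int, int]:
    """Shift the 64-bit value hi:lo left by amt; returns the new (lo, hi)."""
    amt &= 63
    if amt & 32:
        # Shift of 32..63: the low word moves into the high word.
        return 0, (lo << (amt & 31)) & MASK32
    # Shift of 0..31: SHL for the low word, SHLD for the high word.
    return (lo << amt) & MASK32, shld32(hi, lo, amt)


print([hex(w) for w in shl64_on_32bit(0x89ABCDEF, 0x01234567, 4)])  # ['0x9abcdef0', '0x12345678']
```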

Differential Revision: https://reviews.llvm.org/D23000

llvm-svn: 277299
2016-07-31 19:50:45 +00:00
Craig Topper 00d34ed64f [AVX-512] Don't let ExeDependencyFix pass convert VPANDD/Q to VPANDPS/PD unless DQI instructions are supported. Same for ANDN, OR, and XOR.
Thanks to Igor Breger for pointing out my mistake.

llvm-svn: 277292
2016-07-31 17:15:07 +00:00
Elena Demikhovsky 6e9b16054f AVX-512: Removed AssertZext node before TRUNCATE
Removed AssertZext node, which was inserted between X86ISD::SETCC and "truncate to i1".

Differential Revision: https://reviews.llvm.org/D22850

llvm-svn: 277289
2016-07-31 06:48:01 +00:00
Simon Pilgrim 5e0d6b509a Strip trailing whitespace
llvm-svn: 277280
2016-07-30 20:53:21 +00:00
Simon Pilgrim 8bbd3650a6 [X86] Use peekThroughOneUseBitcasts helper function
llvm-svn: 277279
2016-07-30 20:51:26 +00:00
Simon Pilgrim cf49fa3251 [X86][SSE] Let 64-bit targets use the fast 2i32-2f32 UINT_TO_FP conversion as well as 32-bit
The 2i32-2i64 legalization means that we can use the slightly quicker double bits + fptrunc approach for the same results
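Assuming "double bits" refers to the usual 2^52 magic-number trick (an interpretation; the message does not spell it out), a scalar Python sketch looks like this: OR-ing a 32-bit unsigned value into the mantissa of the double 2^52 and subtracting 2^52 yields the value exactly, and a single rounding to single precision (the fptrunc) then gives the correctly rounded float. The function name is hypothetical, and the real lowering operates on vector lanes.

```python
import struct

TWO_POW_52 = float(2 ** 52)
MAGIC_BITS = 0x4330000000000000  # bit pattern of the double 2**52


def u32_to_f32_via_double(u: int) -> float:
    """Convert an unsigned 32-bit integer to single precision via double bits."""
    if not 0 <= u <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    # u fits in the low 52 mantissa bits, so this double is exactly 2**52 + u.
    d = struct.unpack("<d", struct.pack("<Q", MAGIC_BITS | u))[0] - TWO_POW_52
    # fptrunc: round the exact double to the nearest float32.
    return struct.unpack("<f", struct.pack("<f", d))[0]


print(u32_to_f32_via_double(0xFFFFFFFF))  # 4294967296.0 (2**32, the nearest float32)
```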

llvm-svn: 277271
2016-07-30 14:06:59 +00:00
Benjamin Kramer 205159c628 [X86] Fix lifetime of SMRange temporaries.
Found by asan -fsanitize-address-use-after-scope.

llvm-svn: 277266
2016-07-30 11:31:24 +00:00
Michael Kuperstein f396b4c40d [X86] Match PSADBW in straight-line code
Up until now, we only had code to match PSADBW patterns that look like what
comes out of the loop vectorizer - a partial reduction inside the loop body
that gets fed into a horizontal operation in a different basic block.

This adds support for straight-line patterns, like those generated by the
SLP vectorizer.
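For reference, a Python model of the 128-bit PSADBW result: each 8-byte group contributes the sum of absolute differences of its unsigned bytes, stored in the low 16 bits of a 64-bit lane. The straight-line pattern corresponds to the scalar reduction over all 16 bytes, which equals the sum of the two lanes. The helper names are illustrative.

```python
from typing import List


def psadbw(a: bytes, b: bytes) -> List[int]:
    """Return the two 64-bit lane values of PSADBW on 16-byte inputs."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("this model handles 16-byte vectors only")
    # Iterating over bytes yields unsigned ints in 0..255.
    return [sum(abs(x - y) for x, y in zip(a[g:g + 8], b[g:g + 8])) for g in (0, 8)]


def sad_scalar(a: bytes, b: bytes) -> int:
    # The straight-line sum-of-absolute-differences reduction.
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = bytes(range(16)), bytes(16)
print(psadbw(a, b), sad_scalar(a, b))  # [28, 92] 120
```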

Differential Revision: https://reviews.llvm.org/D22889

llvm-svn: 277219
2016-07-29 21:45:51 +00:00
Simon Pilgrim f107ffa8f0 [X86][AVX] Fix VBROADCASTF128 selection bug (PR28770)
Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors.

llvm-svn: 277214
2016-07-29 21:05:10 +00:00
David L Kreitzer 8b959e5cfa Avoid unnecessary 32-bit to 64-bit zero extensions following
32-bit CMOV instructions on x86_64. The 32-bit CMOV implicitly
zero extends.

Differential Revision: https://reviews.llvm.org/D22941

llvm-svn: 277148
2016-07-29 15:09:54 +00:00
Simon Pilgrim cb780b32a3 [X86][SSE] Optimize the truncation of vector comparison results with PACKSS
We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that every element of the result will be either all-ones or all-zeros, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element.

Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally, for AVX2 PACKSS of 256-bit data we must perform a PERMQ shuffle to restore the correct element order, because the 256-bit PACKSS operates independently on each 128-bit lane. I did investigate performing a single shuffle after all the PACKSS calls, but the need to cross 128-bit lanes makes this difficult to achieve efficiently.

We avoid performing this on AVX512 as it should have better alternative truncation instructions.
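A simplified Python model shows why this works: signed saturation leaves -1 and 0 unchanged at every element width, so comparison masks survive repeated halving, whereas an ordinary value such as 300 would be clamped. It also models why 256-bit AVX2 PACKSS needs a PERMQ afterwards: the instruction packs each 128-bit lane independently, interleaving the two sources in 64-bit chunks. Names are hypothetical, and lists stand in for vector registers.

```python
from typing import List, Sequence


def saturate_signed(v: int, bits: int) -> int:
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))


def packss128(a: Sequence[int], b: Sequence[int], out_bits: int) -> List[int]:
    # 128-bit PACKSS: saturate all elements of a, then all of b, to out_bits.
    return [saturate_signed(v, out_bits) for v in list(a) + list(b)]


def packss256(a: Sequence[int], b: Sequence[int], out_bits: int) -> List[int]:
    # AVX2 PACKSS packs each 128-bit lane separately: [a.lo, b.lo, a.hi, b.hi].
    n = len(a) // 2
    return packss128(a[:n], b[:n], out_bits) + packss128(a[n:], b[n:], out_bits)


def permq(v: Sequence[int], order: Sequence[int]) -> List[int]:
    # Reorder the four 64-bit quarters of a 256-bit vector.
    q = len(v) // 4
    quarters = [list(v[i * q:(i + 1) * q]) for i in range(4)]
    return [x for i in order for x in quarters[i]]


# Two v8i32 comparison results truncated to one v16i16.
a = [-1, 0, -1, 0, 0, 0, -1, -1]
b = [0, -1, 0, -1, -1, 0, 0, 0]
print(permq(packss256(a, b, 16), (0, 2, 1, 3)) == a + b)  # True
```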

Differential Revision: https://reviews.llvm.org/D22814

llvm-svn: 277132
2016-07-29 10:23:10 +00:00
Craig Topper e4f868ea16 [AVX512] Mark EVEX VMOVSSrm and VMOVSDrm as canFoldAsLoad and isReMaterializable.
llvm-svn: 277120
2016-07-29 06:06:04 +00:00
Craig Topper 5625d24977 [AVX512] Copy the patterns that recognize scalar arithmetic operations inserting into the lower element of a packed vector from AVX/SSE so that we can use EVEX encoded instructions.
llvm-svn: 277119
2016-07-29 06:06:00 +00:00
Craig Topper c7de3a1018 [AVX512] Remove the intrinsic forms of VMOVSS/VMOVSD. We don't need two different forms of 'rr' and 'rm'. This matches SSE/AVX.
I'm not convinced the patterns for the rm_Int were correct anyway. The instruction had a tied source that shouldn't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently.

llvm-svn: 277098
2016-07-29 02:49:08 +00:00
Matthias Braun 941a705b7b MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

llvm-svn: 277017
2016-07-28 18:40:00 +00:00
Craig Topper 7e27885f69 [X86] Remove CustomInserter for FMA3 instructions. Looks like since we got full commuting support for FMAs after this was added, the coalescer can now get this right on its own.
Differential Revision: https://reviews.llvm.org/D22799

llvm-svn: 276987
2016-07-28 15:28:56 +00:00
Michael Kuperstein e7605e28f9 [X86] Factor out another piece of the SAD combine. NFCI.
llvm-svn: 276918
2016-07-27 20:59:51 +00:00
Nirav Dave 06a99a46e2 [MC][X86] Fix Intel Operand assembly parsing for .set ids
Fix intel syntax special case identifier operands that refer to a constant
(e.g. .set <ID> n) to be interpreted as immediate not memory in parsing.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22585

llvm-svn: 276895
2016-07-27 17:39:41 +00:00
Michael Kuperstein 2dc08f7df8 [X86] Split out absdiff detection from SAD combine. NFC.
Preparation for supporting PSADBW emission for straight-line code.

llvm-svn: 276798
2016-07-26 20:01:29 +00:00
Simon Pilgrim 019e102426 [X86][SSE] Fixed issue with memory folding of (v)cvtsd2ss intrinsics
Fixed typo in the intrinsic definitions of (v)cvtsd2ss with memory folding.

This was only unearthed when rL276102 started using the intrinsic again.....

llvm-svn: 276740
2016-07-26 10:41:28 +00:00
Simon Pilgrim 28c7d7093d Fixed spelling in comment
llvm-svn: 276738
2016-07-26 09:55:31 +00:00
Craig Topper 79011a660e [X86] Remove isCommutable=1 from instructions that also load. Commuting such instructions isn't useful as it would unfold the load. The exception is FMA3 instructions.
llvm-svn: 276733
2016-07-26 08:06:18 +00:00
Craig Topper 26000f8d90 [AVX512] Don't mark ADDSSZr_Int or MULSSZr_Int as commutable. The intrinsics have one of their arguments indicated as passing through the high bits and we can't commute that.
llvm-svn: 276732
2016-07-26 08:06:14 +00:00
Joel Jones 373d7d30dd [MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC
Some targets, notably AArch64 for ILP32, have different relocation encodings
based upon the ABI. This is an enabling change, so a future patch can use the
ABIName from MCTargetOptions to choose which relocations to use. Tested using
check-llvm.

The corresponding change to clang is in: http://reviews.llvm.org/D16538

Patch by: Joel Jones

Differential Revision: https://reviews.llvm.org/D16213

llvm-svn: 276654
2016-07-25 17:18:28 +00:00
Elena Demikhovsky 64e5f929d0 AVX-512: Fixed [US]INT_TO_FP selection for i1 vectors.
It failed with assertion before this patch.

Differential Revision: https://reviews.llvm.org/D22735

llvm-svn: 276648
2016-07-25 16:51:00 +00:00
Craig Topper ce415ff9c5 [AVX512] Add load folding support for the unmasked forms of the FMA instructions.
llvm-svn: 276615
2016-07-25 07:20:35 +00:00
Craig Topper 318e40b6f7 [AVX512] Add some additional patterns so that we can fold broadcast loads in the first argument of an FMADD/FMSUB/FNMADD/FNMSUB/FMADDSUB/FMSUBADD node. Also add patterns to support all combinations of the broadcast input and the preserved input for masked versions.
llvm-svn: 276614
2016-07-25 07:20:31 +00:00
Craig Topper 6bcbf5338c [AVX512] Cleanup FMA operand order in patterns to match the VEX versions and to really be 213, 231, and 132.
llvm-svn: 276613
2016-07-25 07:20:28 +00:00
Simon Pilgrim 381a0ade5a [X86] Add 'FeatureSlowSHLD' to cpu 'bdver4'
As with all AMD CPUs, excavator has poor SHLD/SHRD performance. Also added bdver3 to the test as it was missing.

llvm-svn: 276569
2016-07-24 16:00:53 +00:00
Craig Topper 2dca3b287b [X86] Make the FMA3 instruction names consistent between VEX and EVEX encoded versions.
This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in their name. This new format should be consistent with the general naming of instructions.

llvm-svn: 276559
2016-07-24 08:26:38 +00:00
Craig Topper 05629d05c7 [X86] Replace CodeGenOnly VPSRAVW/D/Q_Int instructions with patterns since the operand types exactly match the normal VPSRAVW/D/Q instructions.
llvm-svn: 276555
2016-07-24 07:32:45 +00:00
Craig Topper 8152b9cd96 [X86] Fix typo in comment.
llvm-svn: 276528
2016-07-23 16:44:08 +00:00
Craig Topper b6519db90d [AVX512] Implement commuting support for EVEX encoded FMA3 instructions.
llvm-svn: 276521
2016-07-23 07:16:56 +00:00
Craig Topper 6172b0b3e9 [X86] Make one of the FMA3 commuting methods static. Remove a call to isFMA3 just to get the IsIntrisic flag, instead get it during the first call and pass it along. NFC
llvm-svn: 276520
2016-07-23 07:16:53 +00:00
Craig Topper ca8f5f309c [X86] Fix switch statement indentation per coding standards.
llvm-svn: 276519
2016-07-23 07:16:50 +00:00
Simon Pilgrim ea0d4f9962 [X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 (reapplied)
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.

This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.

We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).

Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be submitted shortly).

Differential Revision: https://reviews.llvm.org/D22460

llvm-svn: 276416
2016-07-22 13:58:44 +00:00
Benjamin Kramer 5ba0e20315 Revert "[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128"
It caused PR28657.

This reverts commit r276281.

llvm-svn: 276405
2016-07-22 11:03:10 +00:00
Craig Topper 52e2e8381b [AVX512] Add ExeDomain to vector extend and truncate instructions.
llvm-svn: 276394
2016-07-22 05:46:44 +00:00
Craig Topper f4151bea72 [AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions.
llvm-svn: 276393
2016-07-22 05:00:52 +00:00
Craig Topper 5ec33a9411 [AVX512] Fix the ExeDomain for some packed fp instructions.
llvm-svn: 276392
2016-07-22 05:00:42 +00:00
Craig Topper 0b90756b0a [AVX512] Add load folding for some AVX512VL logic and arithmetic instructions.
llvm-svn: 276391
2016-07-22 05:00:39 +00:00
Craig Topper ab13b33ded [AVX512] Update X86InstrInfo::foldMemoryOperandCustom to handle the EVEX encoded instructions too.
llvm-svn: 276390
2016-07-22 05:00:35 +00:00
Michael Kuperstein c523333bbf [X86] Do not use AND8ri8 in AVX512 pattern
This variant is (as documented in the TD) for disassembler use only, and should
not be used in patterns - it is longer, and is broken on 64-bit.

llvm-svn: 276347
2016-07-21 22:24:08 +00:00
Simon Pilgrim 88e0940d3b [X86][SSE] Allow folding of store/zext with PEXTRW of 0'th element
Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW.

But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register.

This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16 and will not prevent a store from being folded (on SSE41).

Fix for PR27265.

Differential Revision: https://reviews.llvm.org/D22509

llvm-svn: 276289
2016-07-21 14:54:17 +00:00
Simon Pilgrim b11bdd95f6 [X86][SSE] Pull out duplicate EXTRW lowering code. NFCI.
As requested on D22509, I've pulled out the v8i16 extraction lowering as the SSE41 and pre-SSE41 implementations are effectively the same.

llvm-svn: 276285
2016-07-21 14:30:17 +00:00
Simon Pilgrim c8e20b1150 [X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.

This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.

We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).

Differential Revision: https://reviews.llvm.org/D22460

llvm-svn: 276281
2016-07-21 14:10:54 +00:00
Matthias Braun ca8210a952 X86InstrInfo: No need for liveness analysis in classifyLEAReg()
classifyLEAReg() deals with switching operands from 32bit to 64bit in
order to use a LEA64_32 instruction (for three address code goodness).
It currently performs a liveness analysis to determine the kill/undef
flag for the newly added operand. This should not be necessary:

- If the previous operand had a kill flag, then the 32bit part of the
  register gets killed, which kills the super register as well.
- If the previous operand had an undef flag, then we didn't care what
  value we read, so just use the same flag on the new operand.
  (No matter what, an operand with an undef flag won't affect liveness.)

This makes the code independent of the presence of kill flags because it
avoids a call to MachineBasicBlock::computeRegisterLiveness().

Differential Revision: http://reviews.llvm.org/D22283

llvm-svn: 276222
2016-07-21 00:33:38 +00:00
Simon Pilgrim 1b4f511aaa [X86][SSE] Add cost model values for CTPOP of vectors
This patch adds costs for the vectorized implementations of CTPOP. The default values were seriously underestimating the cost of these and were encouraging vectorization on targets where serialized use of POPCNT would be much better.

Differential Revision: https://reviews.llvm.org/D22456

llvm-svn: 276104
2016-07-20 10:41:28 +00:00
Craig Topper 09b99c3a75 [AVX512] Add a missing NoVLX to give priority to the AVX512 version of the pattern.
llvm-svn: 276088
2016-07-20 05:05:50 +00:00
Craig Topper 7a95428f7c [X86] Use 'HasAVX1Only' to properly give priority to the AVX2 version without relying on file ordering.
llvm-svn: 276087
2016-07-20 05:05:48 +00:00
Craig Topper cde436a663 [X86] Create some multiclasses to reduce the repeated patterns for VEXTRACT(F/I)128/VINSERT(I/F)128. NFC
llvm-svn: 276086
2016-07-20 05:05:46 +00:00
Craig Topper 55913ead3d [X86] Create some wrapper multiclasses to create AVX and SSE shift instructions with less repeated code. NFC
llvm-svn: 276085
2016-07-20 05:05:44 +00:00
Simon Pilgrim 0ea8d275cc [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics differs enough from generic IR to cause problems: INF/NAN/out-of-range values are guaranteed to produce 0x80000000, which plays havoc with constant folding, since it converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and were what I was trying to match).
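To make the semantics concrete, here is a minimal C++ model of a single-lane truncating conversion, assuming the behaviour stated above (INF/NAN/out-of-range inputs give 0x80000000). The name cvttss2si_model is hypothetical; this is not LLVM or builtin code, only a reference model of what a constant folder would have to reproduce.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical reference model of the x86 truncating float -> int32
// conversion (CVTTSS2SI, or one lane of CVTTPS2DQ). Not LLVM code.
std::int32_t cvttss2si_model(float f) {
  // -2^31 is exactly representable as a float and still fits in int32;
  // 2^31 is the first value that does not. NaN fails both comparisons.
  if (!(f >= -2147483648.0f && f < 2147483648.0f))
    return std::numeric_limits<std::int32_t>::min(); // 0x80000000, "integer indefinite"
  return static_cast<std::int32_t>(f); // in range: truncate toward zero
}
```

Note that an in-range input of exactly -2147483648.0f also yields 0x80000000, and that folding the out-of-range cases to zero or UNDEF does not agree with this model.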

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that depend on the runtime rounding mode and can have similar issues with constant folding.

A companion clang patch is at D22105

Differential Revision: https://reviews.llvm.org/D22106

llvm-svn: 275981
2016-07-19 15:07:43 +00:00
Elena Demikhovsky 2c0780b8e5 AVX-512: Fixed BT instruction selection.
The condition expression (a >> n) & 1 is converted to a "bt a, n" instruction. This works on all Intel targets.
But on AVX-512 it was broken because the expression is rewritten as (truncate (a >> n) to i1).

I added the new sequence (truncate (a >> n) to i1) to the BT pattern.
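For reference, a C++ sketch of the source idiom that gives rise to the (a >> n) & 1 pattern. The name testBit is illustrative, and whether a single bt is emitted depends on the target and on how the result is used.

```cpp
#include <cstdint>

// Test bit n of a. This is the (a >> n) & 1 shape that instruction
// selection may turn into "bt a, n" followed by a use of the carry flag.
// Assumption: n is in [0, 31], otherwise the shift is undefined in C++.
bool testBit(std::uint32_t a, unsigned n) {
  return (a >> n) & 1u;
}
```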

Differential Revision: https://reviews.llvm.org/D22354

llvm-svn: 275950
2016-07-19 07:14:21 +00:00
Craig Topper d6ca1dc45e [AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions.
llvm-svn: 275942
2016-07-19 02:00:38 +00:00
Craig Topper 592dc30708 [X86] Remove superfluous parameter from a multiclass. All instantiations passed the same value.
llvm-svn: 275941
2016-07-19 02:00:35 +00:00
Craig Topper 6189d3ecd4 [X86] Rename VINSERTzrr to use a capital Z to match other instructions. NFC
llvm-svn: 275939
2016-07-19 01:26:19 +00:00
Simon Pilgrim c941f6b329 [X86][AVX] Add target shuffle decode support for VBROADCAST
Currently we only decode broadcasts from a vector of the same size.

llvm-svn: 275823
2016-07-18 17:32:59 +00:00
Craig Topper a3c55f5915 [AVX512] Add EVEX versions of scalar ADD/SUB/MUL/DIV to load folding tables.
llvm-svn: 275775
2016-07-18 06:49:32 +00:00
Craig Topper 16a0744955 [AVX512] Add KADD/KAND/KOR/KXOR to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275771
2016-07-18 06:14:59 +00:00
Craig Topper 463f949a3a [X86] Add VPMULLW/D/Q instructions to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275770
2016-07-18 06:14:57 +00:00
Craig Topper 1af6cc00dc [X86] Add VPADD instructions to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275769
2016-07-18 06:14:54 +00:00
Craig Topper ba9b93d7f2 [X86] Add floating point packed logical ops to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275768
2016-07-18 06:14:50 +00:00
Craig Topper 3a99de4067 [X86] Add AVX512 instructions to X86InstrInfo::isAssociativeAndCommutative.
llvm-svn: 275767
2016-07-18 06:14:47 +00:00
Craig Topper fe5a6dc581 [X86] Add more AVX512 instructions to X86InstrInfo::isHighLatencyDef. Also add all packed fp division instructions.
llvm-svn: 275766
2016-07-18 06:14:45 +00:00
Craig Topper f7a06c29bc [X86] Add AVX512 load opcodes and a couple AVX load opcodes to X86InstrInfo::areLoadsFromSameBasePtr.
llvm-svn: 275765
2016-07-18 06:14:43 +00:00
Craig Topper 650a15e2b3 [X86] Add more opcodes to isFrameLoadOpcode/isFrameStoreOpcode. Mainly AVX-512 related.
llvm-svn: 275764
2016-07-18 06:14:39 +00:00
Craig Topper 5c913e84df [AVX512] Use VMOVAPSZ128rr/VMOVAPS256rr for VR128X/VR256X physreg moves when VLX is supported.
Ideally we would use VEX encoded moves instead of EVEX if the high 16 registers aren't referenced, but this is a good first step.

llvm-svn: 275763
2016-07-18 06:14:34 +00:00
Craig Topper 53f3d1b4d0 [X86] Fix 80-column violations. NFC
llvm-svn: 275762
2016-07-18 06:14:26 +00:00
Simon Pilgrim 285d9e4d60 Strip trailing whitespace
llvm-svn: 275726
2016-07-17 19:02:27 +00:00
Simon Pilgrim 1be1222293 [X86][SSE] lowerVectorShuffleAsPermuteAndUnpack tidyup. NFCI.
Moved unpack type determination into TryUnpack lambda.

Added missing comment describing lowerVectorShuffleAsPermuteAndUnpack call.

llvm-svn: 275708
2016-07-17 15:48:25 +00:00
Guy Blank 3357ba36e2 test commit
llvm-svn: 275703
2016-07-17 12:10:35 +00:00
Craig Topper 8093437f2e [AVX512] Remove CodeGenOnly VBROADCAST m_Int instructions. They can be implemented with patterns selecting existing instructions. NFC
llvm-svn: 275671
2016-07-16 03:42:59 +00:00
Nico Weber 8d66df15f4 Teach fast isel about the win64 calling convention.
This mostly just works.

Vectorcall rets are still not supported.

The win64_eh test change is because fast isel doesn't use rsi for temporary
computations, so it doesn't need to be pushed. The test case I'm changing was
originally added to test pushes, but by now there are other test cases in that
file exercising that code path.

https://reviews.llvm.org/D22422

llvm-svn: 275607
2016-07-15 20:18:37 +00:00
Justin Lebar 9c375817ac [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).

Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.

This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted.  It also greatly simplifies the process of adding another flag
to getLoad.
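A minimal C++ sketch of the design change, under stated assumptions: MemFlags and makeLoad are hypothetical stand-ins, not the real SelectionDAG or MachineMemOperand API. It shows why one flag set with a defaulted alignment in front is harder to misuse than a run of positional bools, the kind of mistake behind the inverted getLoad arguments mentioned above.

```cpp
#include <cstdint>

// Hypothetical, simplified flag set; the real type is
// MachineMemOperand::Flags, whose members are not reproduced here.
enum MemFlags : std::uint16_t {
  MONone = 0,
  MOVolatile = 1u << 0,
  MONonTemporal = 1u << 1,
  MOInvariant = 1u << 2,
};

// Combine through the underlying type; applying | to the enums directly
// here would recurse into this same operator.
inline MemFlags operator|(MemFlags A, MemFlags B) {
  return static_cast<MemFlags>(static_cast<std::uint16_t>(A) |
                               static_cast<std::uint16_t>(B));
}

struct LoadDesc {
  unsigned Alignment;
  MemFlags Flags;
};

// Hypothetical getLoad-like helper: alignment (default 0) comes before a
// single flags argument, so each flag is named at the call site.
LoadDesc makeLoad(unsigned Alignment = 0, MemFlags Flags = MONone) {
  return {Alignment, Flags};
}
```

A call such as makeLoad(4, MOVolatile | MOInvariant) states its intent directly, whereas a positional (true, false, true) can silently swap meanings.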

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D22249

llvm-svn: 275592
2016-07-15 18:27:10 +00:00
Justin Lebar 0af80cd6f0 [CodeGen] Take a MachineMemOperand::Flags in MachineFunction::getMachineMemOperand.
Summary:
Previously we took an unsigned.

Hooray for type-safety.

Reviewers: chandlerc

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D22282

llvm-svn: 275591
2016-07-15 18:26:59 +00:00
Jacques Pienaar 71c30a14b7 Rename AnalyzeBranch* to analyzeBranch*.
Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.

Reviewers: tstellarAMD, mcrosier

Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai

Differential Revision: https://reviews.llvm.org/D22409

llvm-svn: 275564
2016-07-15 14:41:04 +00:00
Simon Pilgrim 2683ad54ad [X86][AVX2] Improve lowerShuffleAsRepeatedMaskAndLanePermute permutation of 64-bit sub-lanes
As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level.

This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen.

Fixes PR28136.

llvm-svn: 275543
2016-07-15 09:49:12 +00:00
Simon Pilgrim 420b266d0a [X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle (reapplied)
This improves the situation discussed in D19228, where we were forcing VPERMPD/VPERMQ when VPERM2F128/VPERM2I128 would have been better.

This was incorrectly reverted in rL275421 during triage of PR28552.

llvm-svn: 275497
2016-07-14 23:05:09 +00:00
Nirav Dave a6c7595d0f [X86][MC] Fix bracket expression parsing in intel-style assembly.
Only perform struct field check on Identifier tokens.

Fixes PR28547.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22361

llvm-svn: 275445
2016-07-14 17:37:05 +00:00
Nico Weber 5bb284226b Don't optimize movs to pushes in -O0 builds.
https://reviews.llvm.org/D22362

llvm-svn: 275431
2016-07-14 15:40:22 +00:00
Nico Weber ead8f8ffdd Delete some trailing whitespace.
llvm-svn: 275429
2016-07-14 15:07:44 +00:00
Ahmed Bougacha 85dc93c56b [X86] Decode MPX BND registers.
We were able to assemble, but not disassemble.

Note that fixupRMValue was truncating EA_REG_BND0-3 because we hit
the uint8_t max.  The control registers were already squarely above
it, but I don't think they ever go in .r/m, only in .reg.
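A tiny C++ illustration of the truncation, using made-up register values (the real EA_REG_* numbering is not reproduced here): any value above 255 stored into a uint8_t field keeps only its low 8 bits, so a BND register can collide with an unrelated small value.

```cpp
#include <cstdint>

// Made-up values for illustration only.
constexpr unsigned kLastRegFittingInByte = 255;
constexpr unsigned kHypotheticalBnd0 = 256;

// Models storing a register enum value in a uint8_t field: conversion to
// an unsigned 8-bit type keeps the value modulo 256.
std::uint8_t storeRegInByte(unsigned RegValue) {
  return static_cast<std::uint8_t>(RegValue);
}
```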

I also did notice an extra REX.W in our encoding, but I think that's
fine.

llvm-svn: 275427
2016-07-14 14:53:21 +00:00
Ahmed Bougacha 4f7a5e20ae [X86] Don't mark addressing mode operands as "outs". NFC-ish.
Nothing in-tree can tell the difference, but it's incorrect: the
addressing mode registers aren't what's defined.

llvm-svn: 275426
2016-07-14 14:53:17 +00:00
Nico Weber 3afaf16abc Revert r275411, it cause PR28552.
llvm-svn: 275421
2016-07-14 14:49:35 +00:00
Nico Weber ecdf45b1e6 Teach fast isel calls and rets about stdcall.
stdcall is callee-pop like thiscall, so the thiscall changes already did most
of the work for this.  This change only opts stdcall in and adds tests.

llvm-svn: 275414
2016-07-14 13:54:26 +00:00
Simon Pilgrim 534e3240e8 Remove trailing whitespace.
llvm-svn: 275412
2016-07-14 13:29:23 +00:00
Simon Pilgrim 3ecb6bdd5f [X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle
This improves the situation discussed in D19228, where we were forcing VPERMPD/VPERMQ when VPERM2F128/VPERM2I128 would have been better.

llvm-svn: 275411
2016-07-14 13:28:43 +00:00
Simon Pilgrim 053d32906f [X86][AVX] Add support for narrowing 128-bit+ shuffle mask elements to 64-bits to allow combining
Primarily this is to allow a blend with zero instead of having to use vperm2f128, but in the future we can also use it to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations.
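A C++ sketch of what narrowing a shuffle mask means, assuming LLVM's convention that negative mask entries are sentinels (-1 for undef, -2 for zero). narrowShuffleMask is a hypothetical helper, not the function used in the patch.

```cpp
#include <vector>

// Each wide element index M becomes Scale consecutive narrow indices
// M*Scale .. M*Scale + Scale - 1. Sentinel entries (negative) are
// replicated unchanged.
std::vector<int> narrowShuffleMask(const std::vector<int> &Mask, int Scale) {
  std::vector<int> Out;
  Out.reserve(Mask.size() * Scale);
  for (int M : Mask)
    for (int i = 0; i < Scale; ++i)
      Out.push_back(M < 0 ? M : M * Scale + i);
  return Out;
}
```

For example, the 128-bit-element mask {-2, 1} on a 256-bit vector (zero the low half, keep the high half) narrows to the 64-bit mask {-2, -2, 2, 3}, a form that can be matched as a blend with zero.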

llvm-svn: 275406
2016-07-14 12:58:04 +00:00
Simon Pilgrim a76a8e50e5 [X86][AVX] Add VBROADCASTF128/VBROADCASTI128 shuffle comments support
llvm-svn: 275400
2016-07-14 12:07:43 +00:00
Simon Pilgrim b8c261c931 [X86][AVX2] VBROADCASTSSrr/VBROADCASTSSYrr require AVX2 not AVX
llvm-svn: 275391
2016-07-14 10:37:14 +00:00
Craig Topper 6840f1150f [AVX512] Implement EXTLOAD lowering with patterns to select existing VPMOVZX instructions instead of creating CodeGenOnly instructions.
llvm-svn: 275378
2016-07-14 06:41:34 +00:00
Eli Friedman 17e8ea18e9 [X86] Fix stupid typo in isel lowering.
Apparently someone miscounted the number of zeros in the immediate.
Fixes https://llvm.org/bugs/show_bug.cgi?id=28544 .

llvm-svn: 275376
2016-07-14 05:48:25 +00:00
Dean Michael Berris 52735fc435 XRay: Add entry and exit sleds
Summary:
In this patch we implement the following parts of XRay:

- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
- Supporting a function attribute named 'xray-instruction-threshold' used to decide whether a function is instrumented, based on a minimum number of instructions (IR instruction counts).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.
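The two attributes can be read as a simple per-function decision. The C++ helper below is a hypothetical sketch, not the pass from this patch; in particular it assumes that a function without 'xray-always' is instrumented only when it carries a threshold and has at least that many IR instructions.

```cpp
#include <string>

// Hypothetical model of the decision described by the two attributes.
// FunctionInstrument: value of 'function-instrument' ("" if absent).
// HasThreshold/Threshold: presence and value of 'xray-instruction-threshold'.
bool shouldInstrumentWithXRay(const std::string &FunctionInstrument,
                              bool HasThreshold, unsigned Threshold,
                              unsigned IRInstructionCount) {
  if (FunctionInstrument == "xray-always")
    return true; // always instrument, regardless of size
  if (!HasThreshold)
    return false; // assumption: no threshold means no instrumentation
  return IRInstructionCount >= Threshold; // assumption: threshold is inclusive
}
```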

There are some caveats here:

1) We don't handle PATCHABLE_RET on platforms other than x86_64 yet; this means that if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time, so that by default it is unpacked on platforms where XRay is not available yet.

2) The generated section for X86 is different from what is described in the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper in this respect to get richer information from the runtime library.

Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk

Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D19904

llvm-svn: 275367
2016-07-14 04:06:33 +00:00
Nico Weber af7e8465e1 Teach fast isel about thiscall (and callee-pop) calls.
http://reviews.llvm.org/D22315

llvm-svn: 275360
2016-07-14 01:52:51 +00:00