Commit Graph

10741 Commits

Author SHA1 Message Date
Sean Eveson f77b4d2f38 [MC] Function stack size section.
Summary:
Original RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-August/117028.html

I wasn't sure who to put as reviewers, so please add/remove people as appropriate.

This change adds a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. The section contains pairs of function symbol references (8 byte) and stack sizes (unsigned LEB128).

The contents of this section can be used to measure changes to stack sizes between different versions of the compiler or a source base. The advantage of having a section is that we can extract this information when examining binaries that we didn't build, and it allows users and tools easy access to that information just by referencing the binary.

There is a follow up change to add an option to clang.

Thanks.

Reviewers: hfinkel, MatzeB

Reviewed By: MatzeB

Subscribers: thegameg, asb, llvm-commits

Differential Revision: https://reviews.llvm.org/D39788

llvm-svn: 319423
2017-11-30 12:01:16 +00:00
Simon Pilgrim 3e5987cf8d [X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes
llvm-svn: 319418
2017-11-30 10:48:47 +00:00
Craig Topper a495744d2c [X86] Optimize avx2 vgatherqps for v2f32 with v2i64 index type.
Normal type legalization will widen everything. This requires forcing 0s into the mask register. We can instead choose the form that only reads 2 elements without zeroing the mask.

llvm-svn: 319406
2017-11-30 07:01:40 +00:00
Craig Topper 321a8b9b63 [X86] Make sure we don't remove sign extends of masks with AVX2 masked gathers.
We don't use k-registers and instead use the MSB so we need to make sure we sign extend the mask to the msb.

llvm-svn: 319405
2017-11-30 06:31:31 +00:00
Craig Topper cf461a0a32 [SelectionDAG][X86] Teach promotion legalization for fp_to_sint/fp_to_uint to insert an assertsext/assertzext based on the original type
If we put in an assertsext/zext here, we're able to generate better truncate code using pack on pre-avx512 targets.

Similar is already done during type legalization. This is the equivalent for op legalization

Differential Revision: https://reviews.llvm.org/D40591

llvm-svn: 319368
2017-11-29 22:15:43 +00:00
Simon Pilgrim 4d2c703492 [X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes (REVERSION)
Accidental commit of incomplete patch

llvm-svn: 319346
2017-11-29 19:37:38 +00:00
Simon Pilgrim 87034cb498 [X86][AVX512] Tag RCP/RSQRT/GETEXP instructions scheduler classes
llvm-svn: 319338
2017-11-29 19:19:59 +00:00
Simon Pilgrim 36be852cee [X86][AVX512] Tag 3OP (shuffles, double-shifts and GFNI) instructions scheduler classes
llvm-svn: 319337
2017-11-29 18:52:20 +00:00
Craig Topper fbf7b3bf3e [X86] Promote fp_to_sint v16f32->v16i16/v16i8 to avoid scalarization.
llvm-svn: 319266
2017-11-29 00:32:09 +00:00
Craig Topper 8261c8c066 [X86] Add test cases for fptosi v16f32->v16i8/v16i16 to show scalarization.
llvm-svn: 319261
2017-11-29 00:02:22 +00:00
Craig Topper 88ffb5d4d5 [X86] Mark ISD::FP_TO_UINT v16i8/v16i16 as Promote under AVX512 instead of legal. Fix infinite loop in op legalization when promotion requires 2 steps.
Previously we had an isel pattern to add the truncate. Instead use Promote to add the truncate to the DAG before isel.

The Promote legalization code had to be updated to prevent an infinite loop if promotion took multiple steps because it wasn't remembering the previously tried value.

llvm-svn: 319259
2017-11-28 23:56:02 +00:00
Craig Topper 3f749c2d4b [X86] Regenerate avx512-schedule test.
For some reason some sqrt instructions were missing the scheduling comments.

llvm-svn: 319258
2017-11-28 23:55:59 +00:00
Simon Pilgrim b9aa93cb93 [X86] Tag CLFLUSHOPT with same scheduling behaviour as CLFLUSH
llvm-svn: 319253
2017-11-28 23:25:42 +00:00
Simon Pilgrim a675071b7a [X86] Add CLFLUSHOPT schedule tests
llvm-svn: 319250
2017-11-28 23:12:12 +00:00
Simon Pilgrim c6c2103e1b [X86] Test clflushopt intrinsic on 32 and 64-bit targets
llvm-svn: 319247
2017-11-28 23:04:42 +00:00
Craig Topper dd4295626b [X86] In lowerVectorShuffleAsElementInsertion, if were able to find a scalar i8 or i16 and need to zero extend it, make sure we use a vXi32 type of the full vector width.
Previously, this was hardcoded to v4i32, but if the input type is 256 bits we need to use v8i32.

Fixes PR35443

llvm-svn: 319208
2017-11-28 19:25:45 +00:00
Francis Visoiu Mistrih 9d7bb0cb40 [CodeGen] Print register names in lowercase in both MIR and debug output
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.

* Only debug printing is affected. It now follows MIR.

Differential Revision: https://reviews.llvm.org/D40417

llvm-svn: 319187
2017-11-28 17:15:09 +00:00
Simon Pilgrim ece5bc358a [X86][X87] Tag FTST x87 instruction scheduler class
Looking through Agner, FTST is very similar to generic float compare behaviour, so I've added them to the existing IIC_FCOMI (WriteFAdd) tags.

llvm-svn: 319184
2017-11-28 16:57:20 +00:00
Simon Pilgrim 0747a7e8c3 [X86][X87] Tag FABS/FCHS/FSQRT/FSIN/FCOS x87 instruction scheduler classes
Atom's FABS/FCHS/FSQRT latencies taken from Agner.

Note: I just added FSIN and FCOS to the existing IIC_FSINCOS itinerary, which is actually a more costly instruction.
llvm-svn: 319175
2017-11-28 15:03:42 +00:00
Simon Pilgrim b843dc26e4 [X86][X86] Add some x87 schedule tests
Still missing some instructions: mainly loads/stores/system ops, all flagged as TODO.

llvm-svn: 319172
2017-11-28 14:35:52 +00:00
Simon Pilgrim 8dc603b031 [X86][3DNow] Add instruction itinerary and scheduling classes for femms/prefetch/prefetchw
llvm-svn: 319167
2017-11-28 12:37:35 +00:00
Martin Storsjo 04b68446eb [COFF] Implement constructor priorities
The priorities in the section name suffixes are zero padded,
allowing the linker to just do a lexical sort.

Add zero padding for .ctors sections in ELF as well.

Differential Revision: https://reviews.llvm.org/D40407

llvm-svn: 319150
2017-11-28 08:07:18 +00:00
Simon Dardis 3aeb1a5404 [DAGCombine] Disable finding better chains for stores at O0
Unoptimized IR can have linear sequences of stores to an array, where the
initial GEP for the first store is formed from the pointer to the array, and the
GEP for each store after the first is formed from the previous GEP with some
offset in an inductive fashion.

The (large) resulting DAG when analyzed by DAGCombine undergoes an excessive
number of combines as each store node is examined every time its' offset node
is combined with any child of the offset. One of the transformations is
findBetterNeighborChains which assists MergeConsecutiveStores. The former
relies on repeated chain walking to do its' work, however MergeConsecutiveStores
is disabled at O0 which makes the transformation redundant.

Any optimization level other than O0 would invoke InstCombine which would
resolve the chain of GEPs into flat base + offset GEP for each store which
does not exhibit the repeated examination of each store to the array.

Disabling this optimization fixes an excessive compile time issue (30~ minutes
for the test case provided) at O0.

Reviewers: niravd, craig.topper, t.p.northover

Differential Revision: https://reviews.llvm.org/D40193

llvm-svn: 319142
2017-11-28 04:07:59 +00:00
Craig Topper ddbc340c20 [X86] Make zero extend from v16i1/v8i1 to v16i8/v8i16/v16i16 not scalarize under AVX512.
llvm-svn: 319136
2017-11-28 01:36:33 +00:00
Craig Topper 5befc5bfce [X86] Add command line without AVX512BW/AVX512VL to bitcast-int-to-vector-bool-zext.ll.
llvm-svn: 319135
2017-11-28 01:36:31 +00:00
Craig Topper dbd4a7fecc [DAGCombiner] Don't combine aext(setcc) if the setcc is already using the target's preferred result type.
With AVX512 vXi1 types are legal so we shouldn't be extending them.

This change is similar to existing code in the zext(setcc) combine.

llvm-svn: 319120
2017-11-27 23:51:40 +00:00
Craig Topper 256cc48df6 [X86] Teach getSetCCResultType to handle more than just SimpleVTs when looking at larger than 512-bit vectors.
Which VTs are considered simple is determined by the superset of the legal types of all targets in LLVM. If we're looking at VTs that are going to be split down to 512-bits we should allow any VT not just simple ones since the simple list changes over time as new targets are added.

llvm-svn: 319110
2017-11-27 22:56:10 +00:00
Sanjay Patel 0de1a4bc2d [PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result
This should fix PR31455:
https://bugs.llvm.org/show_bug.cgi?id=31455

Differential Revision: https://reviews.llvm.org/D28314

llvm-svn: 319094
2017-11-27 21:15:43 +00:00
Craig Topper 51f2886f93 [X86] Add avx512bw command lines to vselect-packss.ll
This shows several places where we fail to use masked move or blendm.

llvm-svn: 319063
2017-11-27 18:00:49 +00:00
Craig Topper 62189f7ab3 [X86] Make getSetCCResultType return vXi1 for any vXi32/vXi64 vector over 512 bits long when AVX512 is enabled.
Similar for vXi16/vXi8 with BWI.

Any vector larger than 512 bits will be split to 512 bits during legalization. But without this we will fold sexts with them before that making it difficult to recover leading to scalarization.

llvm-svn: 319059
2017-11-27 17:51:55 +00:00
Simon Pilgrim 4164009b48 [X86] Add INVLPGA to the existing INVLPG scheduling
llvm-svn: 319031
2017-11-27 14:39:50 +00:00
Simon Pilgrim be0369ca0d [X86] Add scheduling tests for invlpg/invlpga
llvm-svn: 319029
2017-11-27 14:23:55 +00:00
Andrew V. Tischenko 26dde7719b Update BTVER2 sched numbers for SSE42 string instructions.
Differential Revision: https://reviews.llvm.org/D39846

llvm-svn: 319013
2017-11-27 09:58:00 +00:00
Craig Topper 400c32ecb9 [SelectionDAG] Teach SplitVecRes_SETCC to call GetSplitVector if the operands have already been split.
llvm-svn: 319010
2017-11-27 05:52:54 +00:00
Simon Pilgrim fe6e92d517 [X86][3DNow] Add 3DNow! instruction itinerary and scheduling classes
llvm-svn: 319005
2017-11-26 20:50:29 +00:00
Simon Pilgrim 61145d5c25 [X86][SSE] Add SSE42 tests to the clear upper tests
llvm-svn: 319003
2017-11-26 20:03:53 +00:00
Simon Pilgrim f545bb6cae [X86][MMX] Add IIC_MMX_MOVMSK instruction itinerary class
llvm-svn: 318999
2017-11-26 17:56:07 +00:00
Oren Ben Simhon fa582b075c Control-Flow Enforcement Technology - Shadow Stack support (LLVM side)
Shadow stack solution introduces a new stack for return addresses only.
The HW has a Shadow Stack Pointer (SSP) that points to the next return address.
If we return to a different address, an exception is triggered.
The shadow stack is managed using a series of intrinsics that are introduced in this patch as well as the new register (SSP).
The intrinsics are mapped to new instruction set that implements CET mechanism.

The patch also includes initial infrastructure support for IBT.

For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Differential Revision: https://reviews.llvm.org/D40223

Change-Id: I4daa1f27e88176be79a4ac3b4cd26a459e88fed4
llvm-svn: 318996
2017-11-26 13:02:45 +00:00
Coby Tayree d8b17bedfa [x86][icelake]GFNI
galois field arithmetic (GF(2^8)) insns:
gf2p8affineinvqb
gf2p8affineqb
gf2p8mulb
Differential Revision: https://reviews.llvm.org/D40373

llvm-svn: 318993
2017-11-26 09:36:41 +00:00
Craig Topper e485631cd1 [X86] Add separate intrinsics for scalar FMA4 instructions.
Summary:
These instructions zero the non-scalar part of the lower 128-bits which makes them different than the FMA3 instructions which pass through the non-scalar part of the lower 128-bits.

I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header like we do for AVX512.

I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before.

I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests split the RUN lines and changed out the scalar intrinsics.

fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly though there are a couple TODOs in it.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39851

llvm-svn: 318984
2017-11-25 18:32:43 +00:00
Craig Topper ea37e201ec [X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is disabled. Allow gather on SKX/CNL/ICL when AVX512 is disabled by using AVX2 instructions.
Summary:
This adds a new fast gather feature bit to cover all CPUs that support fast gather that we can use independent of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 is still implicitly assuming fast gather to keep tests working and to match the scatter behavior.

Test command lines have been added for these two cases.

Reviewers: magabari, delena, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40282

llvm-svn: 318983
2017-11-25 18:09:37 +00:00
Andrew V. Tischenko 198720d38e Add BTVER2 sched support for SHLD/SHRD.
Differential Revision: https://reviews.llvm.org/D40124

llvm-svn: 318977
2017-11-25 10:46:53 +00:00
Craig Topper c1b3269171 [X86] Support folding to andnps with SSE1 only.
With SSE1 only, we emit FAND and FXOR nodes for v4f32.

llvm-svn: 318968
2017-11-25 07:20:22 +00:00
Craig Topper 5b85df8605 [X86] Add some early DAG combines to turn v4i32 AND/OR/XOR into FAND/FOR/FXOR whe only SSE1 is available.
v4i32 isn't a legal type with sse1 only and would end up getting scalarized otherwise.

This isn't completely ideal as it doesn't handle cases like v8i32 that would get split to v4i32. But it at least helps with code written using the clang intrinsic header.

llvm-svn: 318967
2017-11-25 07:20:21 +00:00
Craig Topper 13ed01e635 [X86] Prevent using X * rsqrt(X) to approximate sqrt when only sse1 is enabled.
This optimization can occur after type legalization and emit a vselect with v4i32 type. But that type is not legal with sse1. This ultimately gets scalarized by the second type legalization that runs after vector op legalization, but that's really intended to handle the scalar types that might be introduced by legalizing vector ops.

For now just stop this from happening by disabling the optimization with sse1.

llvm-svn: 318965
2017-11-24 19:57:48 +00:00
Craig Topper 40a1edc307 [X86] Don't invert NewCC variable while processing the jcc/setcc/cmovcc instructions in optimizeCompareInstr.
The NewCC variable is calculated outside of the loop that processes jcc/setcc/cmovcc instructions. If we invert it during the loop it can cause an incorrect value to be used by a later iteration. Instead only read it during the loop and use a new variable to store the possibly inverted value.

Fixes PR35399.

llvm-svn: 318934
2017-11-23 19:25:45 +00:00
Craig Topper f31b0b850b [X86] Teach isel that X86ISD::CMPM_RND zeros the upper bits of the mask register.
llvm-svn: 318933
2017-11-23 18:41:21 +00:00
Simon Pilgrim 90accbc5d9 [X86][SSE] Use (V)PHMINPOSUW for vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions (PR32841)
(V)PHMINPOSUW determines the UMIN element in an v8i16 input, with suitable bit flipping it can also be used for SMAX/SMIN/UMAX cases as well.

This patch matches vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions and reduces the input down to a v8i16 vector before calling (V)PHMINPOSUW.

A later patch will use this for v16i8 reductions as well (PR32841).

Differential Revision: https://reviews.llvm.org/D39729

llvm-svn: 318917
2017-11-23 13:50:27 +00:00
Coby Tayree e8bdd383e9 [x86][icelake]BITALG
2/3
vpshufbitqmb encoding
3/3
vpshufbitqmb intrinsics
Differential Revision: https://reviews.llvm.org/D40222

llvm-svn: 318904
2017-11-23 11:15:50 +00:00
Craig Topper 3fba1bfb77 [X86] Regenerate the vector-popcnt and vector-tzcnt tests to get BITALG CHECK linse on all functions not just the vXi16/vXi8.
llvm-svn: 318885
2017-11-22 23:35:12 +00:00
Craig Topper 726968d6a2 [X86] Support v32i16/v64i8 CTLZ using lookup table.
Had to tweak the setcc's used by the code to use a vXi1 result type with a sign extend back to vector size.

llvm-svn: 318871
2017-11-22 20:05:57 +00:00
Craig Topper ba150ef60a [X86] Allow vpclmulqdq instructions to be commuted during isel to allow load folding.
The commuting patterns for the AVX version actually still had priority over the new patterns.

llvm-svn: 318800
2017-11-21 21:05:21 +00:00
Coby Tayree 5c7fe5df53 [x86][icelake]BITALG
vpopcnt{b,w}
Differential Revision: https://reviews.llvm.org/D40213

llvm-svn: 318748
2017-11-21 10:32:42 +00:00
Coby Tayree 3880f2a363 [x86][icelake]VNNI
Introducing Vector Neural Network Instructions, consisting of:
vpdpbusd{s}
vpdpwssd{s}
Differential Revision: https://reviews.llvm.org/D40208

llvm-svn: 318746
2017-11-21 10:04:28 +00:00
Coby Tayree 71e37cc9ff [x86][icelake]vbmi2
introducing vbmi2, consisting of
vpcompress{b,w}
vpexpand{b,w}
vpsh{l,r}d{w,d,q}
vpsh{l,r}dv{w,d,q}
Differential Revision: https://reviews.llvm.org/D40206

llvm-svn: 318745
2017-11-21 09:48:44 +00:00
Coby Tayree 7ca5e58736 [x86][icelake]vpclmulqdq introduction
an icelake promotion of pclmulqdq
Differential Revision: https://reviews.llvm.org/D40101

llvm-svn: 318741
2017-11-21 09:30:33 +00:00
Coby Tayree 2a1c02fcbc [x86][icelake]VAES introduction
an icelake promotion of AES
Differential Revision: https://reviews.llvm.org/D40078

llvm-svn: 318740
2017-11-21 09:11:41 +00:00
Craig Topper 67eb30ab60 [SelectionDAG] When promoting the result of a VSELECT, make sure we promote the condition to the SetCC type for the final result type not the original type.
Normally this would be cleaned up by promoting the condition operand next. But in the attached case we promoted the result from v2i48 to v2i64 and the condition from v2i1 to v2i48. Then we tried to "promote" the v2i48 condition back to v2i1 because that's what the SetCC result type for v2i64 is on X86 with VLX. But promote is either a NOP or SIGN_EXTEND and this would need a truncation.

With the change here we now get the SetCC type of v2i1 when we're handling the result promotion and the operand no longer needs to be promoted itself.

Fixes PR35272.

llvm-svn: 318706
2017-11-20 23:08:50 +00:00
Paul Robinson 746edea0ae Revert "Fix out-of-order stepping behavior in programs with sunk instructions."
This reverts commit 30419e150cd940893a13b345e85f96053850208f.
aka r318679.  It caused "sanitizer-windows" bot to fail.

llvm-svn: 318684
2017-11-20 19:07:52 +00:00
Paul Robinson f0b02965e4 Fix out-of-order stepping behavior in programs with sunk instructions.
MachineSink attempts to place instructions near the basic blocks where
they are needed.  Once an instruction has been sunk, its location
relative to other instructions is no longer consistent with the
original source code. In order to ensure correct single-stepping and
profiling, the debug location for sunk instructions is either merged
with the insertion point or erased if the target successor block is
empty.

Patch by Matthew Voss!

Differential Revision: https://reviews.llvm.org/D39933

llvm-svn: 318679
2017-11-20 18:42:17 +00:00
Mohammed Agabaria 115f68ea3e [LV][X86] Support of AVX2 Gathers code generation and update the LV with this
This patch depends on: https://reviews.llvm.org/D35348

Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen)
Update LoopVectorize to generate gathers for AVX2 processors.

Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb

Reviewed By: delena, RKSimon

Differential Revision: https://reviews.llvm.org/D35772

llvm-svn: 318641
2017-11-20 08:18:12 +00:00
Craig Topper 198f7d78d3 [X86] Regenerate a test with broadcast comments. NFC
llvm-svn: 318640
2017-11-20 08:15:04 +00:00
Sanjay Patel c0218923e1 [x86] add sqrt tests for partially-inline-libcalls (PR31455)
llvm-svn: 318630
2017-11-19 17:31:37 +00:00
Craig Topper bece74c694 [X86] Add test cases for rndscaless/sd intrinsics.
Also fix the memop in the ins for these instructions. Not sure what effect this has.

llvm-svn: 318624
2017-11-19 06:24:26 +00:00
Craig Topper 512e9e7f3f [X86] Improve load folding of scalar rcp28 and rsqrt28 instructions using sse_load_f32/f64.
llvm-svn: 318623
2017-11-19 05:42:54 +00:00
Craig Topper 9a94dfc457 [X86] Switch cannonlake to use the SkylakeServer scheduling model instead of Haswell.
Cannonlake comes after skylake and supports avx512 so this is probably a closer model for now.

llvm-svn: 318613
2017-11-19 01:25:30 +00:00
Craig Topper 81037f385e [X86] Add skeleton support for icelake CPU.
There are several patches out for review right now to implement Icelake features. This adds a CPU to collect them under.

llvm-svn: 318612
2017-11-19 01:12:00 +00:00
Simon Pilgrim 41fc45c4e6 [X86][AVX512VL] Add AVX512VL tests to the vselect packss tests.
PR34553 has gone, adding tests to ensure it doesn't come back.

vselect_packss_v16i64 still has some awful codegen on AVX512 targets....

llvm-svn: 318599
2017-11-18 19:47:59 +00:00
Craig Topper 65b8a2c9e1 [X86] Add another gather test with v8i8 sign extended indices.
This requires the indices to be legalized and sign extended.

llvm-svn: 318597
2017-11-18 19:25:35 +00:00
Sanjay Patel c0d1b2ee2e [x86] add tests for unnecessary shuffling; NFC
llvm-svn: 318592
2017-11-18 16:25:38 +00:00
Martin Storsjo 94b59240e2 [X86] Output cfi directives for saved XMM registers even if no GPRs are saved
This makes sure that functions that only clobber xmm registers
(on win64) also get the right cfi directives, if dwarf exceptions
are enabled.

Differential Revision: https://reviews.llvm.org/D40191

llvm-svn: 318591
2017-11-18 06:23:48 +00:00
Simon Pilgrim 7834009726 [X86] Merge scheduling tests for SHLD/SHRD
Reduces spsce used and makes it easier to compare the 2 values for the equivalent instructions

llvm-svn: 318541
2017-11-17 18:35:49 +00:00
Craig Topper 089082378f [X86] Add DAG combine to remove sext i32->i64 from gather/scatter instructions.
Only do this pre-legalize in case we're using the sign extend to legalize for KNL.

This recovers all of the tests that changed when I stopped SelectionDAGBuilder from deleting sign extends.

There's more work that could be done here particularly to fix the i8->i64 test case that experienced split.

llvm-svn: 318468
2017-11-16 23:09:06 +00:00
Craig Topper 1120bb9f59 [X86] Add gather test with index sign extended from i8 type.
Previously SelectionDAGBuilder would remove this sign extend leading to a failure during isel.

The codegen here isn't very nice as we ended up triggering a split.

llvm-svn: 318467
2017-11-16 23:09:03 +00:00
Craig Topper 242374e219 [X86] Don't remove sign extend of gather/scatter indices during SelectionDAGBuilder.
The sign extend might be from an i16 or i8 type and was inserted by InstCombine to match the pointer width. X86 gather legalization isn't currently detecting this to reinsert a sign extend to make things legal.

It's a bit weird for the SelectionDAGBuilder to do this kind of optimization in the first place. With this removed we can at least lean on InstCombine somewhat to ensure the index is i32 or i64.

I'll work on trying to recover some of the test cases by removing sign extends in the backend when its safe to do so with an understanding of the current legalizer capabilities.

This should fix PR30690.

llvm-svn: 318466
2017-11-16 23:08:57 +00:00
Craig Topper e85ff4f732 [X86] Pre-truncate gather/scatter indices that have element sizes larger than 64-bits before Legalize.
The wider element type will normally cause legalize to try to split and scalarize the gather/scatter, but we can't handle that. Instead, truncate the index early so the gather/scatter node is insulated from the legalization.

This really shouldn't happen in practice since InstCombine will normalize index types to the same size as pointers.

llvm-svn: 318452
2017-11-16 20:23:22 +00:00
Simon Pilgrim e8e6acdac9 [X86] Add scheduling tests for SHLD/SHRD
llvm-svn: 318402
2017-11-16 14:13:48 +00:00
Craig Topper 46a5d58b8c [X86] Update TTI to report that v1iX/v1fX types aren't legal for masked gather/scatter/load/store.
The type legalizer will try to scalarize these operations if it sees them, but there is no handling for scalarizing them. This leads to a fatal error. With this change they will now be scalarized by the mem intrinsic scalarizing pass before SelectionDAG.

llvm-svn: 318380
2017-11-16 06:02:05 +00:00
Simon Pilgrim 56415772d6 [X86] Add CBW/CDQ/CDQE/CQO/CWD/CWDE to WriteALU schedule class
Some CPUs are already overriding these sign extension instructions but we should be able to use the WriteALU schedule class by default.

Differential Revision: https://reviews.llvm.org/D39899

llvm-svn: 318308
2017-11-15 17:11:24 +00:00
Hans Wennborg 1403100b6b Fix switch-lower-peel-top-case.ll isel pass is not registered error
The test was doing -stop-after=isel, but that pass is actually the
AMDGPUDAGToDAGISel pass, which might not be built when targeting x86_64.
This changes the test to -stop-after=expand-isel-pseudos instead.

Follow-up to r318202.

llvm-svn: 318220
2017-11-14 23:30:28 +00:00
Aditya Nandakumar e6201c8724 [GISel]: Rework legalization algorithm for better elimination of
artifacts along with DCE

Legalization Artifacts are all those insts that are there to make the
type system happy. Currently, the target needs to say all combinations
of extends and truncs are legal and there's no way of verifying that
post legalization, we only have *truly* legal instructions. This patch
changes roughly the legalization algorithm to process all illegal insts
at one go, and then process all truncs/extends that were added to
satisfy the type constraints separately trying to combine trivial cases
until they converge. This has the added benefit that, the target
legalizerinfo can only say which truncs and extends are okay and the
artifact combiner would combine away other exts and truncs.

Updated legalization algorithm to roughly the following pseudo code.

WorkList Insts, Artifacts;
collect_all_insts_and_artifacts(Insts, Artifacts);

do {
  for (Inst in Insts)
         legalizeInstrStep(Inst, Insts, Artifacts);
  for (Artifact in Artifacts)
         tryCombineArtifact(Artifact, Insts, Artifacts);
} while(!Insts.empty());

Also, wrote a simple wrapper equivalent to SetVector, except for
erasing, it avoids moving all elements over by one and instead just
nulls them out.

llvm-svn: 318210
2017-11-14 22:42:19 +00:00
Rong Xu dc07ae259e [CodeGen] Fix the test case added in r318202
Add the -mtriple option to filter some platforms.

llvm-svn: 318206
2017-11-14 22:08:37 +00:00
Rong Xu 3573d8da36 [CodeGen] Peel off the dominant case in switch statement in lowering
This patch peels off the top case in switch statement into a branch if the
probability exceeds a threshold. This will help the branch prediction and
avoids the extra compares when lowering into chain of branches.

Differential Revision: http://reviews.llvm.org/D39262

llvm-svn: 318202
2017-11-14 21:44:09 +00:00
Hans Wennborg e1ecd61b98 Rename CountingFunctionInserter and use for both mcount and cygprofile calls, before and after inlining
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.

This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)

LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.

Differential Revision: https://reviews.llvm.org/D39287

llvm-svn: 318195
2017-11-14 21:09:45 +00:00
Easwaran Raman 0d55b55bb6 [CodeGenPrepare] Disable div bypass when working set size is huge.
Summary:
Bypass of slow divs based on operand values is currently disabled for
-Os. Do the same when profile summary is available and the working set
size of the application is huge. This is similar to how loop peeling is
guarded by hasHugeWorkingSetSize. In the div bypass case, the generated
extra code (and the extra branch) tendss to outweigh the benefits of the
bypass. This results in noticeable performance improvement on an
internal application.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39992

llvm-svn: 318179
2017-11-14 19:31:51 +00:00
Simon Pilgrim 600174e740 [X86][AVX] Add scheduling test for vmovntdq 256-bit store
Needs to use inline asm as domain will otherwise be changed to float (vmovntps)

llvm-svn: 318151
2017-11-14 14:03:29 +00:00
Craig Topper c314f461dd [X86] Allow X86ISD::Wrapper to be folded into the base of gather/scatter address
If the base of our gather corresponds to something contained in X86ISD::Wrapper we should be able to fold it into the address.

This patch refactors some of the address matching to more fully use the X86ISelAddressMode struct and the getAddressOperands helper. A new helper function matchVectorAddress is added to call matchWrapper or fall back to matchAddressBase.

We should also be able to support constant offsets from a wrapper, but I'll look into that in a future patch. We may even be able to completely reuse matchAddress here, but I wanted to start simple and work up to it.

Differential Revision: https://reviews.llvm.org/D39927

llvm-svn: 318057
2017-11-13 17:53:59 +00:00
Omer Paparo Bivas 4c679e1435 Inserting a base test for X86 performance nops
Change-Id: I69da08b617d7fae8024c5aee04720eb465f39b81
llvm-svn: 318041
2017-11-13 15:02:39 +00:00
Uriel Korach 2aa707bdaa [X86] test/testn intrinsics lowering to IR. llvm part.
Remove builtins from llvm and add AutoUpgrade support.
Also add fast-isel tests for the TEST and TESTN instructions.

Differential Revision: https://reviews.llvm.org/D38736

llvm-svn: 318036
2017-11-13 12:51:18 +00:00
Jina Nahias 9a7f9f123c [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR
This patch, together with a matching clang patch (https://reviews.llvm.org/D38672), implements the lowering of X86 shuffle i/f intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38671

Change-Id: I1e7d359a74743e995ec356237a85214ce55d3661
llvm-svn: 318026
2017-11-13 09:16:39 +00:00
Gadi Haber c9f2300652 [X86][SKX] Adding scheduling info of non-intrinsic + commutable SKX opcodes.
Updated the scheduling information of the SKX subtarget  in the file X86SchedSkylakeServer.td under lib/Target/X86 to:
1. add regular opcodes in addition to the suffixed "_Int" opcodes
2. add the (V)MAXCPD/MAXCPS/MAXCSD/MAXCSS/MINCPD/MINCPS/MINCSD/MINCSS
    instructions that are equivalent to their counterparts without the 'C' as they are part of a hack to
    make floating point min/max commutable under fast math.

Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39833

Change-Id: Ie13702a5ce1b1a08af91ca637a52b6962881e7d6
llvm-svn: 318024
2017-11-13 08:42:07 +00:00
Craig Topper 75d71540f8 [X86] Use sse_load_f32/f64 to improve load folding of scalar vfscalefss/sd, vrcp14ss/sd, rsqrt14ss/sd instructions.
llvm-svn: 318022
2017-11-13 08:07:33 +00:00
Craig Topper c748455e51 [X86] Regenerate test. NFC
llvm-svn: 318021
2017-11-13 08:07:31 +00:00
Craig Topper ca8abedb2a [X86] Use sse_load_f32/f64 to improve load folding for scalar VFPCLASS intrinsics.
llvm-svn: 318019
2017-11-13 06:46:48 +00:00
Craig Topper bf328f263e [X86] Add tests for missed opportunities to fold a 128-bit vector load into vfpclassss and vpfpclasssd.
llvm-svn: 318018
2017-11-13 06:46:46 +00:00
Craig Topper d4f6094091 [X86] Fix SQRTSS/SQRTSD/RCPSS/RCPSD intrinsics to use sse_load_f32/sse_load_f64 to increase load folding opportunities.
llvm-svn: 318016
2017-11-13 05:25:24 +00:00
Craig Topper 24389c6746 [X86] Add tests for full vector loads to fold-load-unops.ll.
We should be able to fold a full vector load into a scalar intrinsic. Since it's legal to narrow a load.

llvm-svn: 318015
2017-11-13 05:25:23 +00:00
Craig Topper a95a1fd42d [X86] Regenerate fold-load-unops.ll and add and avx512f command line.
llvm-svn: 318014
2017-11-13 05:25:21 +00:00
Craig Topper deee24b83c [X86] Use sse_load_f32/f64 in patterns for the memory forms of VRNDSCALESS/SD.
llvm-svn: 318009
2017-11-13 02:03:01 +00:00
Craig Topper 63157c4784 [X86] Use EVEX encoded VRNDSCALE instructions to implement the legacy round intrinsics.
The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4-bits of the immediate are 0.

This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space.

We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used.

I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches.

llvm-svn: 318008
2017-11-13 02:03:00 +00:00