Commit Graph

Andrew V. Tischenko e255526d0b Added cost of ZEROALL and ZEROUPPER instrs in btver2 cpu.
Differential Revision: https://reviews.llvm.org/D35834

llvm-svn: 309269
2017-07-27 13:12:08 +00:00
Simon Pilgrim afc1ac2735 [X86] Tidyup MaskedLoad/Store mask creation. NFCI.
Assign all concat elements to zero and then just replace the first element, instead of setting them all to null and copying everything in.

llvm-svn: 309261
2017-07-27 10:29:04 +00:00
Peter Collingbourne 081ffe2ff2 Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI.
This was a use-after-free waiting to happen.

llvm-svn: 309159
2017-07-26 19:15:29 +00:00
Zvi Rackover 092f199188 DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offset
Summary:
Adding support for combining power-of-2-strided build_vector nodes where the
first build_vector's operand is extracted from a non-zero index.

Example:

 v4i32 build_vector((extract_elt V, 1),
                    (extract_elt V, 3),
                    (extract_elt V, 5),
                    (extract_elt V, 7))
 -->
 v4i32 truncate (bitcast (shuffle<1,u,3,u,5,u,7,u> V, u) to v4i64)

Reviewers: delena, RKSimon, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35700

llvm-svn: 309108
2017-07-26 12:57:03 +00:00
Michael Zuckerman c1918ad571 [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess.
This patch expands the support of lowerInterleavedStore to stride 4 with 32 x i8 vectors.

LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (stride=4, VF=32), and we plan to include more patterns in the future. To reach that goal of "more patterns", we include two mask creators: the first creates shuffle masks equivalent to the unpacklo/unpackhi instructions, and the other creates masks equivalent to a concat of two half vectors (high/low).

The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 32 chars:

c0, c1, ..., c31
m0, m1, ..., m31
y0, y1, ..., y31
k0, k1, ..., k31

And these need to be transposed/interleaved and stored like so:

c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
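
As an illustration (not from the original commit), the same stride-4 interleave pattern scaled down to VF=4 in IR; the function and value names are hypothetical:

  define void @interleave4(<4 x i8> %c, <4 x i8> %m, <4 x i8> %y, <4 x i8> %k, <16 x i8>* %p) {
    ; concat the four sources pairwise, then pick elements in c,m,y,k order
    %cm = shufflevector <4 x i8> %c, <4 x i8> %m, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
    %yk = shufflevector <4 x i8> %y, <4 x i8> %k, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
    %il = shufflevector <8 x i8> %cm, <8 x i8> %yk, <16 x i32> <i32 0, i32 4, i32 8, i32 12, i32 1, i32 5, i32 9, i32 13, i32 2, i32 6, i32 10, i32 14, i32 3, i32 7, i32 11, i32 15>
    store <16 x i8> %il, <16 x i8>* %p, align 16
    ret void
  }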

Reviewers:
dorit
Farhana
RKSimon
guyblank
DavidKreitzer

Differential Revision: https://reviews.llvm.org/D34601

llvm-svn: 309086
2017-07-26 08:10:14 +00:00
Zvi Rackover 1b73682243 TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI.
Changing mask argument type from const SmallVectorImpl<int>& to
ArrayRef<int>.

This came up in D35700 where a mask is received as an ArrayRef<int> and
we want to pass it to TargetLowering::isShuffleMaskLegal().
Also saves a few lines of code.

llvm-svn: 309085
2017-07-26 08:06:58 +00:00
Michael Zuckerman 60bc7e0f0a [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess, part 1.
Splitting patch D34601 into two parts. This part changes the location of two functions.
The second part will be based on this patch. This was requested by @RKSimon.

Reviewers:
1. dorit	
2. Farhana	
3. RKSimon	
4. guyblank	
5. DavidKreitzer

llvm-svn: 309084
2017-07-26 07:45:02 +00:00
Craig Topper 050c9c8f83 [X86] Prevent selecting masked aligned load instructions if the load should be non-temporal
Summary: The aligned load predicates don't suppress themselves if the load is non-temporal, the way the unaligned predicates do. For the most part this isn't a problem because the aligned predicates are mostly used for instructions that only load, and the non-temporal loads have priority over those. The exception is masked loads.
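
A hedged IR sketch of the problematic case: a masked load tagged non-temporal (function name and metadata placement are illustrative; the backend sees this as a non-temporal MachineMemOperand):

  declare <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>*, i32, <8 x i1>, <8 x i32>)

  define <8 x i32> @nt_masked_load(<8 x i32>* %p, <8 x i1> %mask) {
    ; aligned + non-temporal: must not select a plain aligned masked load
    %v = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* %p, i32 32, <8 x i1> %mask, <8 x i32> undef), !nontemporal !0
    ret <8 x i32> %v
  }

  !0 = !{i32 1}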

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D35712

llvm-svn: 309079
2017-07-26 04:31:04 +00:00
Eric Christopher 97ae58686f Update the comments on default subtargets based on feedback.
llvm-svn: 309041
2017-07-25 22:21:08 +00:00
Eric Christopher adfe5368ee Revert "This patch enables the usage of constant Enum identifiers within Microsoft style inline assembly statements."
This reverts commit r308966.

llvm-svn: 309005
2017-07-25 19:22:09 +00:00
Simon Pilgrim 18b97f78fe [X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914)
D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining, which resulted in regressions for some optimized libc memcmp implementations (PR33914).

Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).
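
For reference, a sketch (with assumed names) of what a 2-load-pair equality expansion of memcmp(x, y, 16) == 0 looks like in IR; equality comparisons need no byte swaps:

  define i1 @cmp16(i8* %x, i8* %y) {
    %px0 = bitcast i8* %x to i64*
    %py0 = bitcast i8* %y to i64*
    %x0 = load i64, i64* %px0
    %y0 = load i64, i64* %py0
    %gx1 = getelementptr i8, i8* %x, i64 8
    %gy1 = getelementptr i8, i8* %y, i64 8
    %px1 = bitcast i8* %gx1 to i64*
    %py1 = bitcast i8* %gy1 to i64*
    %x1 = load i64, i64* %px1
    %y1 = load i64, i64* %py1
    ; OR the two XOR differences together and test for zero
    %d0 = xor i64 %x0, %y0
    %d1 = xor i64 %x1, %y1
    %or = or i64 %d0, %d1
    %eq = icmp eq i64 %or, 0
    ret i1 %eq
  }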

This patch should be considered for the 5.0.0 release branch as well

Differential Revision: https://reviews.llvm.org/D35830

llvm-svn: 308986
2017-07-25 17:04:37 +00:00
Andrew V. Tischenko 32e9b1ad0b X86 Asm uses assertions instead of proper diagnostics. This patch fixes that.
Differential Revision: https://reviews.llvm.org/D35115

llvm-svn: 308972
2017-07-25 13:05:12 +00:00
Matan Haroush 2f21017be2 This patch enables the usage of constant Enum identifiers within Microsoft style inline assembly statements.
Differential Revision:
https://reviews.llvm.org/D33277
https://reviews.llvm.org/D33278

llvm-svn: 308966
2017-07-25 10:44:09 +00:00
Reid Kleckner c990b5d916 Revert "[X86][InlineAsm][Ms Compatibility]Prefer variable name over a register when the two collides"
This reverts r308867 and r308866.

It broke the sanitizer-windows buildbot on C++ code similar to the
following:

  namespace cl { }
  void f() {
    __asm {
      mov al, cl
    }
  }

t.cpp(4,13):  error: unexpected namespace name 'cl': expected expression
    mov al, cl
            ^

In this case, MSVC parses 'cl' as a register, not a namespace.

llvm-svn: 308926
2017-07-24 20:48:15 +00:00
Ayman Musa b16ce777e3 [X86][AVX512] Add patterns for masked AVX512 floating point compare instructions that were missing.
These patterns were missed by D33188. Adding them for completeness.
Also updating the test.

Differential Revision: https://reviews.llvm.org/D35179

llvm-svn: 308868
2017-07-24 08:10:32 +00:00
Coby Tayree c48388d3d3 [X86][InlineAsm][Ms Compatibility]Prefer variable name over a register when the two collides
In MS-style inline assembly, the following snippet:

int eax;
__asm mov eax, ebx

should yield a load of ebx into the location pointed to by the variable eax.

This patch sees to it.

Previously, a reg-to-reg move would have been emitted.

clang: D34740

Differential Revision: https://reviews.llvm.org/D34739

llvm-svn: 308866
2017-07-24 07:04:55 +00:00
Petr Hosek 710479cede [CodeGen][X86] Fuchsia supports sincos* libcalls and sin+cos->sincos optimization
Patch by Roland McGrath

Differential Revision: https://reviews.llvm.org/D35748

llvm-svn: 308854
2017-07-23 22:30:00 +00:00
Craig Topper 07a7d56144 [X86] Add some hasSideEffects=0 flags.
llvm-svn: 308835
2017-07-23 03:59:39 +00:00
Craig Topper 6912d7faa3 [X86] Add patterns for memory forms of SARX/SHLX/SHRX with careful complexity adjustment to keep shift by immediate using the legacy instructions.
These patterns were previously omitted to favor the legacy instructions when the shift amount is a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns.

llvm-svn: 308834
2017-07-23 03:59:37 +00:00
Craig Topper abfe380f9a [X86] Add nopq instruction which is a rex encoded version of nopl for gas compatibility.
llvm-svn: 308818
2017-07-22 01:30:53 +00:00
Craig Topper e88aef4b5f [X86] Add register form of NOPL and NOPW for assembler/disassembler.
Fixes PR32805.

llvm-svn: 308817
2017-07-22 01:30:51 +00:00
Farhana Aleen e4a89a6462 X86InterleaveAccess: A fix for bug33826
Reviewers: DavidKreitzer

Differential Revision: https://reviews.llvm.org/D35638

llvm-svn: 308784
2017-07-21 21:35:00 +00:00
Jonas Paulsson 024e319489 [SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.

In order to achieve this, the following common code changes were made:

 * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
 LSR should do instruction-based addressing evaluations by calling
 isLegalAddressingMode() with the Instruction pointers.
 * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
 as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
 not just loads or stores.

SystemZ changes:

 * isLSRCostLess() implemented with Insns first, and without ImmCost.
 * New function supportedAddressingMode() that is a helper for TTI methods
 looking at Instructions passed via pointers.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049

llvm-svn: 308729
2017-07-21 11:59:37 +00:00
Simon Pilgrim 32c377a1cf [X86][SSE] Add pre-AVX2 support for (i32 bitcast(v32i1)) -> 2xMOVMSK
Currently we only support (i32 bitcast(v32i1)) using the AVX2 VPMOVMSKB ymm instruction.

This patch adds support for splitting pre-AVX2 targets into 2 x (V)PMOVMSKB xmm instructions and merging the integer results.

In future we could probably generalize this to handle more cases.
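
The pattern in question, as a minimal IR sketch (hypothetical function):

  define i32 @movmsk32(<32 x i8> %a, <32 x i8> %b) {
    %cmp = icmp slt <32 x i8> %a, %b      ; produces a v32i1 mask
    %msk = bitcast <32 x i1> %cmp to i32  ; AVX2: one VPMOVMSKB ymm; pre-AVX2: 2 x PMOVMSKB xmm merged
    ret i32 %msk
  }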

Differential Revision: https://reviews.llvm.org/D35303

llvm-svn: 308723
2017-07-21 09:58:50 +00:00
Craig Topper 31140ade70 [AVX-512] Fix a bug that prevented some non-temporal loads from using the movntdqa instruction.
The bitconverts here had an input type of 128 bits and an output type of 256 bits. The input type should also have been 256 bits.

llvm-svn: 308702
2017-07-21 00:40:42 +00:00
Craig Topper 27c12e088e [X86] Allow masks with more than 6 bits set on the x << (y & mask) optimization for the 64-bit memory shifts.
llvm-svn: 308657
2017-07-20 19:29:58 +00:00
Craig Topper 33225ef314 [X86] Use SARX/SHLX/SHRX instructions for (shift x (and y, (BitWidth-1)))
Fixes PR33841.
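
A minimal sketch of the pattern (illustrative names); the 'and' is redundant on x86 because hardware shifts already mask the shift amount, so with BMI2 this can select SHLX directly:

  define i64 @shl_masked(i64 %x, i64 %amt) {
    %m = and i64 %amt, 63   ; BitWidth-1 mask
    %r = shl i64 %x, %m
    ret i64 %r
  }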

llvm-svn: 308591
2017-07-20 06:19:55 +00:00
Davide Italiano 5fc5d0a406 [X86] Don't try to scale down if that exceeds the bitwidth.
Fixes the crash reported in PR33844.

llvm-svn: 308503
2017-07-19 18:09:46 +00:00
Simon Pilgrim e5c7925c5e [X86][XOP] Use default AVX2 lowering for v4i64 ashr by splat constants
XOP shifts only support 128-bit vectors, so we were ending up with less optimal codegen requiring constants

llvm-svn: 308430
2017-07-19 10:29:31 +00:00
Craig Topper 106b5b6856 AMD znver1 Initial Scheduler model
Summary:
This patch adds the following
1. Adds a skeleton scheduler model for AMD Znver1.
2. Introduces the znver1 execution units and pipes.
3. Maps the instructions onto the generic scheduler classes.
4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on
        a. Instruction types
        b. Registers used
5. Since itineraries are not added based on instructions, throughput information is bound to change when incremental changes are added.
6. Scheduler testcases are modified accordingly to suit the new model.

Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me.

Reviewers: craig.topper, RKSimon

Subscribers: javed.absar, shivaram, ddibyend, vprasad

Differential Revision: https://reviews.llvm.org/D35293

llvm-svn: 308411
2017-07-19 02:45:14 +00:00
Simon Pilgrim 483927aefb [x86, CGP] increase memcmp() expansion up to 4 load pairs
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.

Notes:

    Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.

    There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.

    We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:

https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion.

Committed on behalf of Sanjay Patel

Differential Revision: https://reviews.llvm.org/D35067

llvm-svn: 308322
2017-07-18 15:55:30 +00:00
Craig Topper f54a500101 [X86] Prevent an assertion failure if a gather intrinsic is passed a non-constant scale value.
This isn't legal code, but we shouldn't crash on it. Now we just don't convert the gather intrinsic if the scale isn't constant and let it go through to isel where we'll report an isel failure.

Fixes PR33772.

llvm-svn: 308267
2017-07-18 06:49:23 +00:00
Martin Storsjo 2f24e93481 [AArch64] Extend CallingConv::X86_64_Win64 to AArch64 as well
Rename the enum value from X86_64_Win64 to plain Win64.

The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.
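
A sketch of the renamed convention in textual IR (hypothetical function):

  ; was: define x86_64_win64cc void @callee()
  define win64cc void @callee() {
    ret void
  }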

Differential Revision: https://reviews.llvm.org/D34474

llvm-svn: 308208
2017-07-17 20:05:19 +00:00
Simon Pilgrim 1cbe8c2ca5 [X86][AVX512] Add lowering of vXi32/vXi64 ISD::ROTL/ISD::ROTR
Add support for lowering to ISD::ROTL/ISD::ROTR, including rotate by immediate
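
A minimal IR sketch of a splat rotate-by-immediate that can now be matched (hypothetical function):

  define <4 x i64> @rol7(<4 x i64> %x) {
    %hi = shl <4 x i64> %x, <i64 7, i64 7, i64 7, i64 7>
    %lo = lshr <4 x i64> %x, <i64 57, i64 57, i64 57, i64 57>
    %r = or <4 x i64> %hi, %lo   ; recognized as ISD::ROTL by 7
    ret <4 x i64> %r
  }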

Differential Revision: https://reviews.llvm.org/D35463

llvm-svn: 308177
2017-07-17 14:11:30 +00:00
Simon Pilgrim 64fff14bde Strip trailing whitespace. NFCI
llvm-svn: 308143
2017-07-16 18:37:23 +00:00
Amjad Aboud 4563c062b1 [X86] X86::CMOV to Branch heuristic based optimization.
The LLVM compiler recognizes opportunities to transform a branch into IR select instruction(s), which will later be lowered into an X86::CMOV instruction, assuming no other optimization eliminated the SelectInst.
However, it is not always profitable to emit an X86::CMOV instruction. For example, a branch is preferable to an X86::CMOV instruction when:
1. The branch is well predicted.
2. The condition operand is expensive, compared to the True-value and False-value operands.

The CodeGenPrepare pass has a shallow optimization that tries to convert a SelectInst into a branch, but it is not enough.
This commit implements a machine optimization pass that converts X86::CMOV instruction(s) into branches, based on a conservative heuristic.
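
For illustration, a minimal IR sketch of the kind of select this pass reconsiders (hypothetical function; the fcmp stands in for an expensive condition):

  define i32 @pick(double %a, double %b, i32 %t, i32 %f) {
    %cond = fcmp ogt double %a, %b          ; potentially expensive condition
    %sel = select i1 %cond, i32 %t, i32 %f  ; normally lowered to X86::CMOV
    ret i32 %sel
  }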

Differential Revision: https://reviews.llvm.org/D34769

llvm-svn: 308142
2017-07-16 17:39:56 +00:00
Simon Pilgrim 73ef87978f [X86][SSE4A] Add EXTRQ/INSERTQ values to BTVER2 scheduling model
llvm-svn: 308132
2017-07-16 12:06:06 +00:00
Hiroshi Inoue 7f46baff2c fix typos in comments; NFC
llvm-svn: 308127
2017-07-16 08:11:56 +00:00
Eric Christopher 4e332c7cf1 Add a set of comments explaining why getSubtargetImpl() is deleted on these targets.
llvm-svn: 307999
2017-07-14 04:33:43 +00:00
Simon Pilgrim 5ee68bcc22 Fix whitespace indentation. NFCI.
llvm-svn: 307894
2017-07-13 09:36:04 +00:00
Hiroshi Inoue e9dea6e613 fix typos in comments and error messges; NFC
llvm-svn: 307885
2017-07-13 06:48:39 +00:00
Sanjay Patel 4450e73b5e [x86] improve SBB optimizations for SETB/SETA with subtract
This is another step towards removing a combine that turns sext
into select of constants and preparing the backend for an IR
future where select is the canonical form.

Earlier commits in this area:
https://reviews.llvm.org/rL306040
https://reviews.llvm.org/rL306072
https://reviews.llvm.org/rL307404 (https://reviews.llvm.org/D34652)
https://reviews.llvm.org/rL307471

llvm-svn: 307821
2017-07-12 17:56:46 +00:00
Davide Italiano a63981aaa9 [X86/FastIsel] Fall-back to SelectionDAG when lowering soft-floats.
FastIsel can't handle them, so we would end up crashing during
register class selection.
Fixes PR26522.

Differential Revision:  https://reviews.llvm.org/D35272

llvm-svn: 307797
2017-07-12 15:26:06 +00:00
Rafael Espindola 1beb702ba2 Fully fix the movw/movt addend.
The issue is not whether the value is pcrel. It is whether we have a
relocation or not.

If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.

llvm-svn: 307730
2017-07-11 23:18:25 +00:00
Konstantin Zhuravlyov bb80d3e1d3 Enhance synchscope representation
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
  global and local memory. These scopes restrict how synchronization is
  achieved, which can result in improved performance.

  This change extends existing notion of synchronization scopes in LLVM to
  support arbitrary scopes expressed as target-specific strings, in addition to
  the already defined scopes (single thread, system).

  The LLVM IR and MIR syntax for expressing synchronization scopes has changed
  to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
  replaces *singlethread* keyword), or a target-specific name. As before, if
  the scope is not specified, it defaults to CrossThread/System scope.
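
A short IR sketch of the new syntax (the "agent" scope is a stand-in for an arbitrary target-specific name):

  define void @scopes(i32* %p) {
    fence syncscope("singlethread") seq_cst                        ; replaces the old 'singlethread' keyword
    %old = atomicrmw add i32* %p, i32 1 syncscope("agent") seq_cst ; target-specific scope string
    store atomic i32 0, i32* %p seq_cst, align 4                   ; no scope: defaults to system
    ret void
  }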

  Implementation details:
    - Mapping from synchronization scope name/string to synchronization scope id
      is stored in LLVM context;
    - CrossThread/System and SingleThread scopes are pre-defined to efficiently
      check for known scopes without comparing strings;
    - Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
      the bitcode.

Differential Revision: https://reviews.llvm.org/D21723

llvm-svn: 307722
2017-07-11 22:23:00 +00:00
Igor Breger 324d3791f8 [GlobalISel][X86] Use correct AND instructions.
AND8ri8 is not supported in 64-bit mode.

llvm-svn: 307630
2017-07-11 08:04:51 +00:00
Andrew V. Tischenko ae9d6db769 [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1 (PR28573).
The new version of the model is definitely faster.

Differential Revision:
https://reviews.llvm.org/D35198

llvm-svn: 307552
2017-07-10 16:36:03 +00:00
Gadi Haber f4d154c089 This patch completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target.
The SandyBridge architects have provided us with more accurate information about each instruction's latency, number of uOPs and used ports, and I used it to replace the existing estimated SNB instruction scheduling and to add missing scheduling information.

Please note that the patch extensively affects the X86 MC instr scheduling for SNB.

Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX.

The updated and extended information about each instruction includes the following details:
•static latency of the instruction
•number of uOPs the instruction consists of
•all ports used by the instruction's uOPs

For example, the following code dictates that the instructions ADC64mr, ADC8mr, SBB64mr and SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use port 4, ports 2 or 3, port 0, and ports 0, 1 or 5:

def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> {
  let Latency = 9;
  let NumMicroOps = 6;
  let ResourceCycles = [1,2,2,1];
}
def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>;

Note that apart from the header, most of the X86SchedSandyBridge.td file was generated by a script.

Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb

Differential Revision:  https://reviews.llvm.org/D35019#inline-304691

llvm-svn: 307529
2017-07-10 09:53:16 +00:00
Igor Breger d8b51e134e [GlobalISel][X86] Support G_LOAD/G_STORE i1.
Summary: Support G_LOAD/G_STORE i1.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35178

llvm-svn: 307527
2017-07-10 09:26:09 +00:00
Igor Breger d48c5e4855 [GlobalISel][X86] extend G_ZEXT support.
Summary:
Mark G_ZEXT/G_SEXT i1 to i8/i16, and i8 to i16, as legal.
Support G_ZEXT i1 to i8/i16 instruction selection (C++ code).
This patch is required to support G_LOAD/G_STORE i1.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D35177

llvm-svn: 307526
2017-07-10 09:07:34 +00:00
Simon Pilgrim 4050c77d33 [X86] Allow GHC calling convention to use YMM and ZMM registers
GHC 8.4 will know how to use YMM and ZMM registers for calls.
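
A hedged sketch of a ghccc call passing a 256-bit vector that can now live in a YMM register (names hypothetical):

  declare ghccc void @callee(<8 x float>)

  define ghccc void @caller(<8 x float> %v) {
    tail call ghccc void @callee(<8 x float> %v)  ; <8 x float> may be passed in a YMM register
    ret void
  }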

Submitted on behalf of @bgamari (Ben Gamari)

Differential Revision: https://reviews.llvm.org/D34854 

llvm-svn: 307504
2017-07-09 16:57:10 +00:00
Sanjay Patel 18ee908ca2 [x86] add SBB optimization for SETBE (ule) condition code
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess 
with missing optimizations. We handle some patterns, but miss logical variants.

To clean that up, we should convert all select-of-constants to logic/math and 
enhance the combining for the expected patterns from that. Selecting 0 or -1 
needs extra attention to produce the optimal code as shown here.
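
One of the target patterns, sketched in IR (hypothetical function): a select of 0/-1 on an unsigned less-or-equal (SETBE) condition, which can lower to cmp+sbb instead of a branch or a cmov of constants:

  define i32 @setbe_mask(i32 %x, i32 %y) {
    %c = icmp ule i32 %x, %y          ; SETBE condition
    %r = select i1 %c, i32 -1, i32 0  ; all-ones or zero
    ret i32 %r
  }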

Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs

Earlier steps in this series:
rL306040
rL306072
rL307404 (D34652)

As acknowledged in the earlier review, there's a possibility that some Intel
uarch would prefer to produce an xor to clear the fake register operand with
sbb %eax, %eax. This will likely need to be addressed in a separate pass.

llvm-svn: 307471
2017-07-08 14:04:48 +00:00
Eric Christopher 8737f0650c Remove a variable that was only used in asserts and had a duplicate copy in something we did use anyhow.
llvm-svn: 307457
2017-07-08 01:03:29 +00:00
Sanjay Patel dd36f75733 [x86] add SBB optimization for SETAE (uge) condition code
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess 
with missing optimizations. We handle some patterns, but miss logical variants.

To clean that up, we should convert all select-of-constants to logic/math and 
enhance the combining for the expected patterns from that. DAGCombiner already 
has the foundation to allow the transforms, so we just need to fill in the holes 
for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the
optimal code as shown here.

Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs

Earlier steps in this series:
rL306040
rL306072

Differential Revision: https://reviews.llvm.org/D34652

llvm-svn: 307404
2017-07-07 14:56:20 +00:00
Simon Pilgrim 8ae7e41bea Fix spelling in comments. NFCI.
llvm-svn: 307288
2017-07-06 18:17:07 +00:00
Simon Pilgrim 713600747e [X86][SSE4A] Add support for shuffle combining to INSERTQI.
llvm-svn: 307268
2017-07-06 15:34:17 +00:00
Simon Pilgrim 7b79fbd4ea [X86][SSE] combineX86ShuffleChain - merge duplicate creations of integer mask types
llvm-svn: 307257
2017-07-06 13:09:19 +00:00
Simon Pilgrim 77ad6d9bb2 [X86][SSE] combineX86ShuffleChain - merge duplicate 'Zeroable' element masks
llvm-svn: 307255
2017-07-06 12:40:10 +00:00
Simon Pilgrim cc0f785dca [X86][SSE4A] Add support for shuffle combining to EXTRQ.
llvm-svn: 307254
2017-07-06 12:22:58 +00:00
Simon Pilgrim 1dd0bd1949 [X86][SSE4A] Split EXTRQ/INSERTQ shuffle matching from lowering. NFCI.
First step toward supporting shuffle combining to EXTRQ/INSERTQ.

llvm-svn: 307250
2017-07-06 11:06:54 +00:00
Igor Breger 0c979d49eb [GlobalISel][X86] For now, don't handle non-trivial function argument lowering.
llvm-svn: 307142
2017-07-05 11:40:35 +00:00
Igor Breger 9d5571a226 [GlobalISel][X86] Allow graceful fallback for struct/array argument/return value lowering. Support will be added in a follow-up patch.
llvm-svn: 307125
2017-07-05 06:24:13 +00:00
Simon Pilgrim ac3e7f3f57 [X86][SSE4A] Add support for combining from non-v16i8 EXTRQI/INSERTQI shuffles
With the improved shuffle decoding we can now combine EXTRQI/INSERTQI shuffles from non-v16i8 vector types

llvm-svn: 307099
2017-07-04 18:11:02 +00:00
Simon Pilgrim f809c5f11c Fix signed/unsigned comparison warnings
llvm-svn: 307098
2017-07-04 17:42:01 +00:00
Simon Pilgrim 9f0a0bd20b [X86][SSE4A] Generalized EXTRQI/INSERTQI shuffle decodes
The existing decodes only worked for v16i8 vectors, this adds support for any 128-bit vector

llvm-svn: 307095
2017-07-04 16:53:12 +00:00
Daniel Sanders 6ab0daade8 [globalisel][tablegen] Partially fix compile-time regressions by converting matcher to state-machine(s)
Summary:
Replace the matcher if-statements for each rule with a state-machine. This
significantly reduces compile time, memory allocations, and cumulative memory
allocation when compiling AArch64InstructionSelector.cpp.o after r303259 is
recommitted.

The following patches will expand on this further to fully fix the regressions.

Reviewers: rovka, ab, t.p.northover, qcolombet, aditya_nandakumar

Reviewed By: ab

Subscribers: vitalybuka, aemerson, javed.absar, igorb, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33758

llvm-svn: 307079
2017-07-04 14:35:06 +00:00
Craig Topper ad140cfb68 [X86] Add comment string for broadcast loads from the constant pool.
Summary:
When broadcasting from the constant pool it's useful to print out the final vector, similar to what we do for normal moves from the constant pool.

I changed only a couple tests that were broadcast focused. One of them had been previously hand tweaked after running the script so that it could check the constant pool declaration. But I think this patch makes that unnecessary now since we can check the comment instead.

Reviewers: spatel, RKSimon, zvi

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34923

llvm-svn: 307062
2017-07-04 05:46:11 +00:00
Craig Topper a4c5caf67a [X86] Add RDRAND feature to GLM CPU
Summary: I believe this should be supported on GLM since RDSEED is.

Reviewers: m_zuckerman, zvi, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34828

llvm-svn: 307060
2017-07-04 05:33:19 +00:00
Simon Pilgrim fa6e675267 [X86][SSE4A] Add support for combining from EXTRQI/INSERTQI shuffles
llvm-svn: 307048
2017-07-03 20:58:16 +00:00
Zvi Rackover d7a1c334ce DAGCombine: Combine BUILD_VECTOR to TRUNCATE
Summary:
Add a combine for creating a truncate to replace a build_vector composed of extracts with
indices that form a stride-2^N series.

Example:
v8i32 V = ...

v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6))
-->
v4i32 truncate (bitcast V to v4i64)

Related discussion in llvm-dev about canonicalizing shuffles to
truncates in LLVM IR:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html.

Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena

Reviewed By: delena

Subscribers: guyblank, delena, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D34077

llvm-svn: 307036
2017-07-03 15:47:40 +00:00
Igor Breger 5c787ab346 [GlobalISel][X86] fix %ptr(p0) = G_CONSTANT selection.
llvm-svn: 307019
2017-07-03 11:06:54 +00:00
Hiroshi Inoue ddb34d84c9 fix trivial typos in comments; NFC
llvm-svn: 307004
2017-07-03 06:32:59 +00:00
Simon Pilgrim a9655ffb42 [X86][AVX512VPOPCNTDQ] Improve support for v16i8/v8i16/v16i16 CTPOP
Zero extend to v16i32/v8i64, use VPOPCNTDQ instructions and truncate back.
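
The same strategy expressed as equivalent IR (the backend performs this in the DAG; names hypothetical):

  declare <16 x i32> @llvm.ctpop.v16i32(<16 x i32>)

  define <16 x i8> @popcnt_v16i8(<16 x i8> %x) {
    %ext = zext <16 x i8> %x to <16 x i32>
    %cnt = call <16 x i32> @llvm.ctpop.v16i32(<16 x i32> %ext)  ; VPOPCNTD
    %res = trunc <16 x i32> %cnt to <16 x i8>
    ret <16 x i8> %res
  }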

llvm-svn: 306990
2017-07-02 19:32:37 +00:00
Simon Pilgrim 8971b2904e [X86][SSE] Attempt to combine 64-bit and 32-bit shuffles to unary shuffles before bit shifts
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads, and the destination register is destructive.

llvm-svn: 306978
2017-07-02 14:16:25 +00:00
Simon Pilgrim 4cb5613c38 [X86][SSE] Attempt to combine 64-bit and 16-bit shuffles to unary shuffles before bit shifts
We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads, and the destination register is destructive.

The 32-bit shuffles are a bit tricky and will be dealt with in a later patch

llvm-svn: 306977
2017-07-02 13:19:10 +00:00
Mohammed Agabaria eb09a810e6 [X86][CM] update add/sub costs of vectors of 64 in X86/SLM arch
This patch updates the cost of addq/subq (add/subtract of vectors of 64 bits)
based on the performance numbers of the SLM arch.

Differential Revision: https://reviews.llvm.org/D33983

llvm-svn: 306974
2017-07-02 12:16:15 +00:00
Igor Breger 717bd36c83 [GlobalISel][X86] Support G_GLOBAL_VALUE operation.
Summary: Support G_GLOBAL_VALUE operation. For now, most of the PIC configurations are not implemented yet.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34738

Conflicts:
	test/CodeGen/X86/GlobalISel/regbankselect-X86_64.mir

llvm-svn: 306972
2017-07-02 08:58:29 +00:00
Igor Breger b186a69aa5 [GlobalISel][X86] Support vector type G_UNMERGE_VALUES selection.
Summary:
Support vector type G_UNMERGE_VALUES selection.
For now G_UNMERGE_VALUES is marked as legal for any type, so there is nothing to do in the legalizer.

Reviewers: t.p.northover, qcolombet, zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D33665

llvm-svn: 306971
2017-07-02 08:15:49 +00:00
Hiroshi Inoue bb703e8960 fix trivial typos; NFC
suport -> support

llvm-svn: 306968
2017-07-02 03:24:54 +00:00
Quentin Colombet 8cf805ae89 [X86] Move GISel accessor initialization from TargetMachine to Subtarget.
NFC

llvm-svn: 306921
2017-07-01 00:45:50 +00:00
Eric Christopher b4fb256574 Make the 0-argument getSubtargetImpl functions for the X86, AArch64, and PPC targets deleted so that no one is tempted to use them.
llvm-svn: 306864
2017-06-30 19:49:05 +00:00
Simon Pilgrim 724990ab64 [X86][SSE] Pulled common variables to top of matchUnaryPermuteVectorShuffle. NFCI.
llvm-svn: 306847
2017-06-30 18:00:14 +00:00
Daniel Jasper 559aa75382 Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue"
I am 99% sure that this breaks the PPC ASAN build bot:
http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio

If it doesn't go back to green, we can recommit (and fix the original
commit message at the same time :) ).

llvm-svn: 306676
2017-06-29 13:58:24 +00:00
Igor Breger 0cddd34876 [GlobalISel][X86] Support vector type G_MERGE_VALUES selection.
Summary:
Support vector type G_MERGE_VALUES selection. For now G_MERGE_VALUES is marked as legal for any type, so there is nothing to do in the legalizer.
Split from https://reviews.llvm.org/D33665

Reviewers: qcolombet, t.p.northover, zvi, guyblank

Reviewed By: guyblank

Subscribers: rovka, kristof.beyls, guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D33958

llvm-svn: 306665
2017-06-29 12:08:28 +00:00
Michael Zuckerman 4bcb9c3349 [LLVM][X86][Goldmont] Adding new target-cpu: Goldmont
[LLVM SIDE]
Connecting the Goldmont processor to its features.

Reviewers: 
1. igorb
2. zvi
3. delena
4. RKSimon
5. craig.topper        

Differential Revision: https://reviews.llvm.org/D34504

llvm-svn: 306658
2017-06-29 10:00:33 +00:00
Rafael Espindola d926ea2ff7 Reuse existing variables. NFC.
llvm-svn: 306586
2017-06-28 19:26:37 +00:00
Rafael Espindola 96367a3d1e Fix PR33625.
We were failing to convert this expression to pcrel.

llvm-svn: 306573
2017-06-28 17:56:07 +00:00
Igor Breger d5b59cf914 [GlobalISel][X86] Support bitwise operations : G_AND, G_OR, G_XOR
Summary: Support G_AND, G_OR, G_XOR for i8/i16/i32/i64. Selection done via TableGen'erated code.

Reviewers: zvi, guyblank, aymanmus, m_zuckerman

Reviewed By: aymanmus

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34605

llvm-svn: 306533
2017-06-28 11:39:04 +00:00
Michael Zuckerman f66840020c Reverting commit 306414 on behalf of @gadi.haber
llvm-svn: 306532
2017-06-28 11:23:31 +00:00
Petar Jovanovic 7b3a38ec30 [X86] Correct dwarf unwind information in function epilogue
CFI instructions that set appropriate cfa offset and cfa register are now
inserted in emitEpilogue() in X86FrameLowering.

Majority of the changes in this patch:

1. Ensure that CFI instructions do not affect code generation.
2. Enable maintaining correct information about cfa offset and cfa register
in a function when basic blocks are reordered, merged, split, duplicated.

These changes are target independent and described below.

Changed CFI instructions so that they:

1. are duplicable
2. are not counted as instructions when tail duplicating or tail merging
3. can be compared as equal

Add information to each MachineBasicBlock about cfa offset and cfa register
that are valid at its entry and exit (incoming and outgoing CFI info). Add
support for updating this information when basic blocks are merged, split,
duplicated, created. Add a verification pass (CFIInfoVerifier) that checks
that outgoing cfa offset and register of predecessor blocks match incoming
values of their successors.

Incoming and outgoing CFI information is used by a late pass
(CFIInstrInserter) that corrects CFA calculation rule for a basic block if
needed. That means that additional CFI instructions get inserted at basic
block beginning to correct the rule for calculating CFA. Having CFI
instructions in function epilogue can cause incorrect CFA calculation rule
for some basic blocks. This can happen if, due to basic block reordering,
or the existence of multiple epilogue blocks, some of the blocks have wrong
cfa offset and register values set by the epilogue block above them.

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D18046

llvm-svn: 306529
2017-06-28 10:21:17 +00:00
Coby Tayree 41a5b55f50 [X86][AsmParser][MS-compatibility] Binary/Unary operator enhancements
Introducing MOD binary operator
https://msdn.microsoft.com/en-us/library/hha180wt.aspx

Enhancing unary operators NEG and NOT, to support more complex patterns

Differential Revision: https://reviews.llvm.org/D33876

llvm-svn: 306425
2017-06-27 16:58:27 +00:00
Gadi Haber 13759a7ed6 Updated and extended the information about each instruction in HSW and SNB to include the following data:
•static latency
•number of uOPs the instruction consists of
•all ports used by the instruction

Reviewers:
RKSimon
zvi
aymanmus
m_zuckerman

Differential Revision: https://reviews.llvm.org/D33897

llvm-svn: 306414
2017-06-27 15:05:13 +00:00
Ayman Musa 721d97f7b8 Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371
[X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).

AVX512 compare instructions return v*i1 types.
In cases where the number of elements in the returned value is less than 8, clang adds zeroes to get a mask of v8i1 type.
Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes.
The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class.

When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction.

Differential Revision: https://reviews.llvm.org/D33188

llvm-svn: 306402
2017-06-27 12:08:37 +00:00
Galina Kistanova 06a0e0e6a9 Fixed the warning introduced by r306289 to make ubuntu-gcc7.1-werror bot green.
llvm-svn: 306369
2017-06-27 06:58:57 +00:00
Tim Northover c2d5e6d637 AArch64: legalize G_EXTRACT operations.
This is the dual problem to legalizing G_INSERTs so most of the code and
testing was cribbed from there.

llvm-svn: 306328
2017-06-26 20:34:13 +00:00
Marina Yatsina f58dcb85d2 [inline asm] dot operator while using imm generates wrong ir + asm - llvm part
The inline asm dot operator, when used with an immediate, generates wrong IR and asm.

This also fixes bugzilla 32987:
https://bugs.llvm.org/show_bug.cgi?id=32987

The clang part of the review that contains the test can be found here:
https://reviews.llvm.org/D33040

Commit on behalf of zizhar

Differential Revision:
https://reviews.llvm.org/D33039

llvm-svn: 306300
2017-06-26 16:03:42 +00:00
Ahmed Bougacha 58a197414e [X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc.
The non-AVX-512 behavior was changed in r248266 to match N1778
(C bindings for IEEE-754 (2008)), which defined the four functions
to not raise the inexact exception ("rint" is still defined as raising
it).

Update the AVX-512 lowering of these functions to match that: it should
not be different.

llvm-svn: 306299
2017-06-26 16:00:24 +00:00
Sanjay Patel 15748d239e [x86] transform vector inc/dec to use -1 constant (PR33483)
Convert vector increment or decrement to sub/add with an all-ones constant:

add X, <1, 1...> --> sub X, <-1, -1...>
sub X, <1, 1...> --> add X, <-1, -1...>

The all-ones vector constant can be materialized using a pcmpeq instruction that is 
commonly recognized as an idiom (has no register dependency), so that's better than 
loading a splat 1 constant.

AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better
way to produce 512 one-bits.

The general advantages of this lowering are:
1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, 
   so in theory, this could be better for perf, but...

2. That seems unlikely to affect any OOO implementation, and I can't measure any real 
   perf difference from this transform on Haswell or Jaguar, but...

3. It doesn't look like it from the diffs, but this is an overall size win because we 
   eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting 
   a scalar load (which might itself be a bug), then we're replacing a scalar constant 
   load + broadcast with a single cheap op, so that should always be smaller/better too.

4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 
   and psub x, -1, so we should use that form for +1 too because we can. If there's some
   reason to favor a constant load on some CPU, let's make the reverse transform for all
   of these cases (either here in the DAG or in a later machine pass).

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=33483

Differential Revision: https://reviews.llvm.org/D34336

llvm-svn: 306289
2017-06-26 14:19:26 +00:00
Simon Pilgrim c338ba48fc [X86][SSE] Remove unused memopfsf32_128/memopfsf64_128 scalar memops
The 'scalar' simd bitops were dropped a while ago

llvm-svn: 306248
2017-06-25 17:04:58 +00:00
Simon Pilgrim bed1fa1ac1 Strip trailing whitespace. NFCI.
llvm-svn: 306247
2017-06-25 16:57:46 +00:00