Commit Graph

296 Commits

Author SHA1 Message Date
Geoff Berry f8bf2ec0a8 [MachineOperand][Target] MachineOperand::isRenamable semantics changes
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers.  This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.

Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).

Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.

Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.

Clear the IsRenamable bit when changing an operand's register value.

Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.

Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.

Reviewers: qcolombet, MatzeB

Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D43042

llvm-svn: 325931
2018-02-23 18:25:08 +00:00
Craig Topper d710adac2d [X86] Disable CLWB for Cannon Lake
Cannon Lake does not support CLWB, therefore it
does not include all features listed under SKX anymore.

Instead, enumerate all SKX features with the exception of CLWB.

Patch by Gabor Buella

Differential Revision: https://reviews.llvm.org/D43380

llvm-svn: 325654
2018-02-21 00:15:48 +00:00
Craig Topper a8f87a36f1 [X86] Add FeaturePOPCNTFalseDeps to skylake server CPU to match skylake client.
llvm-svn: 323700
2018-01-29 21:56:48 +00:00
Simon Pilgrim 02bdac53e7 [X86] Emit 11-byte or 15-byte NOPs on recent AMD targets, else default to 10-byte NOPs (PR22965)
We currently emit up to 15-byte NOPs on all targets (apart from Silvermont), which stalls performance on some targets with decoders that struggle with 2 or 3 more '66' prefixes.

This patch flags recent AMD targets (btver1/znver1) to still emit 15-byte NOPs and bdver* targets to emit 11-byte NOPs. All other targets now emit 10-byte NOPs apart from SilverMont CPUs which still emit 7-byte NOPS.

Differential Revision: https://reviews.llvm.org/D42616

llvm-svn: 323693
2018-01-29 21:24:31 +00:00
Craig Topper b207dd6870 [X86] Add 'rdrnd' feature to silvermont to match recent gcc bug fix.
gcc recently fixed this bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83546

llvm-svn: 323550
2018-01-26 19:34:14 +00:00
Chandler Carruth c58f2166ab Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre..
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
  __llvm_external_retpoline_r11
```
or on 32-bit:
```
  __llvm_external_retpoline_eax
  __llvm_external_retpoline_ecx
  __llvm_external_retpoline_edx
  __llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.

There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.

The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.

When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.

When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.

However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.

We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723

llvm-svn: 323155
2018-01-22 22:05:25 +00:00
Marina Yatsina 77a21dbad4 Break false dependencies for POPCNT, LZCNT, TZCNT
Add POPCNT, LZCNT, TZCNT to the list of instructions that have false dependency.
Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions.
Update affected tests.

This fixes bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869

This is the final of multiple patches that fix this bugzilla.
Most of the patches are intended at refactoring the existent code.

Reviews of the refactoring done to enable this change:
https://reviews.llvm.org/D40330
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333

Differential Revision: https://reviews.llvm.org/D40334

Change-Id: If95cbf1a3f5c7dccff8f1b22ecb397542147303d
llvm-svn: 323096
2018-01-22 10:07:01 +00:00
Craig Topper 0d797a34d8 [X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface.
This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for.

I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10.

This has been split from D41895 with the remainder in a follow up commit.

llvm-svn: 323015
2018-01-20 00:26:08 +00:00
Craig Topper 84b26b90d1 [X86] Add intrinsic support for the RDPID instruction
This adds a new instrinsic to support the rdpid instruction. The implementation is a bit weird because the intrinsic is defined as always returning 32-bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode. But really it zeros the upper 32 bits. So I had to add separate patterns where 64-bit mode uses an extract_subreg.

Differential Revision: https://reviews.llvm.org/D42205

llvm-svn: 322910
2018-01-18 23:52:31 +00:00
Craig Topper 505f38a059 [X86] Move HasNOPL to a subtarget feature bit. Plumb MCSubtargetInfo through the MCAsmBackend constructor
After D41349, we can no get a MCSubtargetInfo into the MCAsmBackend constructor. This allows us to get NOPL from a subtarget feature rather than a CPU name blacklist.

Differential Revision: https://reviews.llvm.org/D41721

llvm-svn: 322227
2018-01-10 22:07:16 +00:00
Craig Topper 55cfa89f20 [X86] Add CLWB to icelake.
Per Table 1-1 in October 2017 edition of Intel® Architecture Instruction Set Extensions and Future Features

llvm-svn: 321501
2017-12-27 22:04:04 +00:00
Craig Topper 67885f5d58 [X86] Enable PRFCHW feature on KNL/KNM and all CPUs inherited from Broadwell.
llvm-svn: 321336
2017-12-22 02:41:12 +00:00
Craig Topper e268598dd3 [X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling for prefetch instructions.
Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well.

The prfchw flag now imply at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled.

Similarly prefetchwt1 feature implies availability of prefetchw and the the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And its assumed that if we have levels for the write hint we would have levels for the non-write hint, thus why we enable the sse prefetch instructions.

I believe this behavior is consistent with gcc. I've updated the prefetch.ll to test all of these combinations.

llvm-svn: 321335
2017-12-22 02:30:30 +00:00
Simon Pilgrim fd5df639a3 [X86][SSE] Add cpu feature for aggressive combining to variable shuffles
As mentioned in D38318 and D40865, modern Intel processors prefer to combine multiple shuffles to a variable shuffle mask (PSHUFB/VPERMPS etc.) instead of having multiple stage 'fixed' shuffles which put more pressure on Port 5 (at the expense of extra shuffle mask loads).

This patch provides a FeatureFastVariableShuffle target flag for Haswell+ CPUs that prefers combining 2 or more fixed shuffles to a single variable shuffle (default is 3 shuffles).

The long term aim is to drive more of this from schedule data (probably via the MC) but we're not close to being ready for that yet.

Differential Revision: https://reviews.llvm.org/D41323

llvm-svn: 321074
2017-12-19 13:16:43 +00:00
Craig Topper 57c2815cbe [X86] Adjust tablegen includes so we can use Instructions in scheduler models instead of just instregexs.
This separates the CPU specific scheduler model includes to occur after the instructions. Moves the instruction includes between the basic scheduler information and the CPU specific scheduler models.

llvm-svn: 320313
2017-12-10 17:42:36 +00:00
Oren Ben Simhon fa582b075c Control-Flow Enforcement Technology - Shadow Stack support (LLVM side)
Shadow stack solution introduces a new stack for return addresses only.
The HW has a Shadow Stack Pointer (SSP) that points to the next return address.
If we return to a different address, an exception is triggered.
The shadow stack is managed using a series of intrinsics that are introduced in this patch as well as the new register (SSP).
The intrinsics are mapped to new instruction set that implements CET mechanism.

The patch also includes initial infrastructure support for IBT.

For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Differential Revision: https://reviews.llvm.org/D40223

Change-Id: I4daa1f27e88176be79a4ac3b4cd26a459e88fed4
llvm-svn: 318996
2017-11-26 13:02:45 +00:00
Coby Tayree d8b17bedfa [x86][icelake]GFNI
galois field arithmetic (GF(2^8)) insns:
gf2p8affineinvqb
gf2p8affineqb
gf2p8mulb
Differential Revision: https://reviews.llvm.org/D40373

llvm-svn: 318993
2017-11-26 09:36:41 +00:00
Craig Topper ea37e201ec [X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is disabled. Allow gather on SKX/CNL/ICL when AVX512 is disabled by using AVX2 instructions.
Summary:
This adds a new fast gather feature bit to cover all CPUs that support fast gather that we can use independent of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 is still implicitly assuming fast gather to keep tests working and to match the scatter behavior.

Test command lines have been added for these two cases.

Reviewers: magabari, delena, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D40282

llvm-svn: 318983
2017-11-25 18:09:37 +00:00
Craig Topper a890570b15 [X86] Add BITALG, VAES, VBMI2, VNNI, VPCLMULQDQ, and VPOPCNTDQ instructions to icelake CPU.
This is based on table 1-1 of the October 2017 revision of Intel® Architecture Instruction Set Extensions and Future Features Programming Reference

llvm-svn: 318799
2017-11-21 21:05:18 +00:00
Coby Tayree 5c7fe5df53 [x86][icelake]BITALG
vpopcnt{b,w}
Differential Revision: https://reviews.llvm.org/D40213

llvm-svn: 318748
2017-11-21 10:32:42 +00:00
Coby Tayree 3880f2a363 [x86][icelake]VNNI
Introducing Vector Neural Network Instructions, consisting of:
vpdpbusd{s}
vpdpwssd{s}
Differential Revision: https://reviews.llvm.org/D40208

llvm-svn: 318746
2017-11-21 10:04:28 +00:00
Coby Tayree 71e37cc9ff [x86][icelake]vbmi2
introducing vbmi2, consisting of
vpcompress{b,w}
vpexpand{b,w}
vpsh{l,r}d{w,d,q}
vpsh{l,r}dv{w,d,q}
Differential Revision: https://reviews.llvm.org/D40206

llvm-svn: 318745
2017-11-21 09:48:44 +00:00
Coby Tayree 7ca5e58736 [x86][icelake]vpclmulqdq introduction
an icelake promotion of pclmulqdq
Differential Revision: https://reviews.llvm.org/D40101

llvm-svn: 318741
2017-11-21 09:30:33 +00:00
Coby Tayree 2a1c02fcbc [x86][icelake]VAES introduction
an icelake promotion of AES
Differential Revision: https://reviews.llvm.org/D40078

llvm-svn: 318740
2017-11-21 09:11:41 +00:00
Craig Topper 9a94dfc457 [X86] Switch cannonlake to use the SkylakeServer scheduling model instead of Haswell.
Cannonlake comes after skylake and supports avx512 so this is probably a closer model for now.

llvm-svn: 318613
2017-11-19 01:25:30 +00:00
Craig Topper 81037f385e [X86] Add skeleton support for icelake CPU.
There are several patches out for review right now to implement Icelake features. This adds a CPU to collect them under.

llvm-svn: 318612
2017-11-19 01:12:00 +00:00
Craig Topper 428a4e6374 [X86] Make FeatureAVX512 imply FeatureF16C.
The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available.

Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns.

All known CPUs with AVX512 have F16C so this should safe for now.

llvm-svn: 317521
2017-11-06 22:49:04 +00:00
Craig Topper cb6c38612e [X86] Make FeatureAVX512 imply FeatureFMA.
Previously our VEX patterns were checking Subtarget.hasFMA() which checked FMA || AVX512. So we were behaving as if AVX512 implied it anyway. Which means we'd allow VEX encoded 128/256 FMA when AVX512F was enabled but AVX512VL is off. Regardless of the FMA flag.

EVEX to VEX also transforms scalar EVEX FMA instructions to their VEX versions even without the FMA flag. Similarly for 128/256 under AVX512VL.

So this makes AVX512 imply FeatureFMA to make our current behavior explicit.

All known CPUs that support AVX512 have VEX FMA instructions.

llvm-svn: 317520
2017-11-06 22:49:01 +00:00
Craig Topper 3837322a6b [X86] Use foreach in X86.td to combine some of the CPU names that are obviously aliases. NFC
llvm-svn: 317134
2017-11-01 22:15:49 +00:00
Craig Topper 7a754c4622 [X86] Add CMOV feature to 'i686' processor, making it a proper alias of pentiumpro which I believe it should be.
This is consistent with current gcc behavior.

llvm-svn: 317133
2017-11-01 22:15:40 +00:00
Craig Topper 6fae2eedf3 [X86] Add avx512vpopcntdq to Knights Mill
As indicated by Table 1-1 in Intel Architecture Instruction Set Extensions and Future Features Programming Reference from October 2017.

llvm-svn: 316592
2017-10-25 17:10:32 +00:00
Gadi Haber 323f2e1715 [X86][Broadwell] Added the instruction scheduling information for the Broadwell CPU.
Adding the scheduling information for the Browadwell (BDW) CPU target.

This patch adds the instruction scheduling information for the Broadwell (BDW) architecture target by adding the file X86SchedBroadwell.td located under the X86 Target.
We used the scheduling information retrieved from the Broadwell architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each BDW instruction.

The patch continues the scheduling replacement and insertion effort started with the SandyBridge (SNB) target in r310792, the Haswell (HSW) target in r311879, the SkylakeClient (SKL) target in rL313613 + rL315978 and the SkylakeServer (SKX) in rL315175.

Performance fluctuations may be expected due to code alignment effects.

Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39054

Change-Id: If6f799e5ff60e1091c8d43b05ea78c53581bae01
llvm-svn: 316492
2017-10-24 20:19:47 +00:00
Craig Topper 2738117326 [X86] Remove the SlowBTMem feature flag entirely
Turns out we have no patterns on the instructions that were using this feature flag for other reasons. These instructions are slow on all modern CPUs so it seems unlikely that we will spend any effort supporting these instructions going forward. So we might as well just kill of the feature flag and just fix up the comments.

llvm-svn: 315862
2017-10-15 16:57:33 +00:00
Craig Topper a1f9c9dd8b [X86] Add FeatureSlowBTMem to Haswell, Broadwell, Skylake, Cannonlake, and Knights Landing CPUs.
Summary: I see nothing in Agner Fog's tables to indicate that this improved between Ivy Bridge and Haswell. It's also set for all Atom CPUs so I assume KNL should have it too.

Reviewers: RKSimon, zvi, gadi.haber

Reviewed By: gadi.haber

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38890

llvm-svn: 315859
2017-10-15 16:41:15 +00:00
Craig Topper 5d692917f4 [X86] Add initial skeleton support for knm cpu
This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now its an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it.

Differential Revision: https://reviews.llvm.org/D38811

llvm-svn: 315722
2017-10-13 18:10:17 +00:00
Craig Topper 5805fb3dfc [X86] Fix some inconsistent formatting in the processor feature lists.
llvm-svn: 315696
2017-10-13 16:06:06 +00:00
Craig Topper 54541c4675 [X86] Add ProcIntelBDW to BroadwellProc class not BDWFeatures class.
This isn't a property we want inherited.

llvm-svn: 315695
2017-10-13 16:04:08 +00:00
Gadi Haber 684944b822 [X86][SKX] Adding the scheduling information for the SKX target.
Adding the scheduling information for the SkylakeServer (SKX) target.

This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.

The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613.

Please expect some performance fluctuations due to code alignment effects.

Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu
Differential Revision: https://reviews.llvm.org/D38443

Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185
llvm-svn: 315175
2017-10-08 12:52:54 +00:00
Michael Zuckerman ac1d20dea7 Adding missing feature to goldmont.
Change-Id: I1ddc619169fae6a56308deef8dae5db3da702cf4
llvm-svn: 314103
2017-09-25 13:45:31 +00:00
Gadi Haber 6f8fbf4b86 [X86][Skylake] Adding the scheduling information for the SkylakeClient target
This patch adds the instruction scheduling information for the SkylakeClient (SKL) architecture target by adding the file X86SchedSkylakeClient.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.
The patch continues the scheduling replacement and insertion effort started with the SNB target in r307529 and r310792 and for HSW in r311879.

Please expect some performance fluctuations due to code alignment effects.

Reviewers: craig.topper, zvi, chandlerc, igorb, aymanmus, RKSimon, delena
Differential Revision: https://reviews.llvm.org/D37294

llvm-svn: 313613
2017-09-19 06:19:27 +00:00
Mohammed Agabaria e9aebf26af [X86] Adding X86 Processor Families
Adding x86 Processor families to initialize several uArch properties (based on the family)
This patch shows how gather cost can be initialized based on the proc. family

Differential Revision: https://reviews.llvm.org/D35348

llvm-svn: 313132
2017-09-13 09:00:27 +00:00
Craig Topper ef1f71669e [X86] Apply SlowIncDec feature to Sandybridge/Ivybridge CPUs as well
Currently we start applying this on Haswell and newer. I don't believe anything changed in the Haswell architecture to make this the right cutoff point. The partial flag handling around this has been roughly the same since Sandybridge.

Differential Revision: https://reviews.llvm.org/D37250

llvm-svn: 312099
2017-08-30 05:00:35 +00:00
Craig Topper 641e2af9e8 [X86] Provide a separate feature bit for macro fusion support instead of basing it on the AVX flag
Summary:
Currently we determine if macro fusion is supported based on the AVX flag as a proxy for the processor being Sandy Bridge".

This is really strange as now AMD supports AVX. It also means if user explicitly disables AVX we disable macro fusion.

This patch adds an explicit macro fusion feature. I've also enabled for the generic 64-bit CPU (which doesn't have AVX)

This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature.

Reviewers: spatel, chandlerc, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37280

llvm-svn: 312097
2017-08-30 04:34:48 +00:00
Craig Topper 62c47a2aa5 Mark Knights Landing as having slow two memory operand instructions
Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly.

Patch by David Zarzycki.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37224

llvm-svn: 311979
2017-08-29 05:14:27 +00:00
Chandler Carruth 5b491808f5 [x86] Back out one aspect of r311318: don't generically set
FeatureSlowUAMem32.

The idea was to mark things that are slow on widely available processors
as slow in the generic CPU so that the code generated for that CPU would
be fast across those processors. However, for this feature that doesn't
work out very well at all.

The problem here is that you can very easily enable AVX or AVX2 on top
of this generic CPU. For example, this can happen just by using AVX2
intrinsics from Clang within a region of code guarded by a dynamic CPU
feature test. When you do that, the generated code with SlowUAMem32 set
is ... amazingly slower. The problem is that there really aren't very
good alternatives to the unaligned loads, and so our vector codegen
regresses significantly.

The other issue is that there are plenty of AMD CPUs with AVX1 that
don't set FeatureSlowUAMem32 and so we shouldn't just check for AVX2
instead of this special feature. =/

It would be nice to have the target attriute logic be able to
enable/disable more than just one feature at a time and control this in
a more fine grained and useful way, but that doesn't seem easy. Given
that it is only Sandybridge and Ivybridge that set this feature, for now
I'm just backing it out of the generic CPU. That has the additional
advantage of going back to the previous state that people seemed vaguely
happy with.

llvm-svn: 311740
2017-08-25 00:56:05 +00:00
Chandler Carruth 98c51cbee1 [x86] Teach the "generic" x86 CPU to avoid patterns that are slow on
widely used processors.

This occured to me when I saw that we were generating 'inc' and 'dec'
when for Haswell and newer we shouldn't. However, there were a few "X is
slow" things that we should probably just set.

I've avoided any of the "X is fast" features because most of those would
be pretty serious regressions on processors where X isn't actually fast.
The slow things are likely to be negligible costs on processors where
these aren't slow and a significant win when they are slow.

In retrospect this seems somewhat obvious. Not sure why we didn't do
this a long time ago.

Differential Revision: https://reviews.llvm.org/D36947

llvm-svn: 311318
2017-08-21 08:45:22 +00:00
Craig Topper 106b5b6856 AMD znver1 Initial Scheduler model
Summary:
This patch adds the following
1. Adds a skeleton scheduler model for AMD Znver1.
2. Introduces the znver1 execution units and pipes.
3. Caters the instructions based on the generic scheduler classes.
4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on
        a. Instructions types
        b. Registers used
5. Since itineraries are not added based on instructions, throughput information are bound to change when incremental changes are added.
6. Scheduler testcases are modified accordingly to suit the new model.

Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me.

Reviewers: craig.topper, RKSimon

Subscribers: javed.absar, shivaram, ddibyend, vprasad

Differential Revision: https://reviews.llvm.org/D35293

llvm-svn: 308411
2017-07-19 02:45:14 +00:00
Craig Topper a4c5caf67a [X86] Add RDRAND feature to GLM CPU
Summary: I believe this should be supported on GLM since RDSEED is.

Reviewers: m_zuckerman, zvi, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34828

llvm-svn: 307060
2017-07-04 05:33:19 +00:00
Michael Zuckerman 4bcb9c3349 [LLVM][X86][Goldmont] Adding new target-cpu: Goldmont
[LLVM SIDE]
Connecting the GoldMont processor to his feature.

Reviewers: 
1. igorb
2. zvi
3. delena
4. RKSimon
5. craig.topper        

Differential Revision: https://reviews.llvm.org/D34504

llvm-svn: 306658
2017-06-29 10:00:33 +00:00
Oren Ben Simhon 7bf27f03f2 [X86] Adding vpopcntd and vpopcntq instructions
AVX512_VPOPCNTDQ is a new feature set that was published by Intel.
The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq).

Differential Revision: https://reviews.llvm.org/D33169

llvm-svn: 303858
2017-05-25 13:45:23 +00:00