Commit Graph

1347 Commits

Author SHA1 Message Date
Andrew V. Tischenko ee2e3144ba Improved sched model for X86 BSWAP* instrs.
Differential Revision: https://reviews.llvm.org/D49477

llvm-svn: 337537
2018-07-20 09:39:14 +00:00
Craig Topper a8efe59a0a [X86] Prefer MOVSS/SD over BLEND under optsize in isel.
Previously we iseled to blend, commuted to another blend, and then commuted back to movss/movsd or blend depending on optsize. Now we do it directly.

llvm-svn: 336976
2018-07-13 06:25:31 +00:00
Roman Lebedev fa988853bd [X86][Basically NFC] Sched: split WriteBitScan into WriteBSF/WriteBSR.
Summary:
Motivation: {F6597954}

This only does the mechanical splitting, does not actually change
any numbers, as the tests added in previous revision show.

Reviewers: craig.topper, RKSimon, courbet

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48998

llvm-svn: 336511
2018-07-08 09:50:25 +00:00
Craig Topper 56440b9745 [X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned.
Similarily, don't fold fp128 loads into SSE instructions if the load isn't aligned. Unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.

Should fix PR38001

llvm-svn: 336121
2018-07-02 17:01:54 +00:00
Clement Courbet 7b9913fb9f [X86] Add sched class WriteLAHFSAHF and fix values.
Summary:
I ran llvm-exegesis on SKX, SKL, BDW, HSW, SNB.
Atom is from Agner and SLM is a guess.
I've left AMD processors alone.

Reviewers: RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D48079

llvm-svn: 335097
2018-06-20 06:13:39 +00:00
Craig Topper 16fdde5e63 [X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions.
We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions.

Also remove the vpextrw.s EVEX alias. That's not something gas implements.

llvm-svn: 334922
2018-06-18 05:00:50 +00:00
Craig Topper c435632862 [X86] Hide POP16/32/64rmr and PUSH16/32/64rmr instructions from the assembly parser.
These all have a short form encoding that the assembler already prefers. Though that preference seems to only be based on order in the .td fie. Hiding the long form saves space in the table and prevents us from breaking the implicit order based priority.

llvm-svn: 334897
2018-06-16 23:25:48 +00:00
Craig Topper 5799e4df75 [X86] Add NotMemoryFoldable to more instructions.
These include PUSH/POP instructions that don't match the manual table. This also includes CMPXCHG which we never emit in non-locked form.

llvm-svn: 334479
2018-06-12 07:32:17 +00:00
Craig Topper 66572df76e [X86] Add NotMemoryFoldable to a bunch of instructions to suppress them from the autogenerated load folding table.
Most of these are system instructions or other instructions we don't use in CodeGen. No point wasting space for them in the table. Removing them from the autogenerated table makes it easier to review the manual table.

A few are real opcode collisions where the memory and register forms are completely different instructions.

llvm-svn: 334474
2018-06-12 04:34:59 +00:00
Craig Topper 860562c915 [X86] Miscellaneous fixes to get the load folding table generator to work again.
llvm-svn: 334377
2018-06-10 21:48:24 +00:00
Roman Lebedev 488d28d4e5 [X86] Emit BZHI when mask is ~(-1 << nbits))
Summary:
In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.

AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!).
The other way around for X86.
It would be much better to canonicalize.

This patch is completely monkey-typing.
I don't really understand how this works :)
I have based it on `// x & (-1 >> (32 - y))` pattern.

Also, when we only have `BMI`, i wonder if we could use `BEXTR` with `start=0` ?

Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419
https://bugs.llvm.org/show_bug.cgi?id=37603
https://bugs.llvm.org/show_bug.cgi?id=37610
https://rise4fun.com/Alive/idM

Reviewers: craig.topper, spatel, RKSimon, javed.absar

Reviewed By: craig.topper

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D47453

llvm-svn: 334125
2018-06-06 19:38:16 +00:00
Craig Topper d04cc8e640 [X86] Rename vy512mem->vy512xmem and vz256xmem->vz256mem.
The index size is represented by the letter after the 'v'. The number represents the memory size. If an 'x' appears after the number its means the index register can be from VR128X/VR256X instead of VR128/VR256.

As vy512mem uses a VR256X index it should have an x.
And vz256mem uses a VR512 index so it shouldn't have an x.

I admit these names kind of suck and are confusing.

llvm-svn: 334120
2018-06-06 19:15:12 +00:00
Simon Pilgrim 3d14158891 [X86][BMI][TBM] Only demand bottom 16-bits of the BEXTR control op (PR34042)
Only the bottom 16-bits of BEXTR's control op are required (0:8 INDEX, 15:8 LENGTH).

Differential Revision: https://reviews.llvm.org/D47690

llvm-svn: 334083
2018-06-06 10:52:10 +00:00
Simon Pilgrim 79dae5ba2a [X86] Don't hardcode scheduler class
Also fixes BEXTRI instruction to use WritBEXTR class, which was missed when the class was added.

llvm-svn: 333360
2018-05-27 14:54:18 +00:00
Gabor Buella d2f1ab1b10 [x86] invpcid LLVM intrinsic
Re-add the feature flag for invpcid, which was removed in r294561.
Add an intrinsic, which always uses a 32 bit integer as first argument,
while the instruction actually uses a 64 bit register in 64 bit mode
for the INVPCID_TYPE argument.

Reviewers: craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D47141

llvm-svn: 333255
2018-05-25 06:32:05 +00:00
Petar Jovanovic c051000b83 [X86][MIPS][ARM] New machine instruction property 'isMoveReg'
This property is needed in order to follow values movement between
registers. This property is used in TII to implement method that
returns true if simple copy like instruction is recognized, along
with source and destination machine operands.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D45204

llvm-svn: 333093
2018-05-23 15:28:28 +00:00
Alexander Ivchenko 5c54742da4 [X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part)
This patch aims to match the changes introduced in gcc by
https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The
IBT feature definition is removed, with the IBT instructions
being freely available on all X86 targets. The shadow stack
instructions are also being made freely available, and the
use of all these CET instructions is controlled by the module
flags derived from the -fcf-protection clang option. The hasSHSTK
option remains since clang uses it to determine availability of
shadow stack instruction intrinsics, but it is no longer directly used.

Comes with a clang patch (D46881).

Patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D46882

llvm-svn: 332705
2018-05-18 11:58:25 +00:00
Gabor Buella a832b22bae [X86] ptwrite intrinsic
Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D46539

llvm-svn: 331961
2018-05-10 07:26:05 +00:00
Gabor Buella 4a02bf945e [x86] Introduce the enclv instruction
Summary:
and use the -msgx flag as a requirement
for the SGX instructions.

Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D46436

llvm-svn: 331742
2018-05-08 07:11:05 +00:00
Gabor Buella 2b5e96004b [x86] Introduce the pconfig instruction
Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D46430

llvm-svn: 331739
2018-05-08 06:47:36 +00:00
Gabor Buella c8ded04e85 [X86] movdiri and movdir64b instructions
Reviewers: spatel, craig.topper, RKSimon

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D45983

llvm-svn: 331248
2018-05-01 10:01:16 +00:00
Craig Topper 33dc01d105 [X86] Remove 'opaque ptr' from the intel syntax parser and printer.
Previously for instructions like fxsave we would print "opaque ptr" as part of the memory operand. Now we print nothing.

We also no longer accept "opaque ptr" in the parser. We still accept any size to be specified for these instructions, but we may want to consider only parsing when no explicit size is specified. This what gas does.

llvm-svn: 331243
2018-05-01 04:42:00 +00:00
Craig Topper cda7690976 [X86] Remove some InstAliases aren't needed because a MnemonicAlias makes them unreachable.
llvm-svn: 331159
2018-04-30 06:21:22 +00:00
Craig Topper ebc7de07ee [X86] Use a MnemonicAlias instead of an InstAlias.
llvm-svn: 331157
2018-04-30 06:21:19 +00:00
Craig Topper 64e7a16fe4 [X86] Remove support for accepting 'fnstsw %eax' and 'fnstsw %al'.
I assume this was done because gas accepted it at one point, but current versions of gas don't.

llvm-svn: 331154
2018-04-30 01:53:12 +00:00
Craig Topper b2bf3da152 [X86] Mark some more InstAliases as 'att' syntax only.
These aliases are used to default the memory forms of call and jmp to the size of the operating mode. This doesn't work for Intel syntax. We have a different hack in the AsmParser code itself to force a size on unsized memory operands.

llvm-svn: 331153
2018-04-30 01:53:10 +00:00
Craig Topper 8e3dbfd725 [X86] Remove unnecessary InstAliases. NFCI
These used to disambiguate MOV16ms/MOV16sm from other size instructions that no longer exist.

llvm-svn: 331134
2018-04-29 04:06:02 +00:00
Craig Topper 06624e1a93 [X86] Restrict many of the InstAliases to either to only att or intel syntax. NFCI
Many of these aliases exist to give one syntax or the other a slightly different mnemonic and the other variant gets a duplicate of its normal mnemonic

This patch restricts a lot of these to only one variant so we don't get the duplication.

This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen.

llvm-svn: 331117
2018-04-28 18:46:11 +00:00
Craig Topper 19b85103a3 [X86] Add a BSWAP16 instruction using the 32-bit encoding plus a 0x66 prefix.
This encoding is recognized by the CPU, but the behavior is undefined. This makes the disassembler handle it correctly so we don't print bswapl with a 16-bit register.

llvm-svn: 330682
2018-04-24 04:28:02 +00:00
Gabor Buella 1a2ce572bf [X86] Revert r330638 - accidental commit
llvm-svn: 330640
2018-04-23 20:05:51 +00:00
Gabor Buella 213a7cda1f [X86] movdiri and movdir64b instructions
Reviewers: craig.topper
llvm-svn: 330638
2018-04-23 20:00:59 +00:00
Craig Topper e33ed7d667 [X86] Remove DATA32_PREFIX. Hack the printing for DATA16_PREFIX to print 'data32' in 16-bit mode. Hack the asm parser to convert 'data32' to 'data16' in 16-bit mode.
Improve the error messages to match GNU assembler.

This also allows us to remove the hack from the disassembler table building.

llvm-svn: 330531
2018-04-22 00:52:02 +00:00
Gabor Buella 31fa8025ba [X86] WaitPKG instructions
Three new instructions:

umonitor - Sets up a linear address range to be
monitored by hardware and activates the monitor.
The address range should be a writeback memory
caching type.

umwait - A hint that allows the processor to
stop instruction execution and enter an
implementation-dependent optimized state
until occurrence of a class of events.

tpause - Directs the processor to enter an
implementation-dependent optimized state
until the TSC reaches the value in EDX:EAX.

Also modifying the description of the mfence
instruction, as the rep prefix (0xF3) was allowed
before, which would conflict with umonitor during
disassembly.

Before:
$ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
.text
mfence

After:
$ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
.text
umonitor        %rax

Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45253

llvm-svn: 330462
2018-04-20 18:42:47 +00:00
Simon Pilgrim 2f522ef13d [X86] Tag CLDEMOTE instruction with WriteLoad scheduling class
Same as other cacheline instructions

llvm-svn: 330424
2018-04-20 12:54:53 +00:00
Craig Topper ebf52e80c1 [X86] Correct the Defs, Uses, hasSideEffects, mayLoad, mayStore for XCHG and XADD instructions.
I don't think we emit any of these from codegen except for using XCHG16ar as 2 byte NOP.

llvm-svn: 330298
2018-04-18 22:07:53 +00:00
Craig Topper 04244cbf45 [X86] Fix the Uses/Defs,mayLoad,mayStore,hasSideEffects flags for the CMPXCHG instructions.
The compiler only emits the locked version of these which use different instruction definitions. The versions fixed here are only used by the assembler/disassembler.

llvm-svn: 330287
2018-04-18 20:15:00 +00:00
Gabor Buella 604be4424b [X86] Introduce cldemote instruction
Hint to hardware to move the cache line containing the
address to a more distant level of the cache without
writing back to memory.

Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45256

llvm-svn: 329992
2018-04-13 07:35:08 +00:00
Simon Pilgrim 35935c0632 [X86] Remove remaining gpr schedule itineraries (PR37093)
llvm-svn: 329938
2018-04-12 18:46:15 +00:00
Simon Pilgrim 294556d40e [X86] Remove remaining system/special schedule itineraries (PR37093)
llvm-svn: 329906
2018-04-12 12:43:49 +00:00
Simon Pilgrim 0cd0fbd8c5 [X86] Remove system/control schedule itineraries (PR37093)
llvm-svn: 329903
2018-04-12 12:09:24 +00:00
Gabor Buella 2ef36f3571 [X86] Describe wbnoinvd instruction
Similar to the wbinvd instruction, except this
one does not invalidate caches. Ring 0 only.
The encoding matches a wbinvd instruction with
an F3 prefix.

Reviewers: craig.topper, zvi, ashlykov

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D43816

llvm-svn: 329847
2018-04-11 20:01:57 +00:00
Chandler Carruth 0ca3bd0729 [x86] Model the direction flag (DF) separately from the rest of EFLAGS.
This cleans up a number of operations that only claimed te use EFLAGS
due to using DF. But no instructions which we think of us setting EFLAGS
actually modify DF (other than things like popf) and so this needlessly
creates uses of EFLAGS that aren't really there.

In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD,
and the whole-flags writes (WRFLAGS and POPF) need to model this.

I've also somewhat cleaned up some of the flag management instruction
definitions to be in the correct .td file.

Adding this extra register also uncovered a failure to use the correct
datatype to hold X86 registers, and I've corrected that as necessary
here.

Differential Revision: https://reviews.llvm.org/D45154

llvm-svn: 329673
2018-04-10 06:40:51 +00:00
Craig Topper 22d25a08ae [X86] Merge itineraries for CLC, CMC, and STC.
These are very simple flag setting instructions that appear to only be a single uop. They're unlikely to need this separation.

llvm-svn: 329414
2018-04-06 16:16:43 +00:00
Craig Topper a30db995b3 [X86] Use the same predicate for the load for PMOVSXBQ and PMOVZXBQ.
These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both.

llvm-svn: 329154
2018-04-04 07:00:24 +00:00
Craig Topper caec723a1a [X86] Add an itinerary to BTR64rr.
llvm-svn: 328956
2018-04-02 01:12:34 +00:00
Craig Topper 3f2dbec652 [X86] Remove ReadAfterLd from BMI and TBM instructions that don't have a register operand in their memory form
The memory form of these instructions only read an input from memory. They don't have any register operands.

Differential Revision: https://reviews.llvm.org/D44836

llvm-svn: 328828
2018-03-29 21:03:53 +00:00
Craig Topper 89310f56c8 [X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI.
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd

Differential Revision: https://reviews.llvm.org/D44838

llvm-svn: 328823
2018-03-29 20:41:39 +00:00
Craig Topper 7456af88f4 [X86] Rename RIi64_NOREX tblgen class to just Ii64. Make RIi64 inherit from it. NFC
This feels more consistent with the other classes. We don't need to say _NOREX if we didn't start it with an R in the first place.

llvm-svn: 328757
2018-03-29 03:14:57 +00:00
Simon Pilgrim f33d905293 [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881)
Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.

These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).

Differential Revision: https://reviews.llvm.org/D44879

llvm-svn: 328566
2018-03-26 18:19:28 +00:00
Sanjay Patel 05daae75ad [x86] put nops into the WriteNop class and customize for Jaguar
1. Given that we already have a classification bucket with 'nop' in the name, 
   that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'.
2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably 
   llvm-mca) how to model the resource requirements better even though a nop has no 
   dependencies.

Differential Revision: https://reviews.llvm.org/D44608

llvm-svn: 327853
2018-03-19 14:26:50 +00:00
Oren Ben Simhon fdd72fd522 [X86] Added support for nocf_check attribute for indirect Branch Tracking
X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET).
IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp.
The `nocf_check` attribute has two roles in the context of X86 IBT technology:
	1. Appertains to a function - do not add ENDBR instruction at the beginning of the function.
	2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction.

This patch implements `nocf_check` context for Indirect Branch Tracking.
It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks.

Differential Revision: https://reviews.llvm.org/D41879

llvm-svn: 327767
2018-03-17 13:29:46 +00:00
Craig Topper 2b2d8c5eb2 [X86] Use btc/btr/bts to implement xor/and/or that affects a single bit in the upper 32-bits of a 64-bit operation.
We can't fold a large immediate into a 64-bit operation. But if we know we're only operating on a single bit we can use the bit instructions.

For now only do this for optsize.

Differential Revision: https://reviews.llvm.org/D37418

llvm-svn: 325287
2018-02-15 19:57:35 +00:00
Craig Topper bab7b0a466 [X86] Dont' allow 'outs' and 'ins' in at&t syntax without suffixes.
The match would be ambiguous, but at&t asm parsing doesn't support ambiguous matches and will just return the first.

llvm-svn: 325192
2018-02-14 23:53:26 +00:00
Craig Topper 675752166d [X86] Don't swap argument on BOUND instruction in at&t syntax.
The bound instruction does not have reversed operands in gas.

Fixes PR27653.

Patch by Maya Madhavan.

Differential Revision: https://reviews.llvm.org/D43243

llvm-svn: 325178
2018-02-14 21:54:58 +00:00
Craig Topper 88939fefe8 [X86] Simplify X86DAGToDAGISel::matchBEXTRFromAnd by creating an X86ISD::BEXTR node and calling Select. Add isel patterns to recognize this node.
This removes a bunch of special case code for selecting the immediate and folding loads.

llvm-svn: 324939
2018-02-12 21:18:11 +00:00
Craig Topper efe3923514 [X86] Remove unused multiclass argument. NFC
llvm-svn: 324938
2018-02-12 21:18:09 +00:00
Simon Pilgrim 0efe9bc953 [X86] Add missing scheduling class tag for i64 absolute address moves
Expand existing SchedRW to encompass these like it did for the other memory offset movs - added comments to closing braces to keep track of def scopes.

We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).

llvm-svn: 324910
2018-02-12 17:21:28 +00:00
Craig Topper dfc322ddf4 [X86] Allow zextload/extload i1->i8 to be folded into instructions during isel
Previously we just emitted this as a MOV8rm which would likely get folded during the peephole pass anyway. This just makes it explicit earlier.

The gpr-to-mask.ll test changed because the kaddb instruction has no memory form.

llvm-svn: 324860
2018-02-12 01:33:36 +00:00
Hiroshi Inoue 0909ca132f [NFC] fix trivial typos in comments and documents
"in in" -> "in", "on on" -> "on" etc.

llvm-svn: 323508
2018-01-26 08:15:29 +00:00
Craig Topper e5aea25980 [X86] Remove 'NOREX' comment from the printing of _NOREX instructions.
Some of the NOREX instructions are used in 32-bit mode making this printing confusing. It also doesn't provide a lot of value since you can see the h-register being used by the instruction.

llvm-svn: 323174
2018-01-23 05:37:00 +00:00
Chandler Carruth c58f2166ab Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre..
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.

The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.

However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.

On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.

This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886

We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
  __llvm_external_retpoline_r11
```
or on 32-bit:
```
  __llvm_external_retpoline_eax
  __llvm_external_retpoline_ecx
  __llvm_external_retpoline_edx
  __llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.

There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.

The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.

For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.

When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.

When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.

However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.

We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.

This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.

Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer

Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41723

llvm-svn: 323155
2018-01-22 22:05:25 +00:00
Craig Topper 84b26b90d1 [X86] Add intrinsic support for the RDPID instruction
This adds a new instrinsic to support the rdpid instruction. The implementation is a bit weird because the intrinsic is defined as always returning 32-bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode. But really it zeros the upper 32 bits. So I had to add separate patterns where 64-bit mode uses an extract_subreg.

Differential Revision: https://reviews.llvm.org/D42205

llvm-svn: 322910
2018-01-18 23:52:31 +00:00
Craig Topper 7a0c601f95 [X86] Revisit the fix I made years ago to make 'xchgl %eax, %eax' not encode using the 0x90 encoding in 64-bit mode.
Prior to this we had a separate instruction and register class that excluded eax to prevent matching the instruction that would encode with 0x90.

This patch changes this to just use an InstAlias to force xchgl %eax, %eax to use XCHG32rr instruction in 64-bit mode. This gets rid of the separate instruction and register class.

llvm-svn: 322532
2018-01-16 06:07:16 +00:00
Craig Topper daa385f480 [X86] Make 'xchgq %rax, %rax' an alias for the 0x90 nop encoding to match gas.
Previously we encoded it as 0x48 0x90.

llvm-svn: 322531
2018-01-16 06:07:14 +00:00
Craig Topper 72001f4647 [X86] Don't allow lods/stos/scas/cmps/movs to be parsed without a suffix and only memory operand in at&t syntax.
Without a register with a size being mentioned the instruction is ambiguous in at&t syntax. With Intel syntax the memory operation caries a size that can be used to disambiguate.

llvm-svn: 322356
2018-01-12 06:48:26 +00:00
Craig Topper 29ccb5c87d [X86] Don't require suffix on 'clr' mnemonic in intel syntax
llvm-svn: 322355
2018-01-12 06:48:24 +00:00
Craig Topper b1623321af [X86] Add 'l' and 'q' suffixes to the tbm instruction mnemonics.
While the suffix isn't required to disambiguate the instructions, it is required in order to parse the instructions when the suffix is specified in order to match the GNU assembler.

llvm-svn: 322354
2018-01-12 06:21:36 +00:00
Craig Topper 3554a71cf1 [X86] Disable movsq/stosq/scasqcmpsq/lodsq parsing in 64-bit mode.
llvm-svn: 322352
2018-01-12 05:38:14 +00:00
Craig Topper cf93feb981 [X86] Remove assembler predicates from all AVX512 related feature flags.
We don't do fine grained feature control like this on features prior to AVX512.

We do still have checks in place in the assembly parser itself that prevents %zmm references or %xmm16-31 from being parsed without at least -mattr=avx512f. Same for rounding control and mask operands. That will prevent the table matcher from matching for any instructions that need those features and that's probably good enough.

llvm-svn: 321947
2018-01-06 21:45:30 +00:00
Craig Topper 004867312e [X86] Stop printing moves between VR64 and GR64 with 'movd' mnemonic. Use 'movq' instead.
This behavior existed to work with an old version of the gnu assembler on MacOS that only accepted this form. Newer versions of GNU assembler and the current LLVM derived version of the assembler on MacOS support movq as well.

llvm-svn: 321898
2018-01-05 20:55:12 +00:00
Craig Topper 2e308a9b2f [X86] Add assembler predicates to BITALG/VBMI2/VNNI features to be consistent with the other AVX512 ISAs.
llvm-svn: 321416
2017-12-24 02:05:17 +00:00
Craig Topper e268598dd3 [X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling for prefetch instructions.
Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well.

The prfchw flag now imply at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled.

Similarly prefetchwt1 feature implies availability of prefetchw and the the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And its assumed that if we have levels for the write hint we would have levels for the non-write hint, thus why we enable the sse prefetch instructions.

I believe this behavior is consistent with gcc. I've updated the prefetch.ll to test all of these combinations.

llvm-svn: 321335
2017-12-22 02:30:30 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Craig Topper 23c348850f [X86] Add 'Requires<[In64BitMode]>' to a bunch of instructions that only have memory and immediate operands.
The asm parser wasn't preventing these from being accepted in 32-bit mode. Instructions that use a GR64 register are protected by the parser rejecting the register in 32-bit mode.

llvm-svn: 320846
2017-12-15 19:01:51 +00:00
Craig Topper 446f3e2084 [X86] Remove the 'Requires' In64BitMode/Not64BitMode from the LWP instructions.
These aren't doing anything due to a top level "let Predicates =". I think the GR32/GR64 register class protects these anyway.

llvm-svn: 320844
2017-12-15 19:01:49 +00:00
Simon Pilgrim fabe354b42 [X86] Add LWP schedule tests
Tag LWP instructions as WriteSystem

llvm-svn: 320387
2017-12-11 16:47:21 +00:00
Simon Pilgrim e049038692 [X86] Tag LOCK/REX64/DATA16/DATA32 instruction prefix scheduler classes
llvm-svn: 320266
2017-12-09 21:27:03 +00:00
Simon Pilgrim 231fab072f [X86] Tag REP/REPNE prefix instructions as microcoded scheduler classes
llvm-svn: 320263
2017-12-09 20:16:37 +00:00
Simon Pilgrim 2e7314eb2f [X86] Tag missing EH pseudo instruction scheduler classes
llvm-svn: 320262
2017-12-09 20:04:02 +00:00
Simon Pilgrim 8e39dc36b8 [X86] Tag move immediate instructions scheduler classes
llvm-svn: 320169
2017-12-08 18:35:40 +00:00
Simon Pilgrim 83708cabc0 [X86][AVX512] Tag CLWB instruction to CLFLUSH/PREFETCH scheduler class
llvm-svn: 320156
2017-12-08 15:19:10 +00:00
Simon Pilgrim 386b23f1fa [X86] Tag BMI/BMI2/TBM instructions scheduler classes
Put these under UNARY/BINOP ALU itinerary classes for now - seems to be a good average value

llvm-svn: 320064
2017-12-07 17:37:39 +00:00
Simon Pilgrim f1d599adb2 [X86] Tag LZCNT/TZCNT instructions scheduler classes
Tagged as IMUL instructions for a reasonable approximation (ALU tends to be a lot faster) - POPCNT is currently tagged as FAdd which I think should be replaced with IMUL as well

llvm-svn: 320051
2017-12-07 15:24:14 +00:00
Simon Pilgrim 60411d9a8c [X86] Tag RDRAND/RDSEED instruction scheduler classes
llvm-svn: 320045
2017-12-07 14:18:48 +00:00
Simon Pilgrim b9aa93cb93 [X86] Tag CLFLUSHOPT with same scheduling behaviour as CLFLUSH
llvm-svn: 319253
2017-11-28 23:25:42 +00:00
Oren Ben Simhon fa582b075c Control-Flow Enforcement Technology - Shadow Stack support (LLVM side)
Shadow stack solution introduces a new stack for return addresses only.
The HW has a Shadow Stack Pointer (SSP) that points to the next return address.
If we return to a different address, an exception is triggered.
The shadow stack is managed using a series of intrinsics that are introduced in this patch as well as the new register (SSP).
The intrinsics are mapped to new instruction set that implements CET mechanism.

The patch also includes initial infrastructure support for IBT.

For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

Differential Revision: https://reviews.llvm.org/D40223

Change-Id: I4daa1f27e88176be79a4ac3b4cd26a459e88fed4
llvm-svn: 318996
2017-11-26 13:02:45 +00:00
Coby Tayree d8b17bedfa [x86][icelake]GFNI
galois field arithmetic (GF(2^8)) insns:
gf2p8affineinvqb
gf2p8affineqb
gf2p8mulb
Differential Revision: https://reviews.llvm.org/D40373

llvm-svn: 318993
2017-11-26 09:36:41 +00:00
Craig Topper e485631cd1 [X86] Add separate intrinsics for scalar FMA4 instructions.
Summary:
These instructions zero the non-scalar part of the lower 128-bits which makes them different than the FMA3 instructions which pass through the non-scalar part of the lower 128-bits.

I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header like we do for AVX512.

I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before.

I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests split the RUN lines and changed out the scalar intrinsics.

fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly though there are a couple TODOs in it.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39851

llvm-svn: 318984
2017-11-25 18:32:43 +00:00
Nirav Dave 61ffc9c0eb Avoid unecessary opsize byte in segment move to memory
Segment moves to memory are always 16-bit. Remove invalid 32 and 64
bit variants.

Recommiting with missing clang inline assembly test change.

Fixes PR34478.

Reviewers: rnk, craig.topper

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39847

llvm-svn: 318797
2017-11-21 19:28:13 +00:00
Simon Pilgrim a93dea535f [X86][LWP] Add missing LWP itinerary class to lwpins instructions
It's on all other LWP instruction but I missed it from lwpins, despite similar scheduling behaviour. 

llvm-svn: 318751
2017-11-21 11:17:11 +00:00
Coby Tayree 5c7fe5df53 [x86][icelake]BITALG
vpopcnt{b,w}
Differential Revision: https://reviews.llvm.org/D40213

llvm-svn: 318748
2017-11-21 10:32:42 +00:00
Coby Tayree 3880f2a363 [x86][icelake]VNNI
Introducing Vector Neural Network Instructions, consisting of:
vpdpbusd{s}
vpdpwssd{s}
Differential Revision: https://reviews.llvm.org/D40208

llvm-svn: 318746
2017-11-21 10:04:28 +00:00
Coby Tayree 71e37cc9ff [x86][icelake]vbmi2
introducing vbmi2, consisting of
vpcompress{b,w}
vpexpand{b,w}
vpsh{l,r}d{w,d,q}
vpsh{l,r}dv{w,d,q}
Differential Revision: https://reviews.llvm.org/D40206

llvm-svn: 318745
2017-11-21 09:48:44 +00:00
Coby Tayree 7ca5e58736 [x86][icelake]vpclmulqdq introduction
an icelake promotion of pclmulqdq
Differential Revision: https://reviews.llvm.org/D40101

llvm-svn: 318741
2017-11-21 09:30:33 +00:00
Coby Tayree 2a1c02fcbc [x86][icelake]VAES introduction
an icelake promotion of AES
Differential Revision: https://reviews.llvm.org/D40078

llvm-svn: 318740
2017-11-21 09:11:41 +00:00
Richard Trieu 26917db696 Revert r318678 to fix Clang test
r318678 caused the Clang test CodeGen/ms-inline-asm.c to start failing.

llvm-svn: 318710
2017-11-21 00:12:18 +00:00
Nirav Dave 3669061298 [X86] Avoid unecessary opsize byte in segment move to memory
Summary:

Segment moves to memory are always 16-bit. Remove invalid 32 and 64
bit variants.

Fixes PR34478.

Reviewers: rnk, craig.topper

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39847

llvm-svn: 318678
2017-11-20 18:38:55 +00:00
Craig Topper 428a4e6374 [X86] Make FeatureAVX512 imply FeatureF16C.
The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available.

Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns.

All known CPUs with AVX512 have F16C so this should safe for now.

llvm-svn: 317521
2017-11-06 22:49:04 +00:00
Craig Topper 4e13d4de52 [X86] Make sure we don't create locked inc/dec instructions when the carry flag is being used.
Summary:
INC/DEC don't update the carry flag so we need to make sure we don't try to use it.

This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested.

The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860.

This should fully fix PR35068 finishing the fix started in r316860.

Reviewers: RKSimon, zvi, spatel

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39411

llvm-svn: 316913
2017-10-30 14:51:37 +00:00
Craig Topper 1db2f0828e [X86] Change RDRAND to use PS instead of TB.
Should be no functional change for now. A future disassembler change will prevent disassembling with 0xf2/0xf3.

llvm-svn: 316339
2017-10-23 16:22:38 +00:00