Commit Graph

49349 Commits

Author SHA1 Message Date
Oliver Stannard 367b4741f4 [AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled
When branch target identification is enabled, we can only do indirect
tail-calls through x16 or x17. This means that the outliner can't
transform a BLR instruction at the end of an outlined region into a BR.

Differential revision: https://reviews.llvm.org/D52869

llvm-svn: 343969
2018-10-08 14:12:08 +00:00
Oliver Stannard c922116a51 [AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI
When branch target identification is enabled, all indirectly-callable
functions start with a BTI C instruction. this instruction can only be
the target of certain indirect branches (direct branches and
fall-through are not affected):
- A BLR instruction, in either a protected or unprotected page.
- A BR instruction in a protected page, using x16 or x17.
- A BR instruction in an unprotected page, using any register.

Without BTI, we can use any non call-preserved register to hold the
address for an indirect tail call. However, when BTI is enabled, then
the code being compiled might be loaded into a BTI-protected page, where
only x16 and x17 can be used for indirect tail calls.

Legacy code withiout this restriction can still indirectly tail-call
BTI-protected functions, because they will be loaded into an unprotected
page, so any register is allowed.

Differential revision: https://reviews.llvm.org/D52868

llvm-svn: 343968
2018-10-08 14:09:15 +00:00
Oliver Stannard 250e5a5b65 [AArch64][v8.5A] Branch Target Identification code-generation pass
The Branch Target Identification extension, introduced to AArch64 in
Armv8.5-A, adds the BTI instruction, which is used to mark valid targets
of indirect branches. When enabled, the processor will trap if an
instruction in a protected page tries to perform an indirect branch to
any instruction other than a BTI. The BTI instruction uses encodings
which were NOPs in earlier versions of the architecture, so BTI-enabled
code will still run on earlier hardware, just without the extra
protection.

There are 3 variants of the BTI instruction, which are valid targets for
different kinds or branches:
- BTI C can be targeted by call instructions, and is inteneded to be
  used at function entry points. These are the BLR instruction, as well
  as BR with x16 or x17. These BR instructions are allowed for use in
  PLT entries, and we can also use them to allow indirect tail-calls.
- BTI J can be targeted by BR only, and is intended to be used by jump
  tables.
- BTI JC acts ab both a BTI C and a BTI J instruction, and can be
  targeted by any BLR or BR instruction.

Note that RET instructions are not restricted by branch target
identification, the reason for this is that return addresses can be
protected more effectively using return address signing. Direct branches
and calls are also unaffected, as it is assumed that an attacker cannot
modify executable pages (if they could, they wouldn't need to do a
ROP/JOP attack).

This patch adds a MachineFunctionPass which:
- Adds a BTI C at the start of every function which could be indirectly
  called (either because it is address-taken, or externally visible so
  could be address-taken in another translation unit).
- Adds a BTI J at the start of every basic block which could be
  indirectly branched to. This could be either done by a jump table, or
  by taking the address of the block (e.g. the using GCC label values
  extension).

We only need to use BTI JC when a function is indirectly-callable, and
takes the address of the entry block. I've not been able to trigger this
from C or IR, but I've included a MIR test just in case.

Using BTI C at function entries relies on the fact that no other code in
BTI-protected pages uses indirect tail-calls, unless they use x16 or x17
to hold the address. I'll add that code-generation restriction as a
separate patch.

Differential revision: https://reviews.llvm.org/D52867

llvm-svn: 343967
2018-10-08 14:04:24 +00:00
Alexander Ivchenko 1aedf203dd [GlobalIsel][X86] Support G_UDIV/G_UREM/G_SREM
Support G_UDIV/G_UREM/G_SREM. The instruction selection
code is taken from FastISel with only minor tweaks to adapt
for GlobalISel.

Differential Revision: https://reviews.llvm.org/D49781

llvm-svn: 343966
2018-10-08 13:40:34 +00:00
Neil Henning 57f5d0a885 [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.
The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:

- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
  CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
  weren't being tested (including the FMF flag that wasn't checked).

This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).

Differential Revision: https://reviews.llvm.org/D52087

llvm-svn: 343962
2018-10-08 10:32:33 +00:00
Peter Smith 6f36cd4d76 [ARM] Account for implicit IT when calculating inline asm size
When deciding if it is safe to optimize a conditional branch to a CBZ or
CBNZ the offsets of the BasicBlocks from the start of the function are
estimated. For inline assembly the generic getInlineAsmLength() function is
used to get a worst case estimate of the inline assembly by multiplying the
number of instructions by the max instruction size of 4 bytes. This
unfortunately doesn't take into account the generation of Thumb implicit IT
instructions. In edge cases such as when all the instructions in the block
are 4-bytes in size and there is an implicit IT then the size is
underestimated. This can cause an out of range CBZ or CBNZ to be generated.

The patch takes a conservative approach and assumes that every instruction
in the inline assembly block may have an implicit IT.

Fixes pr31805

Differential Revision: https://reviews.llvm.org/D52834

llvm-svn: 343960
2018-10-08 09:38:28 +00:00
Oliver Stannard 9ecdac8ee0 [AArch64] Fix verifier error when outlining indirect calls
The MachineOutliner for AArch64 transforms indirect calls into indirect
tail calls, replacing the call with the TCRETURNri pseudo-instruction.
This pseudo lowers to a BR, but has the isCall and isReturn flags set.

The problem is that TCRETURNri takes a tcGPR64 as the register argument,
to prevent indiret tail-calls from using caller-saved registers. The
indirect calls transformed by the outliner could use caller-saved
registers. This is fine, because the outliner ensures that the register
is available at all call sites. However, this causes a verifier failure
when the register is not in tcGPR64. The fix is to add a new
pseudo-instruction like TCRETURNri, but which accepts any GPR.

Differential revision: https://reviews.llvm.org/D52829

llvm-svn: 343959
2018-10-08 09:18:48 +00:00
Simon Pilgrim 9fa1c66421 [X86] getFauxShuffleMask - Handle undef + sentinel values in subvector insertion
llvm-svn: 343926
2018-10-06 22:13:44 +00:00
Simon Pilgrim a30e8d23e2 [X86][AVX] Ensure resolveTargetShuffleInputs shuffle masks are the correct width
Don't handle ZERO_EXTEND style shuffles until we support bitcasts. Found by inspection.

llvm-svn: 343924
2018-10-06 17:18:41 +00:00
Simon Pilgrim 62d199f4e5 [X86] combinePMULDQ - add op back to worklist if SimplifyDemandedBits succeeds on either operand
Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself

llvm-svn: 343922
2018-10-06 14:51:14 +00:00
Simon Pilgrim 0cc0a24b55 [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - simplify PSHUFB masks
Attempt to simplify PSHUFB masks (even non-constant ones) - we should probably be able to simplify other variable shuffles as well as the need arises.

llvm-svn: 343919
2018-10-06 13:49:31 +00:00
Simon Pilgrim ae78d709b4 [X86] Use the SimplifyDemandedBits wrappers where possible. NFCI.
Leave the wrapper to handle TargetLowering::TargetLoweringOpt and CommitTargetLoweringOpt.

llvm-svn: 343918
2018-10-06 13:29:08 +00:00
Alex Bradbury 639df9e4c0 [RISCV] Compress addiw rd, x0, simm6 to c.li rd, simm6
A pattern was present for addi rd, x0, simm6 but not addiw which is
semantically identical when the source register is x0. This patch addresses
that, and the benefit can be seen in rv64c-aliases-valid.s.

llvm-svn: 343911
2018-10-06 06:09:46 +00:00
Tom Stellard 251ee083a3 AMDGPU: Consolidate SMRD TableGen patterns
Summary:
Merge the SMRD patterns for CI into the same multiclass as the
patterns for other sub-targets.

This removes some duplicate code and will make it easier for some
future GlobalISel changes I would like to do.

Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52557

llvm-svn: 343909
2018-10-06 03:32:43 +00:00
Matthias Braun 81578e9f77 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.

Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343895
2018-10-05 22:00:13 +00:00
Simon Pilgrim dc97118efe [X86][AVX] Limit getFauxShuffleMask INSERT_SUBVECTOR support to 2 inputs
rL343853 didn't limit the number of subinputs, but we don't currently support faux shuffles with more than 2 total inputs, so put a limiter in place until this is fixed.

Found by Artem Dergachev.

llvm-svn: 343891
2018-10-05 21:44:19 +00:00
Craig Topper 0ed892da70 [X86] Don't promote i16 compares to i32 if the immediate will fit in 8 bits.
The comments in this code say we were trying to avoid 16-bit immediates, but if the immediate fits in 8-bits this isn't an issue. This avoids creating a zero extend that probably won't go away.

The movmskb related changes are interesting. The movmskb instruction writes a 32-bit result, but fills the upper bits with 0. So the zero_extend we were previously emitting was free, but we turned a -1 immediate that would fit in 8-bits into a 32-bit immediate so it was still bad.

llvm-svn: 343871
2018-10-05 18:13:36 +00:00
Simon Pilgrim f09fc3bc12 [X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957)
Currently we hardcode instructions with ReadAfterLd if the register operands don't need to be available until the folded load has completed. This doesn't take into account the different load latencies of different memory operands (PR36957).

This patch adds a ReadAfterFold def into X86FoldableSchedWrite to replace ReadAfterLd, allowing us to specify the load latency at a scheduler class level.

I've added ReadAfterVec*Ld classes that match the XMM/Scl, XMM and YMM/ZMM WriteVecLoad classes that we currently use, we can tweak these values in future patches once this infrastructure is in place.

Differential Revision: https://reviews.llvm.org/D52886

llvm-svn: 343868
2018-10-05 17:57:29 +00:00
Simon Pilgrim 6c5ab48fe7 [X86][AVX] getFauxShuffleMask - add support for INSERT_SUBVECTOR subvector shuffles
Decode subvector shuffles from INSERT_SUBVECTOR(SRC0, SHUFFLE(EXTRACT_SUBVECTOR(SRC1))

This was found necessary while investigating PR39161

llvm-svn: 343853
2018-10-05 14:41:00 +00:00
Jonas Paulsson faad1b3056 [TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints()
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.

NFC.

Review: Simon Pilgrim
https://reviews.llvm.org/D52316

llvm-svn: 343851
2018-10-05 14:23:11 +00:00
Tom Stellard 7c65078f04 AMDGPU/GlobalISel: Add support for G_INTTOPTR
Summary: This is a no-op.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52916

llvm-svn: 343839
2018-10-05 04:34:09 +00:00
Thomas Lively 4b47d08e52 [WebAssembly] Saturating arithmetic intrinsics
Summary: Depends on D52805.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52813

llvm-svn: 343833
2018-10-05 00:45:20 +00:00
Yury Delendik 409b439152 [WebAssembly] Ignore DBG_VALUE in WebAssemblyCFGStackify pass when looking for block start
Summary:
Fixes https://bugs.llvm.org/show_bug.cgi?id=39158 and regression caused by
D49034. Though it is possible the problem was existed before and was exposed by
additional DBG_VALUEs.

Reviewers: sunfish, dschuff, aheejin

Reviewed By: aheejin

Subscribers: sbc100, aheejin, llvm-commits, alexcrichton, jgravelle-google

Differential Revision: https://reviews.llvm.org/D52837

llvm-svn: 343827
2018-10-04 23:31:00 +00:00
Ana Pazos 9d6c55323f [RISCV] Support named operands for CSR instructions.
Reviewers: asb, mgrang

Reviewed By: asb

Subscribers: jocewei, mgorny, jfb, PkmX, MartinMosbeck, brucehoult, the_o, rkruppe, rogfer01, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones

Differential Revision: https://reviews.llvm.org/D46759

llvm-svn: 343822
2018-10-04 21:50:54 +00:00
Craig Topper 7d2155e3f9 [X86][LegalizeVectorOps] Use MERGE_VALUES to return two results from LowerLoad. Remove special case code in LegalizeVectorOps that allowed us to only return one result.
Previously we replaced the chain use ourself and return the data result. LegalizeVectorOps then detected that we'd done this and assumed the chain had already been handled.

This commit instead returns a MERGE_VALUES node with two results joined from nodes. This allows LegalizeVectorOps to do all the replacements for us without any special casing. The MERGE_VALUES will be removed by DAG combine.

llvm-svn: 343817
2018-10-04 21:24:24 +00:00
Heejin Ahn b68d591475 [WebAssembly] Don't modify preds/succs iterators while erasing from them
Summary:
This caused out-of-bound bugs. Found by
`-DLLVM_ENABLE_EXPENSIVE_CHECKS=ON`.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52902

llvm-svn: 343814
2018-10-04 21:03:35 +00:00
Konstantin Zhuravlyov aa067cb9fb AMDGPU: Rename isAmdCodeObjectV2 -> isAmdHsaOrMesa
The isAmdCodeObjectV2 is a misleading name which actually checks whether the os
is amdhsa or mesa.

Also add a test to make sure we do not generate old kernel header for code
object v3.

Differential Revision: https://reviews.llvm.org/D52897

llvm-svn: 343813
2018-10-04 21:02:16 +00:00
Martin Storsjo 37b742e208 [COFF] [X86] Don't use llvm_unreachable for unsupported relocation types
This can happen if assembling a reference to _GLOBAL_OFFSET_TABLE_.

While it doesn't make sense to try to assemble that for COFF,
the fact that we previously used llvm_unreachable meant that the code
had undefined behaviour if something tried to assemble that.

The configure script of libgmp would try to assemble such a snippet
(which should signal a failure). If llvm is built without assertions,
the undefined behaviour meant a (near) infinite loop.

Differential Revision: https://reviews.llvm.org/D52903

llvm-svn: 343811
2018-10-04 20:43:38 +00:00
Matthias Braun 0c67a4e958 AArch64: Fix XSeqPairs/WSeqPairs problems
- Fix spill/reloads of XSeqPairs failing with vregs (only physregs
  worked correctly)
- Add missing spill/reload code for WSeqPairs class

Differential Revision: https://reviews.llvm.org/D52761

llvm-svn: 343799
2018-10-04 17:02:53 +00:00
Farhana Aleen 4bc597bff5 [AMDGPU] Match signed dot4/8 pattern.
Summary: This patch matches signed dot4 and dot8 pattern.

Author: FarhanaAleen

Reviewed By: msearles

Differential Revision: https://reviews.llvm.org/D52520

llvm-svn: 343798
2018-10-04 16:57:37 +00:00
Alex Bradbury 5bf3b20e99 [RISCV] Remove overzealous is64Bit checks
lowerGlobalAddress, lowerBlockAddress, and insertIndirectBranch contain 
overzealous checks for is64Bit. These functions are all safe as-implemented 
for RV64.

llvm-svn: 343781
2018-10-04 14:30:03 +00:00
David Greene 4f916df29e [X86] Set correct MMO offset on scalarized load pieces
When scalarizing a load, be sure to update the offset in the
MachineMemOperand for each scalar load.

llvm-svn: 343776
2018-10-04 14:07:59 +00:00
Simon Pilgrim 991b0d24ff Fix MSVC "not all control paths return a value" warning. NFCI.
llvm-svn: 343765
2018-10-04 10:25:52 +00:00
Alex Bradbury e96b7c88a3 [RISCV] Bugfix for floats passed on the stack with the ILP32 ABI on RV32F
f32 values passed on the stack would previously cause an assertion in 
unpackFromMemLoc.. This would only trigger in the presence of the F extension 
making f32 a legal type. Otherwise the f32 would be legalized.

This patch fixes that by keeping LocVT=f32 when a float is passed on the 
stack. It also adds test coverage for this case, and tests that also 
demonstrate lw/sw/flw/fsw will be selected when most profitable. i.e. there is 
no unnecessary i32<->f32 conversion in registers.

llvm-svn: 343756
2018-10-04 07:28:49 +00:00
Craig Topper 8b3c46f0a8 [X86] Merge matchANDXORWithAllOnesAsANDNP into combineANDXORWithAllOnesIntoANDNP. NFCI
It's the only caller and the logic pretty easy to combine.

llvm-svn: 343754
2018-10-04 06:13:27 +00:00
Alex Bradbury 0e16766b76 [RISCV][NFC] Fix naming of RISCVISelLowering::{LowerRETURNADDR,LowerFRAMEADDR}
Rename to lowerRETURNADDR, lowerFRAMEADDR in order to be consistent with the 
LLVM coding style and the other functions in this file.

llvm-svn: 343752
2018-10-04 05:27:50 +00:00
Alex Bradbury 5ac0a2fc48 [RISCV] Handle redundant SplitF64+BuildPairF64 pairs in a DAGCombine
r343712 performed this optimisation during instruction selection. As Eli 
Friedman pointed out in post-commit review, implementing this as a DAGCombine 
might allow opportunities for further optimisations.

llvm-svn: 343741
2018-10-03 23:30:16 +00:00
Thomas Lively 5d461c96bd [WebAssembly] Bitselect intrinsic and instruction
Summary: Depends on D52755.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D52805

llvm-svn: 343739
2018-10-03 23:02:23 +00:00
Alex Bradbury 1dbfdeb6e5 [RISCV][NFC] Refactor LocVT<->ValVT converstion in RISCVISelLowering
There was some duplicated logic for using the LocInfo of a CCValAssign in 
order to convert from the ValVT to LocVT or vice versa. Resolve this by 
factoring out convertLocVTFromValVT from unpackFromRegLoc. Also rename 
packIntoRegLoc to the more appropriate convertValVTToLocVT and call these 
helper functions consistently.

llvm-svn: 343737
2018-10-03 22:53:25 +00:00
Derek Schuff 77a7a38006 [WebAssembly] Refactor WasmSignature and use it for MCSymbolWasm
MCContext does not destroy MCSymbols on shutdown. So, rather than putting
SmallVectors (which may heap-allocate) inside MCSymbolWasm, use unowned pointer
to a WasmSignature instead. The signatures are now owned by the AsmPrinter.
Also uses WasmSignature instead of param and result vectors in TargetStreamer,
and leaves some TODOs for further simplification.

 Differential Revision: https://reviews.llvm.org/D52580

llvm-svn: 343733
2018-10-03 22:22:48 +00:00
Craig Topper a65c2dbfd6 [X86] Stop promoting vector ISD::SELECT to vXi64.
The additional patterns needed for this aren't overwhelming and introducing extra bitcasts during lowering limits our ability to do computeNumSignBits. Not that I have a good example of that for select. I'm just becoming increasingly grumpy about promotion of AND/OR/XOR. SELECT was just a lot easier to fix.

llvm-svn: 343723
2018-10-03 21:10:29 +00:00
Craig Topper c39dc41b63 [X86] Add CMOV_VK2/VK4 pseudos and remove lowering code that turned v2i1/v4i1 SELECT into v8i1.
llvm-svn: 343713
2018-10-03 20:28:43 +00:00
Alex Bradbury ce9049952f [RISCV][NFCI] Handle redundant splitf64+buildpairf64 pairs during instruction selection
Although we can't write a tablegen pattern to remove redundant 
splitf64+buildf64 pairs due to the multiple return values, we can handle it 
with some C++ selection code. This is simpler than removing them after 
instruction selection through RISCVDAGToDAGISel::PostprocessISelDAG, as was 
done previously.

llvm-svn: 343712
2018-10-03 20:12:10 +00:00
Craig Topper 703fbde3cb [X86] Add CMOV pseudos for VR128X and VR256X register classes. Use them when AVX512VL is enabled.
This allows the phi nodes to be generated with the correct register class when expanded.

llvm-svn: 343710
2018-10-03 19:48:26 +00:00
Craig Topper 4b62c2dbda [X86] Don't break CMOV pseudo instructions down by type. Just by register class.
The register class is all that's important for the pseudo instructions. We can use patterns to handle the different types.

llvm-svn: 343709
2018-10-03 19:48:23 +00:00
Simon Pilgrim aabd99c27a [X86] PUSH/POP 'mem-mem' instructions are not RMW - these are 2 different addresses
This patch adds a 'WriteCopy' [WriteLoad, WriteStore] schedule sequence instead to better model the behaviour

Found by @andreadb during llvm-mca testing on btver2 which was crashing on "zero uop" WriteRMW only instructions

llvm-svn: 343708
2018-10-03 19:02:38 +00:00
Simon Pilgrim b80d27a916 [X86] Move Atomic binops to use WriteALURMW schedule class
These were being tagged as <WriteALULd, WriteRMW> instead of properly using the RMW sequence

llvm-svn: 343705
2018-10-03 18:38:28 +00:00
Simon Pilgrim 0b451a2983 [X86][Btver2] Fix MMX PSHUFB schedule
Match AMD Fam16h SOG + llvm-exegesis tests

llvm-svn: 343701
2018-10-03 18:18:50 +00:00
Simon Pilgrim a400612aed [X86] Move Atomic CMPXCHG to WriteCMPXCHGRMW schedule class
llvm-svn: 343700
2018-10-03 18:05:01 +00:00
Simon Pilgrim 2c59475c06 [X86] Add SkylakeClient uops counter - same as the other Intel models.
llvm-svn: 343697
2018-10-03 16:45:26 +00:00