Commit Graph

43713 Commits

Matt Arsenault 6b114d2c50 AMDGPU: Select clamp pattern with v2f16
llvm-svn: 312087
2017-08-30 01:20:17 +00:00
Craig Topper 559f61e179 [X86] Finish the subtarget and predicate implementation of CLWB.
We don't have an intrinsic implemented for this instruction yet, but it looked odd that we were missing the accessor method from the subtarget.
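
For context, such an accessor is normally a one-line getter over a tablegen-initialized feature bit; a minimal sketch of the usual X86Subtarget pattern (not the patch itself):

  // Sketch of the usual subtarget accessor pattern; the HasCLWB flag is
  // initialized from the "clwb" feature bit by tablegen-generated code.
  bool HasCLWB = false;
  bool hasCLWB() const { return HasCLWB; }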

llvm-svn: 312064
2017-08-29 23:13:36 +00:00
Matt Arsenault 2d69c924f7 AMDGPU: Fix typo
llvm-svn: 312040
2017-08-29 21:25:51 +00:00
Craig Topper a54ca1a662 [X86] Fix copy pasto from r311841. Call getOnesVector instead of getZeroVector.
llvm-svn: 312006
2017-08-29 15:29:36 +00:00
Javed Absar 5766b8eea0 [ARM] - Tidy-up ARMAsmPrinter.cpp
Change to range-based for loops where missing.

Reviewed by: @fhahn, @asb
Differential Revision: https://reviews.llvm.org/D37199

llvm-svn: 311993
2017-08-29 10:04:18 +00:00
Diana Picus c9f29c62cc [ARM] GlobalISel: Select globals in PIC mode
Support the selection of G_GLOBAL_VALUE in the PIC relocation model. For
simplicity we use the same pseudoinstructions for both Darwin and ELF:
(MOV|LDRLIT)_ga_pcrel(_ldr).

This is new for ELF, so it requires a small update to the ARM pseudo
expansion pass to make sure it adds the correct constant pool modifier
and add-current-address in the case of ELF.

Differential Revision: https://reviews.llvm.org/D36507

llvm-svn: 311992
2017-08-29 09:47:55 +00:00
Eric Christopher 5bea524091 Revert "The current version of the LLVM X86 disassembler incorrectly interprets some possible sets of x86 prefixes. This patch is the first step toward closing PR7709 and PR17697. Follow-up patches will close the related PRs." temporarily while some regressions are addressed.
This reverts commit r311882.

llvm-svn: 311987
2017-08-29 08:23:46 +00:00
Craig Topper 62c47a2aa5 Mark Knights Landing as having slow two memory operand instructions
Summary: Knights Landing, because it is Atom-derived, has slow two-memory-operand instructions. Mark the Knights Landing CPU model accordingly.

Patch by David Zarzycki.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37224

llvm-svn: 311979
2017-08-29 05:14:27 +00:00
Evandro Menezes 4976d6a0c6 [AArch64] Adjust the cost model for Exynos M1 and M2
Add a new predicate to more accurately model the scheduling around branches
and function calls, as well as of paired loads and stores and integer
multiplications.

llvm-svn: 311944
2017-08-28 22:51:52 +00:00
Evandro Menezes 509516d200 [AArch64] Adjust the cost model for Exynos M1 and M2
Add a new predicate to more accurately model the cost of arithmetic and
logical operations with a left-shifted operand.

Differential revision: https://reviews.llvm.org/D37151

llvm-svn: 311943
2017-08-28 22:51:32 +00:00
Geoff Berry 40cdc0e053 [AArch64][Falkor] Avoid generating STRQro* instructions
Summary:
STRQro* instructions are slower than the alternative ADD/STRQui expanded
instructions on Falkor, so avoid generating them unless we're optimizing
for code size.
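
A sketch of the kind of guard this implies; shouldUseStrQro is a hypothetical helper name, and the processor-family check and optimize-for-size query are assumed spellings, not copied from the patch:

  // Sketch (hypothetical helper): keep STRQro only when optimizing for size;
  // otherwise prefer the ADD + STRQui expansion on Falkor.
  static bool shouldUseStrQro(const MachineFunction &MF,
                              const AArch64Subtarget &ST) {
    if (ST.getProcFamily() != AArch64Subtarget::Falkor)
      return true;                         // no penalty on other cores
    return MF.getFunction().hasOptSize();  // on Falkor, only when -Os/-Oz
  }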

Reviewers: t.p.northover, mcrosier

Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D37020

llvm-svn: 311931
2017-08-28 20:48:43 +00:00
Joerg Sonnenberger 0f76a35c5e Fix ARMv4 support
ARMv4 doesn't support the "BX" instruction, which has been introduced
with ARMv4t. Adjust the call lowering and tail call implementation
accordingly.

Further changes are necessary to ensure that the v4t feature is set
correctly. Most importantly, the "generic" CPU for thumb-*
triples should include ARMv4t, since thumb mode without thumb support
would naturally be pointless.

Add a couple of asserts to ensure thumb instructions are not emitted
without CPU support.
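
A minimal sketch of the opcode selection this implies (the helper name is assumed; the opcodes are real ARM backend return instructions):

  // Sketch: ARMv4 has no BX, so return with "mov pc, lr" instead of "bx lr".
  static unsigned getReturnOpcode(const ARMSubtarget &ST) {
    return ST.hasV4TOps() ? ARM::BX_RET    // bx lr (ARMv4T and later)
                          : ARM::MOVPCLR;  // mov pc, lr (plain ARMv4)
  }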

Differential Revision: https://reviews.llvm.org/D37030

llvm-svn: 311921
2017-08-28 20:20:47 +00:00
Geoff Berry 75c4ae3066 [ARM] Fix bug in ARMLoadStoreOptimizer when kill flags are missing.
Summary:
ARMLoadStoreOpt::FixInvalidRegPairOp() checked whether one of the load
destination registers being split overlapped the base register only if
the base register was marked as killed.  Since kill flags may not
always be present, this can lead to incorrect code.

This bug was exposed by my MachineCopyPropagation change D30751 breaking
the sanitizer-x86_64-linux-android buildbot.

Also clean up some dead code and add an assert that register offsets are
never encountered by this code, since it does not handle them correctly.
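
The shape of the fix, sketched with stand-in surrounding names (regsOverlap is the real TargetRegisterInfo helper):

  // Sketch: kill flags are optional metadata, so they must not gate a
  // correctness check. Test the overlap with the base unconditionally.
  static bool destOverlapsBase(const TargetRegisterInfo *TRI,
                               unsigned BaseReg, unsigned DestReg) {
    // Previously this was only reached when BaseReg carried a kill flag.
    return TRI->regsOverlap(BaseReg, DestReg);
  }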

Reviewers: MatzeB, qcolombet, t.p.northover

Subscribers: aemerson, javed.absar, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D37164

llvm-svn: 311907
2017-08-28 19:03:45 +00:00
Stefan Pintilie c35e4de388 [Power9] Add new instructions for floating point status and control registers.
Added the following P9 instructions: mffsce, mffscdrn, mffscdrni, mffscrn,
mffscrni, and mffsl.

Differential Revision: https://reviews.llvm.org/D37167

llvm-svn: 311903
2017-08-28 18:46:01 +00:00
Krzysztof Parzyszek 2164a271a3 [Hexagon] Check for potential bank conflicts in post-RA scheduling
Insert artificial edges between loads that could cause a cache bank
conflict.
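
Post-RA mutations like this hook into the scheduler via ScheduleDAGMutation; a sketch of inserting such artificial edges (mayBankConflict is a stand-in predicate, not the actual Hexagon heuristic):

  // Sketch: add artificial scheduling edges so conflicting loads are kept
  // apart. mayBankConflict() is a stand-in for the real bank-conflict test.
  bool mayBankConflict(const SUnit &A, const SUnit &B); // defined elsewhere

  class BankConflictMutation : public ScheduleDAGMutation {
    void apply(ScheduleDAGInstrs *DAG) override {
      for (unsigned I = 0, E = DAG->SUnits.size(); I != E; ++I)
        for (unsigned J = I + 1; J != E; ++J)
          if (mayBankConflict(DAG->SUnits[I], DAG->SUnits[J]))
            // Order the later load after the earlier one.
            DAG->SUnits[J].addPred(SDep(&DAG->SUnits[I], SDep::Artificial));
    }
  };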

llvm-svn: 311901
2017-08-28 18:36:21 +00:00
Stanislav Mekhanoshin 312c557b3b [AMDGPU] Fix regression in AMDGPULibCalls allowing native for doubles
Under -cl-fast-relaxed-math we may use native_sqrt, but f64 used to be
allowed as well because it produced HSAIL's nsqrt instruction. HSAIL is
no longer a target, so that path would now emit a call to a non-existent
native_sqrt(double).

Add a check so that native functions are not returned for f64, and also
remove the f64 handling from fold_sqrt.

Differential Revision: https://reviews.llvm.org/D37223

llvm-svn: 311900
2017-08-28 18:00:08 +00:00
Stanislav Mekhanoshin dad7cf62de [AMDGPU] computeKnownBitsForTargetNode for 24 bit mul
Differential Revision: https://reviews.llvm.org/D37168

llvm-svn: 311896
2017-08-28 16:35:37 +00:00
Krzysztof Parzyszek 95da97ec56 [Hexagon] Break up DAG mutations into separate classes, move to subtarget
llvm-svn: 311895
2017-08-28 16:24:22 +00:00
Krzysztof Parzyszek 697297afa9 [Hexagon] Move pre-RA DAG mutations to scheduler constructor
llvm-svn: 311894
2017-08-28 15:52:54 +00:00
Craig Topper fa86fd928e [X86] Make 128/256-bit extract_subvector Legal instead of Custom. Move combining with BUILD_VECTOR from Legalization to DAG combine
EXTRACT_SUBVECTOR was marked Custom solely so we could combine it with BUILD_VECTOR operations to create smaller BUILD_VECTORS during Legalization. But that sort of combining should really be done by the DAG combiner.

This patch adds the last piece of DAG combine support needed to handle this. Once that's done, we can make the EXTRACT_SUBVECTOR operations Legal.
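
A sketch of the combine in question: when the source of an EXTRACT_SUBVECTOR is a BUILD_VECTOR, slice out the covered operands and build a smaller BUILD_VECTOR (index and type-legality details elided):

  // Sketch: extract_subvector (build_vector x0..x7), 4 --> build_vector x4..x7
  SDValue combineExtractOfBuildVector(SDNode *N, SelectionDAG &DAG) {
    SDValue Src = N->getOperand(0);
    if (Src.getOpcode() != ISD::BUILD_VECTOR)
      return SDValue();
    EVT VT = N->getValueType(0);
    unsigned Idx = N->getConstantOperandVal(1);
    SmallVector<SDValue, 8> Ops(Src->op_begin() + Idx,
                                Src->op_begin() + Idx +
                                    VT.getVectorNumElements());
    return DAG.getBuildVector(VT, SDLoc(N), Ops);
  }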

Differential Revision: https://reviews.llvm.org/D37197

llvm-svn: 311893
2017-08-28 15:32:50 +00:00
Andrew V. Tischenko 574962a3b3 The current version of the LLVM X86 disassembler incorrectly interprets some possible sets of x86 prefixes. This patch is the first step toward closing PR7709 and PR17697. Follow-up patches will close the related PRs.
Differential Revision: https://reviews.llvm.org/D36788

M    lib/Target/X86/Disassembler/X86DisassemblerDecoder.cpp
M    lib/Target/X86/Disassembler/X86DisassemblerDecoder.h
A    test/MC/Disassembler/X86/prefixes-i386.s
A    test/MC/Disassembler/X86/prefixes-x86_64.s
M    test/MC/Disassembler/X86/prefixes.txt

llvm-svn: 311882
2017-08-28 10:43:14 +00:00
Gadi Haber d76f7b824e [X86][Haswell] Updating HSW instruction scheduling information
This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target.
We used the scheduling information retrieved from the Haswell architects in order to replace and modify the existing scheduling.
The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792.
Information includes latency, number of micro-Ops and used ports by each HSW instruction.

Please expect some performance fluctuations due to code alignment effects.

Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud

Differential Revision: https://reviews.llvm.org/D36663

llvm-svn: 311879
2017-08-28 10:04:16 +00:00
NAKAMURA Takumi a1e97a77f5 Untabify.
llvm-svn: 311875
2017-08-28 06:47:47 +00:00
Craig Topper 33681161c4 [X86] Use getUnpackl helper to create an ISD::VECTOR_SHUFFLE instead of using X86ISD::UNPCKL in reduceVMULWidth.
This runs fairly early, so we should use target-independent nodes where possible.
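
For reference, an unpack-low as a target-independent shuffle is just an interleaving mask; a sketch of what such a helper computes:

  // Sketch: unpack-low of V1/V2 interleaves the low halves:
  // mask = {0, N, 1, N+1, ...} for an N-element vector type.
  SDValue buildUnpackl(SelectionDAG &DAG, const SDLoc &DL, MVT VT,
                       SDValue V1, SDValue V2) {
    unsigned N = VT.getVectorNumElements();
    SmallVector<int, 16> Mask;
    for (unsigned I = 0; I != N / 2; ++I) {
      Mask.push_back(I);      // element I of V1
      Mask.push_back(I + N);  // element I of V2
    }
    return DAG.getVectorShuffle(VT, DL, V1, V2, Mask);
  }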

llvm-svn: 311873
2017-08-28 05:14:38 +00:00
Craig Topper 2c77011d15 [X86] Add an early out to combineLoopMAddPattern and combineLoopSADPattern when SSE2 is disabled.
Without this the madd.ll and sad.ll test cases both throw assertions if you run them with SSE2 disabled.
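
The early out is the standard bail-from-combine idiom, sketched:

  // Sketch: returning an empty SDValue tells the DAG combiner "no change",
  // so the combine never fires on subtargets without SSE2.
  if (!Subtarget.hasSSE2())
    return SDValue();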

llvm-svn: 311872
2017-08-28 04:29:08 +00:00
Petar Jovanovic f11daad18d [mips] Generate NMADD and NMSUB instructions when fneg node is present
This patch enables generation of NMADD and NMSUB instructions when an fneg
node is present. These instructions are currently generated only when an
fsub node is present.

Patch by Stanislav Ocovaj.

Differential Revision: https://reviews.llvm.org/D34507

llvm-svn: 311862
2017-08-27 21:07:24 +00:00
Javed Absar b81fa9932a [ARM] Tidy-up condition-code support functions
Move condition code support functions to Utils and remove code duplication.

Reviewed by: @fhahn, @asb
Differential Revision: https://reviews.llvm.org/D37179

llvm-svn: 311860
2017-08-27 20:38:28 +00:00
Craig Topper 80075a5fb7 [AVX512] Add more patterns for using masked moves for subvector extracts of the lowest subvector. This time with bitcasts between the vselect and the extract.
llvm-svn: 311856
2017-08-27 19:03:36 +00:00
Javed Absar 17ee7c0977 [ARM] Tidy-up ARMAsmParser. NFC.
Simplify getDRegFromQReg function

Reviewed by: @fhahn, @asb
Differential Revision: https://reviews.llvm.org/D37118

llvm-svn: 311850
2017-08-27 14:46:57 +00:00
Craig Topper 36bd247f64 [X86] Add a target-specific DAG combine to combine extract_subvector from all zero/one build_vectors.
llvm-svn: 311841
2017-08-27 05:39:57 +00:00
Craig Topper 71dab64a57 [X86] Use getOnesVector instead of using DAG.getConstant(-1).
llvm-svn: 311840
2017-08-27 03:26:04 +00:00
Craig Topper a088362e88 [AVX512] Add patterns to match masked extract_subvector with bitcasts between the vselect and the extract_subvector. Remove the late DAG combine.
We used to do a late DAG combine to move the bitcasts out of the way, but I'm starting to think that it's better to canonicalize extract_subvector's type to match the type of its input. I've seen some cases where we've formed two different extract_subvectors from the same node where one had a bitcast and the other didn't.

Add some more test cases to ensure we've also got most of the zero masking covered too.

llvm-svn: 311837
2017-08-26 22:24:57 +00:00
Craig Topper 69ec201848 [X86] Qualify the RMW INC/DEC patterns with NotSlowIncDec.
We were suppressing most uses of INC/DEC, but this one seems to have been missed.

llvm-svn: 311828
2017-08-26 06:24:25 +00:00
Craig Topper d27386a9ed [AVX512] Add patterns to use masked moves to implement masked extract_subvector of the lowest subvector.
This only supports 32- and 64-bit element sizes for now, but we could probably handle 16- and 8-bit elements with BWI.

llvm-svn: 311821
2017-08-25 23:34:59 +00:00
Chandler Carruth 4b611a896d [x86] Teach the backend to fold more read-modify-write memory operands
to instructions.

These can't be reasonably matched in tablegen due to the handling of
flags, so we have to do this in C++ code. We only did it for `inc` and
`dec` historically; this starts fleshing that out to more interesting
instructions. Notably, this handles transferring operands to `add` and
`sub`.

Currently this forces them into a register. The next patch will add
support for keeping immediate operands as immediates. Then I'll extend
this beyond just `add` and `sub`.

I'm not super thrilled by the repeated switches in the code but
everything else I tried was really ugly or problematic.

Many thanks to Craig Topper for the suggestions about where to even
begin here and how to make this stuff work.

Differential Revision: https://reviews.llvm.org/D37130

llvm-svn: 311806
2017-08-25 22:50:52 +00:00
Craig Topper c93d0556ae [X86] Use SDValue::getOpcode instead of calling getNode and calling getOpcode on that. NFC
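
The cleanup in a nutshell; SDValue forwards getOpcode() to its underlying SDNode, so the extra hop is redundant:

  SDValue Op = N->getOperand(0);
  unsigned OpcBefore = Op.getNode()->getOpcode(); // old spelling
  unsigned OpcAfter  = Op.getOpcode();            // same result, less noise
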
llvm-svn: 311765
2017-08-25 05:36:29 +00:00
Craig Topper fc53dc2d43 [X86] Use isUInt and isShiftedUInt instead of using our own masking and compares. NFCI
While there, use a local variable instead of calling C->getZExtValue() repeatedly.
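
A sketch of the kind of rewrite this describes (the mask values are chosen for illustration; both helpers live in llvm/Support/MathExtras.h):

  uint64_t V = C->getZExtValue();  // hoisted into a local, computed once
  // Before: (V & ~0xFFULL) == 0
  bool FitsU8 = isUInt<8>(V);
  // Before: (V & ~0x3FCULL) == 0, i.e. an 8-bit value shifted left by 2
  bool FitsShiftedU8 = isShiftedUInt<8, 2>(V);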

llvm-svn: 311764
2017-08-25 05:04:34 +00:00
Aditya Nandakumar 892979effc [GISel]: Implement widenScalar for Legalizing G_PHI
https://reviews.llvm.org/D37018

llvm-svn: 311763
2017-08-25 04:57:27 +00:00
Chandler Carruth 96db308f03 [x86] NFC: More refactoring to pave the way to extending this ISel logic
to handle other x86 pseudos that carry flags and thus can't be matched
by our ISel patterns with fused memory accesses.

Differential Revision: https://reviews.llvm.org/D37088

llvm-svn: 311749
2017-08-25 02:06:36 +00:00
Chandler Carruth 03258f251f [x86] NFC - Refactor the custom lowering of `(load; op; store)` RMW sequences.
This extracts the code out of a giant switch in preparation for expanding it to
handle operations other than `inc` and `dec`. Add a FIXME indicating what's
coming here.

Differential Revision: https://reviews.llvm.org/D37045

llvm-svn: 311748
2017-08-25 02:04:03 +00:00
Craig Topper 355d8cff49 [X86] Add TBM instructions to X86InstrInfo::isDefConvertible.
This allows us to remove "test" instructions and use the flags from the TBM instructions directly.

llvm-svn: 311747
2017-08-25 01:59:06 +00:00
Chandler Carruth 5b491808f5 [x86] Back out one aspect of r311318: don't generically set
FeatureSlowUAMem32.

The idea was to mark things that are slow on widely available processors
as slow in the generic CPU so that the code generated for that CPU would
be fast across those processors. However, for this feature that doesn't
work out very well at all.

The problem here is that you can very easily enable AVX or AVX2 on top
of this generic CPU. For example, this can happen just by using AVX2
intrinsics from Clang within a region of code guarded by a dynamic CPU
feature test. When you do that, the generated code with SlowUAMem32 set
is ... amazingly slower. The problem is that there really aren't very
good alternatives to the unaligned loads, and so our vector codegen
regresses significantly.

The other issue is that there are plenty of AMD CPUs with AVX1 that
don't set FeatureSlowUAMem32 and so we shouldn't just check for AVX2
instead of this special feature. =/

It would be nice to have the target attribute logic be able to
enable/disable more than just one feature at a time and control this in
a more fine-grained and useful way, but that doesn't seem easy. Given
that it is only Sandybridge and Ivybridge that set this feature, for now
I'm just backing it out of the generic CPU. That has the additional
advantage of going back to the previous state that people seemed vaguely
happy with.

llvm-svn: 311740
2017-08-25 00:56:05 +00:00
Chandler Carruth 8ac488b161 [x86] Fix an amazing goof in the handling of sub, or, and xor lowering.
The comment for this code indicated that it should work similarly to our
handling of add lowering above: if we see uses of an instruction other
than flag usage and store usage, it tries to avoid the specialized
X86ISD::* nodes that are designed for flag+op modeling and emits an
explicit test.

Problem is, only the add case actually did this. In all the other cases,
the logic was incomplete and inverted. Any time the value was used by
a store, we bailed on the specialized X86ISD node. All of this appears
to have been historical where we had different logic here. =/

Turns out, we have quite a few patterns designed around these nodes. We
should actually form them. I fixed the code to match what we do for add,
and it has quite a positive effect just within some of our test cases.
The only thing close to a regression I see is using:

  notl %r
  testl %r, %r

instead of:

  xorl $-1, %r

But we can add a pattern or something to fold that back out. The
improvements seem more than worth this.

I've also worked with Craig to update the comments to no longer be
actively contradicted by the code. =[ Some of this still remains
a mystery to both Craig and myself, but this seems like a large step in
the direction of consistency and slightly more accurate comments.

Many thanks to Craig for help figuring out this nasty stuff.

Differential Revision: https://reviews.llvm.org/D37096

llvm-svn: 311737
2017-08-25 00:34:07 +00:00
Sanjay Patel e404cbff66 [DAG] convert vector select-of-constants to logic/math
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming into
the backend and transform it into better DAG ops.

Steps in this patch:

  1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
     enable this transform. Other targets will probably want that anyway to distinguish scalars
     from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.

  2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
     vector ext is often free.

Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.
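
A sketch of the simplest subcase of the ext+add fold described in step 2, using the hook signature from step 1 (the constant-splat predicates are stand-ins):

  // Sketch: vselect Cond, 1, 0 --> zext Cond, eliminating one constant
  // vector; the general case adds the extended condition to the low constant.
  bool isConstantSplatOne(SDValue V);   // stand-in predicates,
  bool isConstantSplatZero(SDValue V);  // defined elsewhere

  SDValue foldVSelectOfConstants(SDNode *N, SelectionDAG &DAG,
                                 const TargetLowering &TLI) {
    EVT VT = N->getValueType(0);
    if (!TLI.convertSelectOfConstantsToMath(VT))
      return SDValue();
    if (isConstantSplatOne(N->getOperand(1)) &&
        isConstantSplatZero(N->getOperand(2)))
      return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), VT, N->getOperand(0));
    return SDValue();  // leave other constant pairs to the general fold
  }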

Differential Revision: https://reviews.llvm.org/D36840

llvm-svn: 311731
2017-08-24 23:24:43 +00:00
Konstantin Zhuravlyov 68107657d4 AMDGPU: Fix gfx801 features
gfx801 has half-rate F64 and fast F32 FMA.

Differential Revision: https://reviews.llvm.org/D36981

llvm-svn: 311694
2017-08-24 20:03:07 +00:00
Jacob Gravelle 690b76e13d [WebAssembly] FastISel : Bail to SelectionDAG for constexpr calls
Summary: Currently FastISel lowers constexpr calls as indirect calls.
We'd like those to be direct calls, and falling back to SelectionDAGISel
handles that.
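
In FastISel, returning false from a select routine is what triggers the fallback; a sketch of the bail-out (surrounding plumbing elided, API names are the modern spellings):

  // Sketch: a ConstantExpr callee would be materialized into a register and
  // called indirectly here; returning false defers to SelectionDAGISel,
  // which can emit the direct call instead.
  if (isa<ConstantExpr>(Call->getCalledOperand()))
    return false;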

Reviewers: dschuff, sunfish

Subscribers: jfb, sbc100, llvm-commits, aheejin

Differential Revision: https://reviews.llvm.org/D37073

llvm-svn: 311693
2017-08-24 19:53:44 +00:00
Heejin Ahn 34672faf49 [WebAssembly] Update GCC test suite failure expectations
Summary:
Update GCC test suite failure expectations as we add -O0 to the bare tests in
the WebAssembly waterfall. There are still several untriaged lld failures.

Reviewers: sbc100, jgravelle-google, dschuff

Reviewed By: dschuff

Subscribers: jfb

Differential Revision: https://reviews.llvm.org/D37100

llvm-svn: 311691
2017-08-24 19:43:09 +00:00
Krzysztof Parzyszek c802d27a93 [Hexagon] Set access size for vector pseudo loads/stores
llvm-svn: 311690
2017-08-24 19:19:24 +00:00
Pete Couperus 2d1f6d67c5 [ARC] Add ARC backend.
Add the ARC backend as an experimental target to lib/Target.
Reviewed at: https://reviews.llvm.org/D36331

llvm-svn: 311667
2017-08-24 15:40:33 +00:00
Sjoerd Meijer b0eb5fb317 [AArch64] Add FMOVH0: materialize 0 using zero register for f16 values
Instead of loading 0 from a constant pool, it's of course much better to
materialize it using an fmov and the zero register.

Thanks to Ahmed Bougacha for the suggestion.

Differential Revision: https://reviews.llvm.org/D37102

llvm-svn: 311662
2017-08-24 14:47:06 +00:00