Commit Graph

Petar Jovanovic f11daad18d [mips] Generate NMADD and NMSUB instructions when fneg node is present
This patch enables generation of NMADD and NMSUB instructions when an fneg
node is present. Previously, these instructions were generated only when an
fsub node was present.

Patch by Stanislav Ocovaj.

Differential Revision: https://reviews.llvm.org/D34507

llvm-svn: 311862
2017-08-27 21:07:24 +00:00
Craig Topper 80075a5fb7 [AVX512] Add more patterns for using masked moves for subvector extracts of the lowest subvector. This time with bitcasts between the vselect and the extract.
llvm-svn: 311856
2017-08-27 19:03:36 +00:00
Sanjay Patel a7a61d9768 [DAGCombiner] allow undef shuffle operands when eliminating bitcasts (PR34111)
As noted in the FIXME, this could be improved more, but this is the smallest fix
that helps:
https://bugs.llvm.org/show_bug.cgi?id=34111

llvm-svn: 311853
2017-08-27 17:29:30 +00:00
Sanjay Patel 4e4ba615b2 [x86] add haddps test for PR34111; NFC
llvm-svn: 311852
2017-08-27 17:15:49 +00:00
Jatin Bhateja 23eaf52d7d [X86] Adding more tests for horizontal [F]HADD/[F]SUB for AVX512 vector types
llvm-svn: 311847
2017-08-27 12:43:25 +00:00
Craig Topper 36bd247f64 [X86] Add a target-specific DAG combine to combine extract_subvector from all zero/one build_vectors.
llvm-svn: 311841
2017-08-27 05:39:57 +00:00
Craig Topper a088362e88 [AVX512] Add patterns to match masked extract_subvector with bitcasts between the vselect and the extract_subvector. Remove the late DAG combine.
We used to do a late DAG combine to move the bitcasts out of the way, but I'm starting to think that it's better to canonicalize extract_subvector's type to match the type of its input. I've seen some cases where we've formed two different extract_subvector nodes from the same node, where one had a bitcast and the other didn't.

Add some more test cases to ensure we've also got most of the zero masking covered too.

llvm-svn: 311837
2017-08-26 22:24:57 +00:00
Jatin Bhateja c2f41b9f0b [X86] Adding a test for horizontal [f]add/[f]sub for avx512 vector type 16x32.
Differential Revision: https://reviews.llvm.org/D37183

llvm-svn: 311834
2017-08-26 19:02:49 +00:00
Jatin Bhateja e4ca95d6aa [DAGCombiner] Extending pattern detection for vector shuffle.
Summary:
If all the operands of a BUILD_VECTOR extract elements from the same vector, then split the
vector efficiently based on the maximum vector access index.

This will also fix PR 33784
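
A hypothetical source pattern (clang vector extensions) that reaches the DAG
as a BUILD_VECTOR whose operands all extract from one wide vector:

  typedef int v8i __attribute__((vector_size(32)));
  typedef int v4i __attribute__((vector_size(16)));

  /* All lanes extract from x and the maximum accessed index is 3, so
     only the low half of x needs to be kept after the split. */
  v4i gather_low(v8i x) {
    v4i r = { x[0], x[2], x[1], x[3] };
    return r;
  }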

Reviewers: zvi, delena, RKSimon, thakis

Reviewed By: RKSimon

Subscribers: chandlerc, eladcohen, llvm-commits

Differential Revision: https://reviews.llvm.org/D35788

llvm-svn: 311833
2017-08-26 19:02:36 +00:00
Jatin Bhateja b60cfbefac Revert rL311247: To rectify commit message.
Summary: This reverts commit rL311247.

Differential Revision: https://reviews.llvm.org/D36927

llvm-svn: 311832
2017-08-26 19:02:17 +00:00
Craig Topper d27386a9ed [AVX512] Add patterns to use masked moves to implement masked extract_subvector of the lowest subvector.
This only supports 32- and 64-bit element sizes for now. But we could probably do 16- and 8-bit elements with BWI.
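
A sketch of the pattern at the source level (illustrative intrinsics,
assuming AVX512VL; the patterns themselves match the vselect +
extract_subvector DAG nodes):

  #include <immintrin.h>

  /* Masked use of the lowest 128 bits of a 512-bit vector; this can be
     implemented with a masked vmovdqa32 rather than extract + blend. */
  __m128i extract_low_masked(__m512i v, __m128i passthru, __mmask8 k) {
    return _mm_mask_mov_epi32(passthru, k, _mm512_castsi512_si128(v));
  }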

llvm-svn: 311821
2017-08-25 23:34:59 +00:00
Craig Topper b89dbf0220 [AVX512] Add additional test cases for masked extract subvector.
This includes tests for extracting 128-bits from a 256-bit vector and zero masking.

llvm-svn: 311820
2017-08-25 23:34:57 +00:00
Craig Topper e81de105a5 [X86] Add patterns to show more failures to use TBM instructions when we're trying to check flags.
We can probably add patterns to fix some of them. But the ones that use 'and' as their root node emit an X86ISD::CMP node in front of the 'and' and then pattern-match that to a 'test' instruction. We can't use a tablegen pattern to fix that because we can't remap the cmp result to the flag output of a TBM instruction.

llvm-svn: 311819
2017-08-25 23:34:55 +00:00
Chandler Carruth 4b611a896d [x86] Teach the backend to fold more read-modify-write memory operands
to instructions.

These can't be reasonably matched in tablegen due to the handling of
flags, so we have to do this in C++ code. We only did it for `inc` and
`dec` historically, this starts fleshing that out to more interesting
instructions. Notably, this handles transferring operands to `add` and
`sub`.
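
A minimal example of the kind of read-modify-write pattern this targets
(illustrative C):

  /* Load, add, store of the same address; foldable into a single
     memory-operand instruction such as "addl %esi, (%rdi)". */
  void rmw_add(int *p, int x) {
    *p += x;
  }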

Currently this forces them into a register. The next patch will add
support for keeping immediate operands as immediates. Then I'll extend
this beyond just `add` and `sub`.

I'm not super thrilled by the repeated switches in the code but
everything else I tried was really ugly or problematic.

Many thanks to Craig Topper for the suggestions about where to even
begin here and how to make this stuff work.

Differential Revision: https://reviews.llvm.org/D37130

llvm-svn: 311806
2017-08-25 22:50:52 +00:00
Sanjay Patel 50a446ef10 [x86] regenerate checks; NFC
llvm-svn: 311793
2017-08-25 19:25:03 +00:00
Aditya Nandakumar 892979effc [GISel]: Implement widenScalar for Legalizing G_PHI
https://reviews.llvm.org/D37018

llvm-svn: 311763
2017-08-25 04:57:27 +00:00
Chandler Carruth 46259260c7 [x86] NFC - normalize test case formatting of IR and generate CHECK
lines with the script rather than using manually written checks.

llvm-svn: 311753
2017-08-25 02:32:51 +00:00
Craig Topper 355d8cff49 [X86] Add TBM instructions to X86InstrInfo::isDefConvertible.
This allows us to remove "test" instructions and use the flags from the TBM instructions directly.
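
An illustrative pattern (hypothetical C; blcfill computes x & (x + 1) and
sets ZF from its result):

  /* The compare against zero can reuse the flags set by blcfill,
     with no separate test instruction. */
  int is_low_mask(unsigned x) {
    return (x & (x + 1)) == 0;
  }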

llvm-svn: 311747
2017-08-25 01:59:06 +00:00
Chandler Carruth 5b491808f5 [x86] Back out one aspect of r311318: don't generically set
FeatureSlowUAMem32.

The idea was to mark things that are slow on widely available processors
as slow in the generic CPU so that the code generated for that CPU would
be fast across those processors. However, for this feature that doesn't
work out very well at all.

The problem here is that you can very easily enable AVX or AVX2 on top
of this generic CPU. For example, this can happen just by using AVX2
intrinsics from Clang within a region of code guarded by a dynamic CPU
feature test. When you do that, the generated code with SlowUAMem32 set
is ... amazingly slower. The problem is that there really aren't very
good alternatives to the unaligned loads, and so our vector codegen
regresses significantly.
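
A sketch of that scenario (hypothetical function; the AVX2 body is compiled
on top of the generic CPU via the target attribute):

  #include <immintrin.h>

  /* The unaligned 32-byte loads here are exactly what
     FeatureSlowUAMem32 would pessimize on the generic CPU. */
  __attribute__((target("avx2")))
  static void add8_avx2(float *a, const float *b) {
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(a, _mm256_add_ps(va, vb));
  }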

The other issue is that there are plenty of AMD CPUs with AVX1 that
don't set FeatureSlowUAMem32 and so we shouldn't just check for AVX2
instead of this special feature. =/

It would be nice to have the target attribute logic be able to
enable/disable more than just one feature at a time and control this in
a more fine-grained and useful way, but that doesn't seem easy. Given
that it is only Sandybridge and Ivybridge that set this feature, for now
I'm just backing it out of the generic CPU. That has the additional
advantage of going back to the previous state that people seemed vaguely
happy with.

llvm-svn: 311740
2017-08-25 00:56:05 +00:00
Chandler Carruth 8ac488b161 [x86] Fix an amazing goof in the handling of sub, or, and xor lowering.
The comment for this code indicated that it should work similar to our
handling of add lowering above: if we see uses of an instruction other
than flag usage and store usage, it tries to avoid the specialized
X86ISD::* nodes that are designed for flag+op modeling and emits an
explicit test.

Problem is, only the add case actually did this. In all the other cases,
the logic was incomplete and inverted. Any time the value was used by
a store, we bailed on the specialized X86ISD node. All of this appears
to have been historical where we had different logic here. =/

Turns out, we have quite a few patterns designed around these nodes. We
should actually form them. I fixed the code to match what we do for add,
and it has quite a positive effect just within some of our test cases.
The only thing close to a regression I see is using:

  notl %r
  testl %r, %r

instead of:

  xorl $-1, %r

But we can add a pattern or something to fold that back out. The
improvements seem more than worth this.
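
To illustrate the main change (hypothetical C): a value that both feeds
a store and is tested no longer bails to the explicit test:

  /* With the fix, the flag-producing X86ISD node can be formed even
     though v is also stored. */
  int xor_store_test(int *p, int x, int y) {
    int v = x ^ y;
    *p = v;
    return v == 0;
  }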

I've also worked with Craig to update the comments to no longer be
actively contradicted by the code. =[ Some of this still remains
a mystery to both Craig and myself, but this seems like a large step in
the direction of consistency and slightly more accurate comments.

Many thanks to Craig for help figuring out this nasty stuff.

Differential Revision: https://reviews.llvm.org/D37096

llvm-svn: 311737
2017-08-25 00:34:07 +00:00
Sanjay Patel e404cbff66 [DAG] convert vector select-of-constants to logic/math
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming into
the backend and transform it to better DAG ops.

Steps in this patch:

  1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
     enable this transform. Other targets will probably want that anyway to distinguish scalars
     from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.

  2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
     vector ext is often free.

Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.
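
A sketch of the transform (clang vector extensions; constants chosen so the
select reduces to zext(mask) + C):

  typedef int v4i __attribute__((vector_size(16)));

  /* select(m, 3, 2): each lane of m is -1 or 0, so (m & 1) + 2 is the
     ext+add form and neither constant vector is materialized. */
  v4i sel_const(v4i a, v4i b) {
    v4i m = a < b;
    return (m & 1) + 2;
  }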

Differential Revision: https://reviews.llvm.org/D36840

llvm-svn: 311731
2017-08-24 23:24:43 +00:00
Jacob Gravelle 690b76e13d [WebAssembly] FastISel : Bail to SelectionDAG for constexpr calls
Summary: Currently FastISel lowers constexpr calls as indirect calls.
We'd like those to be direct calls, and falling back to SelectionDAGISel
handles that.
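
An illustrative source pattern (hypothetical): calling through a mismatched
prototype makes the IR call target a bitcast constant expression:

  void callee(int);

  /* The callee below is "bitcast @callee" in IR -- a constexpr call
     that FastISel used to lower as an indirect call. */
  void caller(void) {
    ((void (*)(long))callee)(42L);
  }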

Reviewers: dschuff, sunfish

Subscribers: jfb, sbc100, llvm-commits, aheejin

Differential Revision: https://reviews.llvm.org/D37073

llvm-svn: 311693
2017-08-24 19:53:44 +00:00
Pete Couperus 2d1f6d67c5 [ARC] Add ARC backend.
Add the ARC backend as an experimental target to lib/Target.
Reviewed at: https://reviews.llvm.org/D36331

llvm-svn: 311667
2017-08-24 15:40:33 +00:00
Sjoerd Meijer b0eb5fb317 [AArch64] Add FMOVH0: materialize 0 using zero register for f16 values
Instead of loading 0 from a constant pool, it's of course much better to
materialize it using an fmov and the zero register.

Thanks to Ahmed Bougacha for the suggestion.
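
Illustrative C (assuming a subtarget with fullfp16 and _Float16 support):

  /* The zero can be materialized as "fmov h0, wzr" instead of a
     constant-pool load. */
  _Float16 zero_half(void) {
    return (_Float16)0.0f;
  }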

Differential Revision: https://reviews.llvm.org/D37102

llvm-svn: 311662
2017-08-24 14:47:06 +00:00
Chad Rosier bfd4014304 [TargetParser][AArch64] Add support for RDM feature in the target parser.
Differential Revision: https://reviews.llvm.org/D37081

llvm-svn: 311659
2017-08-24 14:30:44 +00:00
Michael Zuckerman 9ee61d9b00 Adding base lit test for x86interleaved
llvm-svn: 311658
2017-08-24 14:11:28 +00:00
Krzysztof Parzyszek c09a14eeb2 [Hexagon] Generate correct runtime check when recognizing memmove
The check (assuming positive stride) for validity of memmove should be
(a) the destination is at a lower address than the source, or
(b) the distance between the source and destination is greater than or
    equal to the number of bytes copied.

For the second part it is sufficient to assume that the destination
is at a higher address, since the opposite case is covered by (a).
The distance calculation was previously done by subtracting the
pointers in the wrong order.
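
A sketch of the corrected condition (hypothetical helper; positive stride
assumed):

  #include <stddef.h>
  #include <stdint.h>

  /* Valid iff the destination is below the source (a), or the regions
     are at least nbytes apart with dst above src (b). */
  static int copy_is_safe(const char *dst, const char *src, size_t nbytes) {
    return (uintptr_t)dst < (uintptr_t)src ||
           (uintptr_t)dst - (uintptr_t)src >= nbytes;
  }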

llvm-svn: 311650
2017-08-24 11:59:53 +00:00
Evgeny Astigeevich 540a39adf7 [ARM, Thumb1] Prevent ARMTargetLowering::isLegalAddressingMode from accepting illegal modes
ARMTargetLowering::isLegalAddressingMode can accept illegal addressing modes
for the Thumb1 target. This causes generation of redundant code and affects
performance.

This fixes PR34106: https://bugs.llvm.org/show_bug.cgi?id=34106

Differential Revision: https://reviews.llvm.org/D36467

llvm-svn: 311649
2017-08-24 10:00:25 +00:00
Sjoerd Meijer afc2cd3c9e [AArch64] Custom lowering of copysign f16
This is a follow-up patch to r311154 and introduces custom lowering of copysign
f16 to avoid promotions to single-precision types when the subtarget supports
fullfp16.
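
Illustrative source (assuming clang's __builtin_copysignf16 builtin and a
fullfp16 subtarget):

  /* Stays in half precision instead of promoting to float. */
  _Float16 half_copysign(_Float16 mag, _Float16 sgn) {
    return __builtin_copysignf16(mag, sgn);
  }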

Differential Revision: https://reviews.llvm.org/D36893

llvm-svn: 311646
2017-08-24 09:21:10 +00:00
Daniel Sanders 2c269f6bf8 Re-commit: [globalisel][tablegen] Add support for ImmLeaf without SDNodeXForm
Summary:
This patch adds support for predicates on imm nodes, but only for ImmLeaf (not
for PatLeaf or PatFrag), and only where the value does not need to be
transformed before being rendered into the instruction.

The limitation on PatLeaf/PatFrag/SDNodeXForm is due to differences in the
necessary target-supplied C++ for GlobalISel.

Depends on D36085

The previous commit was reverted for breaking the build, but this appears to have
been the recurring problem on the Windows bots where tablegen is not re-run
when llvm-tblgen changes but the .td files don't. If it re-occurs, forcing a
build with clean=True should fix it, but this string should do so in advance:
    Requires a clean build.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar

Reviewed By: rovka

Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D36086

llvm-svn: 311645
2017-08-24 09:11:20 +00:00
Matt Arsenault d664315ae8 IPRA: Don't assume called function is first call operand
Fixes not finding the called global for AMDGPU
call pseudoinstructions, which prevented IPRA
from doing much.

llvm-svn: 311637
2017-08-24 07:55:15 +00:00
Chandler Carruth dc2556934c [x86] NFC: Clean up two tests and generate precise checks for them.
Mostly this involved giving unnamed values names and running the IR
through `opt` to re-format it but merging in any important comments in
the original. I then deleted pointless comments and inlined the function
attributes for ease of reading and editing.

All of this is to make it much easier to see the instructions being
generated here and evaluate any updates to the tests.

llvm-svn: 311634
2017-08-24 07:38:36 +00:00
Igor Breger 47be5fbbe9 [GlobalISel][X86] Support G_IMPLICIT_DEF.
Summary: Support G_IMPLICIT_DEF.

Reviewers: zvi, guyblank, t.p.northover

Reviewed By: guyblank

Subscribers: rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D36733

llvm-svn: 311633
2017-08-24 07:06:27 +00:00
Wei Ding a131d3fb29 Add 'llvm.experimental.constrained.fma' Intrinsic.
Differential Revision: http://reviews.llvm.org/D36335

llvm-svn: 311629
2017-08-24 04:18:24 +00:00
Hans Wennborg c39ec95d88 [DAG] Fix Node Replacement in PromoteIntBinOp
When one operand is a user of another in a promoted binary operation,
we may replace and delete the returned value before returning,
triggering an assertion. Reorder node replacements to prevent this.

Fixes PR34137.

Landing on behalf of Nirav.

Differential Revision: https://reviews.llvm.org/D36581

llvm-svn: 311623
2017-08-24 01:08:27 +00:00
Dylan McKay 4f5002198b [AVR] Use the correct register classes for 16-bit atomic operations
llvm-svn: 311620
2017-08-24 00:14:38 +00:00
Aditya Nandakumar efd8a84cd5 [GISel]: Translate phi into G_PHI
G_PHI has the same semantics as PHI but also has types.
This lets us verify that the types in the G_PHI are consistent.
This also allows specifying legalization actions for G_PHIs.

https://reviews.llvm.org/D36990

llvm-svn: 311596
2017-08-23 20:45:48 +00:00
Reid Kleckner 6d353348e5 Parse and print DIExpressions inline to ease IR and MIR testing
Summary:
Most DIExpressions are empty or very simple. When they are complex, they
tend to be unique, so checking them inline is reasonable.

This also avoids the need for CodeGen passes to append to the
llvm.dbg.mir named md node.

See also PR22780, for making DIExpression not be an MDNode.

Reviewers: aprantl, dexonsmith, dblaikie

Subscribers: qcolombet, javed.absar, eraman, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D37075

llvm-svn: 311594
2017-08-23 20:31:27 +00:00
Lei Huang 0cb591fc4c Update branch coalescing to be a PowerPC specific pass
Implementing this as a PowerPC-specific pass. Branch coalescing utilizes
the analyzeBranch method, which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.

Differential Revision: https://reviews.llvm.org/D32776

llvm-svn: 311588
2017-08-23 19:25:04 +00:00
Craig Topper 853a8d9ffc [AVX512] Don't create SHRUNKBLEND SDNodes for 512-bit vectors
There are no 512-bit blend instructions so we shouldn't create SHRUNKBLEND for them.

On a side note, it looks like there may be a missed opportunity for constant folding TESTM when LHS and RHS are equal.

This fixes PR34139.

Differential Revision: https://reviews.llvm.org/D36992

llvm-svn: 311572
2017-08-23 16:41:02 +00:00
Victor Leschuk 3697ebe25f Revert r311546 as it breaks build
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/4394

llvm-svn: 311560
2017-08-23 15:21:10 +00:00
Daniel Sanders c3885c4589 [globalisel][tablegen] Add support for ImmLeaf without SDNodeXForm
Summary:
This patch adds support for predicates on imm nodes, but only for ImmLeaf (not for PatLeaf or PatFrag), and only where the value does not need to be transformed before being rendered into the instruction.

The limitation on PatLeaf/PatFrag/SDNodeXForm is due to differences in the necessary target-supplied C++ for GlobalISel.

Depends on D36085

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar

Reviewed By: rovka

Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D36086

llvm-svn: 311546
2017-08-23 12:14:18 +00:00
Florian Hahn 5b92960091 [ARM] Check for assembler instructions in test.
Currently this test causes failures on some machines, due to isel not being registered. Update the test to run all passes and check the emitted assembly instructions for now.

llvm-svn: 311545
2017-08-23 11:53:24 +00:00
Florian Hahn 214e13d949 [ARM] Add missing patterns for insert_subvector.
Summary: In some cases, a shufflevector instruction can be transformed into a sequence involving insert_subvector instructions. The ARM backend was missing some insert_subvector patterns, causing a failure during instruction selection. AArch64 has similar patterns.
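
A hypothetical shuffle that lowers through insert_subvector (clang vector
extensions):

  typedef int v2i __attribute__((vector_size(8)));
  typedef int v4i __attribute__((vector_size(16)));

  /* Widening a 64-bit vector into the low half of a 128-bit vector is
     selected via insert_subvector into a wider value. */
  v4i widen(v2i a) {
    v2i z = { 0, 0 };
    return __builtin_shufflevector(a, z, 0, 1, 2, 3);
  }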

Reviewers: t.p.northover, olista01, javed.absar, rengolin

Reviewed By: javed.absar

Subscribers: aemerson, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D36796

llvm-svn: 311543
2017-08-23 10:20:59 +00:00
Hiroshi Inoue cc555bd0ac [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
- recommitting after fixing a test failure on MacOS

On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and it also clobbers one GPR).

This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.

e.g. (x | 0xFFFFFFFF) should be

	ori 3, 3, 65535
	oris 3, 3, 65535

but LLVM generates without this patch

	li 4, 0
	oris 4, 4, 65535
	ori 4, 4, 65535
	or 3, 3, 4

Differential Revision: https://reviews.llvm.org/D34757

llvm-svn: 311538
2017-08-23 08:55:18 +00:00
Hiroshi Inoue dbb285ca51 Revert rL311526: [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
This reverts commit rL311526 due to failures on some buildbots.

llvm-svn: 311530
2017-08-23 06:38:05 +00:00
Hiroshi Inoue c4449df1b0 [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and it also clobbers one GPR).

This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.

e.g. (x | 0xFFFFFFFF) should be

	ori 3, 3, 65535
	oris 3, 3, 65535

but LLVM generates without this patch

	li 4, 0
	oris 4, 4, 65535
	ori 4, 4, 65535
	or 3, 3, 4

Differential Revision: https://reviews.llvm.org/D34757

llvm-svn: 311526
2017-08-23 05:15:15 +00:00
Dean Michael Berris 0884b73220 [XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text
Summary:
This change achieves two things:

  - Redefine the Custom Event handling instrumentation points emitted by
    the compiler to not require dynamic relocation of references to the
    __xray_CustomEvent trampoline.

  - Remove the synthetic reference we emit at the end of a function that
    we used to keep auxiliary sections alive in favour of SHF_LINK_ORDER
    associated with the section where the function is defined.

To achieve the custom event handling change, we've had to introduce the
concept of sled versioning -- this will need to be supported by the
runtime to allow us to understand how to turn on/off the new version of
the custom event handling sleds. That change has to land first before we
change the way we write the sleds.

To remove the synthetic reference, we rely on a relatively new linker
feature that preserves the sections that are associated with each other.
This allows us to limit the effects on the .text section of ELF
binaries.

Because we're still using absolute references that are resolved at
runtime for the instrumentation map (and function index) maps, we mark
these sections writable. In the future we can redefine the entries in
the map to use relative relocations instead, which can be statically
determined by the linker. That change will be a bit more invasive, so we
defer this for later.

Depends on D36816.

Reviewers: dblaikie, echristo, pcc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36615

llvm-svn: 311525
2017-08-23 04:49:41 +00:00
Yonghong Song dc1dbf6ef3 bpf: add variants of -mcpu=# and support for additional jmp insns
-mcpu=# will support:
  . generic: the default insn set
  . v1: insn set version 1, the same as generic
  . v2: insn set version 2, version 1 + additional jmp insns
  . probe: the compiler will probe the underlying kernel to
           decide proper version of insn set.

We did not use -mcpu=native since llc/llvm will interpret -mcpu=native
as the underlying hardware architecture regardless of the -march value.

Currently, only x86_64 supports -mcpu=probe. Other architectures will
silently revert to "generic".

Also added -mcpu=help to print available CPU parameters.
llvm will print out the information only if there is at least one
CPU and at least one feature, so an unused dummy feature is added to
enable the printout.

Examples for usage:
$ llc -march=bpf -mcpu=v1 -filetype=asm t.ll
$ llc -march=bpf -mcpu=v2 -filetype=asm t.ll
$ llc -march=bpf -mcpu=generic -filetype=asm t.ll
$ llc -march=bpf -mcpu=probe -filetype=asm t.ll
$ llc -march=bpf -mcpu=v3 -filetype=asm t.ll
'v3' is not a recognized processor for this target (ignoring processor)
...
$ llc -march=bpf -mcpu=help -filetype=asm t.ll
Available CPUs for this target:

  generic - Select the generic processor.
  probe   - Select the probe processor.
  v1      - Select the v1 processor.
  v2      - Select the v2 processor.

Available features for this target:

  dummy - unused feature.

Use +feature to enable a feature, or -feature to disable it.
For example, llc -mcpu=mycpu -mattr=+feature1,-feature2
...

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 311522
2017-08-23 04:25:57 +00:00
Matthias Braun d6c0868da5 Fix tail-merge-after-mbp test
The output of this test changed after the fix in r311520 to have
-run-pass=block-placement behave like it does in a normal pipeline.
Adjust the test.

llvm-svn: 311521
2017-08-23 03:49:53 +00:00