Commit Graph

24537 Commits

Author SHA1 Message Date
Simon Pilgrim 2864b46469 [X86] Split off WriteIMul64 from WriteIMul schedule class (PR36931)
This fixes a couple of BtVer2 missing instructions that weren't been handled in the override.

NOTE: There are still a lot of overrides that still need cleaning up!
llvm-svn: 331770
2018-05-08 14:55:16 +00:00
Simon Pilgrim 2580554333 [X86] Split WriteIDiv into div/idiv 8/16/32/64 implementations (PR36930)
I've created the necessary classes but there are still a lot of overrides that need cleaning up.

NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides.
llvm-svn: 331767
2018-05-08 13:51:45 +00:00
Simon Pilgrim b0a3be04ec [X86] Add vector masked load/store scheduler classes (PR32857)
Split off from existing vector load/store classes to remove InstRW overrides.

llvm-svn: 331760
2018-05-08 12:17:55 +00:00
Jeremy Morse 4f799c027e [X86] Mark all byval parameters as aliased
This is a fix for PR30290: by marking all byval stack slots as being aliased,
the instruction scheduler is more conservative about rescheduling memory
accesses to such stack slots as an LLVM Value* might alias it. This fixes
errors such as in the patched test case, where reads and writes to a data
structure are illegally mixed.

This could be fixed better in the future with better analysis for the
instruction scheduler to know what Values alias what stack slots.

Differential Revision: https://reviews.llvm.org/D45022

llvm-svn: 331749
2018-05-08 09:18:01 +00:00
Alexander Ivchenko c47f799289 [X86][CET] Shadow stack fix for setjmp/longjmp
This patch adds a shadow stack fix when compiling
setjmp/longjmp with the shadow stack enabled. This
allows setjmp/longjmp to work correctly with CET.

Patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D46181

llvm-svn: 331748
2018-05-08 09:04:07 +00:00
Roman Tereshin d2421f9445 [MachineVerifier][GlobalISel] Verifying generic extends and truncates
Making sure we don't truncate / extend pointers, don't try to change
vector topology or bitcast vectors to scalars or back, and most
importantly, don't extend to a smaller type or truncate to a large
one.

Reviewers: qcolombet t.p.northover aditya_nandakumar

Reviewed By: qcolombet

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46490

llvm-svn: 331718
2018-05-08 02:48:15 +00:00
Roman Tereshin 5e51fac39a [MIRParser][GlobalISel] Parsing vector pointer types (<M x pA>)
MIParser wasn't able to parse LLTs like `<4 x p0>`, fixing that.

Reviewers: qcolombet t.p.northover aditya_nandakumar

Reviewed By: qcolombet

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46490

llvm-svn: 331712
2018-05-08 02:02:50 +00:00
Roman Tereshin 10bc5ad622 Follow Up on [MachineVerifier][GlobalISel] NFC, Improving MO printing and refactoring visitMachineInstrBefore
Fixing accidentally broken CodeGen/X86/verifier-generic-types-1.mir test

llvm-svn: 331695
2018-05-07 23:14:00 +00:00
Roman Tereshin d29fc89222 [MachineVerifier][GlobalISel] Checking that generic instrs have LLTs on all vregs
Every generic machine instruction must have generic virtual registers
only, that is, have a low-level type attached to each operand.

Previously MachineVerifier would catch a type missing on an operand
only if the previous operand for the the same type index exists and
have a type attached to it and it will report it as a type mismatch.
This is incosistent behaviour and a misleading error message.

This commit makes sure MachineVerifier explicitly checks that the
types are there for every operand and if not provides a
straightforward error message.

Reviewers: qcolombet t.p.northover bogner ab

Reviewed By: qcolombet

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46455

llvm-svn: 331694
2018-05-07 22:31:47 +00:00
Roman Tereshin f487edae49 [MachineVerifier][GlobalISel] NFC, Improving MO printing and refactoring visitMachineInstrBefore
This is an NFC pre-commit for the following "Checking that generic
instrs have LLTs on all vregs" commit.

This overloads MachineOperand::print to make it possible to print LLTs
with standalone machine operands.

This also overloads MachineVerifier::print(...MachineOperand...) with
an optional LLT using the newly introduced MachineOperand::print
variant; no actual calls added.

This also refactors MachineVerifier::visitMachineInstrBefore in the
parts dealing with all generic instructions (checking Selected
property, LLTs, and phys regs).

llvm-svn: 331693
2018-05-07 22:31:12 +00:00
Roman Lebedev 9bd6067db6 [DAGCombiner] Masked merge: enhance handling of 'andn' with immediates
Summary:
Split off from D46031.

The previous patch, D46493, completely disabled unfolding in case of immediates.
But we can do better:
{F6120274} {F6120277}

https://rise4fun.com/Alive/xJS

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D46494

llvm-svn: 331685
2018-05-07 21:52:22 +00:00
Roman Lebedev cc42d08b1d [DagCombiner] Not all 'andn''s work with immediates.
Summary:
Split off from D46031.

In masked merge case, this degrades IPC by decreasing instruction count.
{F6108777}
The next patch should be able to recover and improve this.

This also affects the transform @spatel have added in D27489 / rL289738,
and the test coverage for X86 was missing.
But after i have added it, and looked at the changes in MCA, i'm somewhat confused.
{F6093591} {F6093592} {F6093593}
I'd say this regression is an improvement, since `IPC` increased in that case?

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: andreadb, llvm-commits, spatel

Differential Revision: https://reviews.llvm.org/D46493

llvm-svn: 331684
2018-05-07 21:52:11 +00:00
Simon Pilgrim 1233e1234a [X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes
Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions.

Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.

llvm-svn: 331672
2018-05-07 20:52:53 +00:00
Simon Pilgrim e480ed0b9f [X86][AVX2] Tag VPMOVSX/VPMOVZX ymm instructions as WriteShuffle256
These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents.

Differential Revision: https://reviews.llvm.org/D46229

llvm-svn: 331659
2018-05-07 18:25:19 +00:00
Roman Lebedev 0412c90236 [DAGCombine][NFC] Masked merge unfolding: comment: some tests are non-canonical
As requested in https://reviews.llvm.org/D46494#inline-407282

llvm-svn: 331650
2018-05-07 16:42:47 +00:00
Simon Pilgrim 763bf12085 [X86][Znver1] Remove WriteFMul/WriteFRcp InstRW overrides/aliases.
Fixes x87 schedules to more closely match Agner - AMD doesn't tend to "special case" x87 instructions as much as Intel.

llvm-svn: 331645
2018-05-07 16:34:26 +00:00
Simon Pilgrim ac5d0a31ef [X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions.
This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values.

llvm-svn: 331643
2018-05-07 16:15:46 +00:00
Mark Searles 4a0f2c5047 [AMDGPU][Waitcnt] Remove the old waitcnt pass
Remove the old waitcnt pass ( si-insert-waits ), which is no longer maintained
and getting crufty

Differential Revision: https://reviews.llvm.org/D46448

llvm-svn: 331641
2018-05-07 14:43:28 +00:00
Petar Jovanovic cc4915701c Add option -verify-cfiinstrs to run verifier in CFIInstrInserter
Instead of enabling it for non NDEBUG builds, use -verify-cfiinstrs to
run verifier in CFIInstrInserter. It defaults to false.

Differential Revision: https://reviews.llvm.org/D46444

llvm-svn: 331635
2018-05-07 14:09:33 +00:00
Tim Renouf 18a1e9d03a [AMDGPU] Don't force WQM for DS op
Summary:
Previously, all DS ops forced WQM in a pixel shader. That was a hack to
allow for graphics frontends using ds_swizzle to implement explicit
derivatives, on SI/CI at least where DPP is not available. But it forced
WQM for _any_ DS op.

With this commit, DS ops no longer force WQM. Both graphics frontends
(Mesa and LLPC) need to change to issue an explicit llvm.amdgcn.wqm
intrinsic call when calculating explicit derivatives.

The required Mesa change is: "amd/common: use llvm.amdgcn.wqm for
explicit derivatives".

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46051

Change-Id: I9b745b626fa91bbd66456e6cf41ee07eeea42f81
llvm-svn: 331633
2018-05-07 13:21:26 +00:00
Simon Pilgrim f3ae50fca2 [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.

WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.

This removes all InstrRW overrides for these instructions.

NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.

NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629
2018-05-07 11:50:44 +00:00
Jonas Paulsson ebb1605bf3 [SystemZ] Bugfix for MVCLoop CC clobbering.
MVCLoop clobbers CC (since it emits a compare/branch), but this was not
modelled.

Review: Ulrich Weigand
llvm-svn: 331627
2018-05-07 10:48:43 +00:00
Craig Topper cb2abc7977 [X86] Enable reciprocal estimates for v16f32 vectors by using VRCP14PS/VRSQRT14PS
Summary:
The legacy VRCPPS/VRSQRTPS instructions aren't available in 512-bit versions. The new increased precision versions are. So we can use those to implement v16f32 reciprocal estimates.

For KNL CPUs we can probably use VRCP28PS/VRSQRT28PS and avoid the NR step altogether, but I leave that for a future patch.

Reviewers: spatel

Reviewed By: spatel

Subscribers: RKSimon, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D46498

llvm-svn: 331606
2018-05-06 17:48:21 +00:00
Craig Topper b02e3dec4b [X86] Add test cases for reciprocal estimation for v16f32 vectors with AVX512F.
We should be able to use the vrsqrt14ps and vrcp14ps instructions for these cases.

llvm-svn: 331605
2018-05-06 17:45:40 +00:00
Amaury Sechet 0ea2936104 Add test cases for large integer legalization of add and sub. NFC
llvm-svn: 331604
2018-05-06 16:00:23 +00:00
Daniel Sanders acc008cb0c [globalisel] Remove redundant -global-isel option from tests that use -run-pass. NFC
As Roman Tereshin pointed out in https://reviews.llvm.org/D45541, the
-global-isel option is redundant when -run-pass is given. -global-isel sets up
the GlobalISel passes in the pass manager but -run-pass skips that entirely and
configures it's own pipeline.

llvm-svn: 331603
2018-05-05 21:19:59 +00:00
Daniel Sanders f84bc3793e [globalisel] Update GlobalISel emitter to match new representation of extending loads
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch changes the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bit. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch that adds combiner helpers for
for this will follow.

The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.

Depends on D45540

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar

Reviewed By: rtereshin

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45541

llvm-svn: 331601
2018-05-05 20:53:24 +00:00
Heejin Ahn c86da6bcfe [MIRPraser] Improve error checking for typed immediate operands
Summary:
This improves error checks for typed immediate operands introduced in
D45948 (rL331586), and removes a code block copied by mistake.

Reviewers: rtereshin

Subscribers: dschuff, sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D46491

llvm-svn: 331600
2018-05-05 20:53:23 +00:00
Roman Lebedev a3b0b59f54 [DAGCombiner] Masked merge: don't touch "not" xor's.
Summary:
Split off form D46031.

It seems we don't want to transform the pattern if the `xor`'s are actually `not`'s.
In vector case, this breaks `andnpd` / `vandnps` patterns.

That being said, we may want to re-visit this `not` handling, maybe in D46073.

Reviewers: spatel, craig.topper, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46492

llvm-svn: 331595
2018-05-05 15:45:40 +00:00
Heejin Ahn c2ad096845 [MIRParser] Allow register class names in the form of integer/scalar
Summary:
The current code cannot handle register class names like 'i32', which is
a valid register class name in WebAssembly. This patch removes special
handling for integer/scalar/pointer type parsing and treats them as
normal identifiers.

Reviewers: thegameg

Subscribers: jfb, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D45948

llvm-svn: 331586
2018-05-05 07:05:51 +00:00
Michael Berg 2dcf12ffd4 Mapping SDNode flags to MachineInstr flags
Summary: Providing the glue to map SDNode fast math sub flags to MachineInstr fast math sub flags.

Reviewers: spatel, arsenm, wristow

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D46447

llvm-svn: 331567
2018-05-04 23:41:15 +00:00
Konstantin Zhuravlyov c2c2eb7d01 AMDGPU: Add D16 instructions preserve unused bits feature
- Predicate D16 patterns on this new feature
- Added this new feature to gfx900/2/4

Differential Revision: https://reviews.llvm.org/D46366

llvm-svn: 331551
2018-05-04 20:06:57 +00:00
Geoff Berry 8e4958e760 [MachineLICM] Debug intrinsics shouldn't affect hoist decisions
Summary:
When checking if an instruction stores to a given frame index, check
that the instruction can write to memory before looking at the memory
operands list to avoid e.g. DBG_VALUE instructions that reference a
frame index preventing a load from that index from being hoisted.

Reviewers: dblaikie, MatzeB, qcolombet, reames, javed.absar

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46284

llvm-svn: 331549
2018-05-04 19:25:09 +00:00
Michael Berg 7acc81b744 Fast Math Flag mapping into SDNode
Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.

Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng

Differential Revision: https://reviews.llvm.org/D45710

llvm-svn: 331547
2018-05-04 18:48:20 +00:00
Simon Pilgrim 0e51a125ea [X86] Add WriteEMMS scheduler class
Filled in the missing values from Btver2 SoG or Agner

llvm-svn: 331546
2018-05-04 18:16:13 +00:00
Simon Pilgrim d7ffbc5c7e [X86] Finish splitting WriteVecShift and WriteVecIMul to remove InstRW overrides.
llvm-svn: 331543
2018-05-04 17:47:46 +00:00
Adhemerval Zanella 6d56294c7f [AArch64] Add missing testcase for r331522
llvm-svn: 331541
2018-05-04 17:21:26 +00:00
Krzysztof Parzyszek effcc2fb79 [Hexagon] Handle non-immediate constants in HexagonSplitDouble
llvm-svn: 331527
2018-05-04 15:04:48 +00:00
Adhemerval Zanella a57ef17ab6 [AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32
This patch adds a custom lowering for ISD::MULH{S,U} used on divide by
constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV).

New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL
can be correctly lowered to smull2 and umull2.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46009

llvm-svn: 331522
2018-05-04 14:33:55 +00:00
Krzysztof Parzyszek af73d2bdd9 [Hexagon] Skip reserved physical registers when updating liveness
llvm-svn: 331518
2018-05-04 13:59:05 +00:00
Jeremy Morse 07e8daa66b [X86] Add test case for PR30290s failing behaviour
Following the advice in review D45022, this currently tests for the broken llc
output where an instruction is mis-scheduled. This test is committed in advance
to improve the eventual fixing patch in D45022, making the bad behaviour that
that patch fixes clearer.

llvm-svn: 331514
2018-05-04 10:05:10 +00:00
Jeremy Morse 71f17bf855 Word wrap a test-file comment to 80 columns
This is a test commit to check whether my account works.

llvm-svn: 331512
2018-05-04 08:58:06 +00:00
Jonas Paulsson 72fe760592 [RegUsageInfoCollector] Bugfix for handling of register aliases.
Don't assume the alias of a defined reg is always already in the set.

As the test case in https://bugs.llvm.org/show_bug.cgi?id=36587 discovered,
it is wrong to assume that all the aliases of the defined register in the
*current function* is already present in the UsedPhysRegsMask.

This patch changes this so that any definition in the current function of a
phys-reg always results in all its aliases inserted into the set of defined
registers.

Review: Quentin Colombet
https://reviews.llvm.org/D45157

llvm-svn: 331509
2018-05-04 07:50:05 +00:00
Simon Pilgrim 0aed731516 [X86][Znver1] Use SchedAlias to tag microcoded scheduler classes
Avoids extra entries in the class tables.

Found a typo that missed the MMX_PHSUBSW instruction.

llvm-svn: 331488
2018-05-03 22:12:23 +00:00
Sanjay Patel 52151885e4 [PowerPC] add more FMF debug output; NFC
We can't see all of the problems currently unless
we look at debug output when the global 'unsafe' is
on. It's a mess. This is another attempt to make
sure that D45710 is not making changes unintentionally.

llvm-svn: 331476
2018-05-03 18:49:35 +00:00
Simon Pilgrim f2d2cedab4 [X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM scheduler classes
This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness.

llvm-svn: 331472
2018-05-03 17:56:43 +00:00
Sanjay Patel e7532d2940 [PowerPC] add tests for FMF propagation; NFC
I'm choosing PPC out of convenience because it does
all of the transforms of interest in these tests by
default. There are multiple FMF problems shown in the 
current checks. D45710 is proposing to fix part of 
that.

llvm-svn: 331471
2018-05-03 17:41:37 +00:00
Roman Lebedev 63ac19365a [CodeGen][X86][NFC] Copy two selectcc tests from AArch64.
These tests are for DAGCombiner::foldSelectCCToShiftAnd().
Right now, they were only tested for AArch64,
but given the upcoming X86 changes to the hasAndNot(),
the test coverage needs to be added.

These tests originated from D27489 / rL289738

llvm-svn: 331454
2018-05-03 13:33:07 +00:00
Simon Pilgrim f7dd6069a5 [X86] Split WriteVecALU/WritePHAdd into XMM and YMM/ZMM scheduler classes
llvm-svn: 331453
2018-05-03 13:27:10 +00:00
Tim Northover 28e0a6f7dd ARM: don't try to over-align large vectors as arguments.
By default LLVM thinks very large vectors get aligned to their size when
passed across functions. Unfortunately no-one told the ARM backend so it
doesn't trigger stack realignment and so accesses can cause the usual
misalignment issues (e.g. a data abort).

This changes the ABI alignment to the stack alignment, which in practice
(and as a bonus) also coincides with the alignment "natural" vectors get.

llvm-svn: 331451
2018-05-03 12:54:25 +00:00
Piotr Padlewski 5dde809404 Rename invariant.group.barrier to launder.invariant.group
Summary:
This is one of the initial commit of "RFC: Devirtualization v2" proposal:
https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing

Reviewers: rsmith, amharc, kuhar, sanjoy

Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45111

llvm-svn: 331448
2018-05-03 11:03:01 +00:00
Simon Pilgrim 93c878c76b [X86] Split WriteVecIMul/WriteVecPMULLD/WriteMPSAD/WritePSADBW into XMM and YMM/ZMM scheduler classes
Also retagged VDBPSADBW instructions as SchedWritePSADBW instead of SchedWriteVecIMul which matches the behaviour on SkylakeServer (the only thing that supports it...)

llvm-svn: 331445
2018-05-03 10:31:20 +00:00
Michael Berg 7d1b25d053 MachineInst support mapping SDNode fast math flags for support in Back End code generation
Summary:
Machine Instruction flags for fast math support and MIR print support


Reviewers: spatel, arsenm

Reviewed By: arsenm

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D45781

llvm-svn: 331417
2018-05-03 00:07:56 +00:00
Nemanja Ivanovic 01e2e79abf [PowerPC] Implement isMaskAndCmp0FoldingBeneficial
Sinking the and closer to a compare against zero is beneficial on PPC as it
allows us to emit record-form instructions. In the future, we may expand this
to a larger set of operations that feed compares against zero since PPC has
lots of record-form instructions.

Differential revision: https://reviews.llvm.org/D46060

llvm-svn: 331416
2018-05-02 23:55:23 +00:00
Nemanja Ivanovic 2139e99e47 [PowerPC] No CTR loop if the candidate exiting block is in a different loop
The CTR loops pass will insert the decrementing branch instruction in an exiting
block for the loop being transformed. However if that block is part of another
loop as well (whether a nested loop or with irreducible CFG), it is not valid
to use that exiting block. In fact, if the loop hass irreducible CFG, we don't
bother analyzing it and we just bail on the transformation. In practice, this
doesn't lead to a noticeable reduction in the number of loops transformed by
this pass.

Fixes https://bugs.llvm.org/show_bug.cgi?id=37229

Differential Revision: https://reviews.llvm.org/D46162

llvm-svn: 331410
2018-05-02 22:56:04 +00:00
Simon Pilgrim 350c22c587 [X86][SNB] Fix scheduling of MMX integer multiply instructions.
The entries were being bound to the wrong class.

llvm-svn: 331388
2018-05-02 19:26:14 +00:00
Simon Pilgrim 6732f6ea51 [X86] Split WriteShuffle/WriteVarShuffle + WriteBlend/WriteVarBlend into XMM and YMM/ZMM scheduler classes
llvm-svn: 331386
2018-05-02 18:48:23 +00:00
Farhana Aleen 07e612340f [AMDGPU] A trivial fix for a buildbot failure caused by "commit 224a839fcbbead221f872cd32a1dd0c308d37299".
Author: FarhanaAleen
llvm-svn: 331383
2018-05-02 18:16:39 +00:00
Simon Pilgrim 819f218f07 [X86] Cleanup WriteFShuffle/WriteFVarShuffle (+256 variants) scheduler classes with more common default values
llvm-svn: 331380
2018-05-02 17:58:50 +00:00
Farhana Aleen 150cb6d91a Revert "[AMDGPU] performAddCombine should run after DAG is legalized."
This reverts commit 6b97d2995566b4dddd6bf0d75579ff44501d4494.

llvm-svn: 331371
2018-05-02 16:48:52 +00:00
Farhana Aleen 2f4100f56e [AMDGPU] performAddCombine should run after DAG is legalized.
Summary: performAddCombine should run after DAG is legalized; Otherwise generic optimization
         in the DAGCombiner can optimize an addcarry+trunc into an addcarry instruction with
         illegal types.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D46337

llvm-svn: 331368
2018-05-02 16:24:10 +00:00
Simon Pilgrim e93fd5f1e4 [X86] Cleanup WriteFAdd/WriteFCmp scheduler classes with more common default values
Intel models were targeting x87 instead of packed sse.

Also fixes XOP's VFRCZ to use WriteFAdd/WriteFAddY.

llvm-svn: 331340
2018-05-02 09:18:49 +00:00
Farhana Aleen e2dfe8a853 [AMDGPU] Support horizontal vectorization.
Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D46213

llvm-svn: 331313
2018-05-01 21:41:12 +00:00
Eli Friedman 763d161eda [AArch64] Add more tests for 64-bit immediate lowering.
This adds a some more tests, and adds some notes to tests which are using
a suboptimal lowering.

The constants with suboptimal lowerings seem to be relatively rare in
practice, but it might be a fun project to work on improvements.

llvm-svn: 331304
2018-05-01 20:00:14 +00:00
Vedant Kumar e23173b677 [DAGCombiner] Fix SDLoc in a (zext (zextload x)) combine (4/N)
The logic for this combine is almost identical to the logic for a
(sext (sextload x)) combine.

This commit factors out the logic so it can be shared by both combines,
and corrects the SDLoc assigned in the zext version of the combine.

Prior to this patch, for the given test case, we would apply the
location associated with the udiv instruction to instructions which
perform the load.

Part of: llvm.org/PR37262

llvm-svn: 331303
2018-05-01 19:51:15 +00:00
Vedant Kumar d7117ed0f9 [DAGCombiner] Fix SDLoc in a (sext (sextload x)) combine (3/N)
Prior to this patch, for the given test case, we would apply the
location associated with the sdiv instruction to instructions which
perform the load.

Part of: llvm.org/PR37262.

Differential Revision: https://reviews.llvm.org/D46222

llvm-svn: 331302
2018-05-01 19:51:15 +00:00
Vedant Kumar cc7b2a55c2 [DAGCombiner] Change the SDLoc on split extloads (2/N)
In DAGCombiner, we try to simplify this pattern:

  ([s|z]ext (load ...))

Conceptually, a new extload which is created while splitting the load
should have the same debug location as the load.

Making this change affects the IROrder of the new load, causing some
test case churn.

In practice, the new location is never different from the location of
the [s|z]ext, at least not during check-llvm or a stage2 build.

Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D46156

llvm-svn: 331301
2018-05-01 19:29:15 +00:00
Vedant Kumar ee4bfcaa5a [DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N)
Setting the right SDLoc on a newly-created zextload fixes a line table
bug which resulted in non-linear stepping behavior.

Several backend tests contained CHECK lines which relied on the IROrder
inherited from the wrong SDLoc. This patch breaks that dependence where
feasbile and regenerates test cases where not.

In some cases, changing a node's IROrder may alter register allocation
and spill behavior. This can affect performance. I have chosen not to
prevent this by applying a "known good" IROrder to SDLocs, as this may
hide a more general bug in the scheduler, or cause regressions on other
test inputs.

rdar://33755881, Part of: llvm.org/PR37262

Differential Revision: https://reviews.llvm.org/D45995

llvm-svn: 331300
2018-05-01 19:26:15 +00:00
Konstantin Zhuravlyov 1501af4846 AMDGPU: Remove remnants of gfx901 (it was deprecated some time ago)
llvm-svn: 331298
2018-05-01 18:47:48 +00:00
Simon Pilgrim 21caf0124f [X86] Split WriteFMul/WriteFDiv into XMM and YMM/ZMM scheduler classes
llvm-svn: 331293
2018-05-01 18:22:53 +00:00
Simon Pilgrim 5269167f5b [X86] Split WriteFAdd into XMM and YMM/ZMM scheduler classes
Removes more WriteFAdd InstRW overrides

llvm-svn: 331276
2018-05-01 16:13:42 +00:00
Sanjay Patel 5727011fd5 [DAG] add test to show FMF mismatch between IR and DAG; NFC
D45710 proposes to change this, but we have no test coverage 
for the first step in this process.

llvm-svn: 331271
2018-05-01 15:43:36 +00:00
Simon Pilgrim dd8eae128b [X86] Split WriteFShuffle into XMM and YMM/ZMM scheduler classes
Removes more WriteFShuffle InstRW overrides

llvm-svn: 331264
2018-05-01 14:25:01 +00:00
Simon Dardis 3d562fb975 Reland r331175: "[mips] Fix the predicates of jump and branch and link instructions"
The previous version of this patch restricted the 'jal' instruction to MIPS and
microMIPSr3. microMIPS32r6 does not have this instruction and instead uses jal
as an alias for balc.

Original commit message:
> Reviewers: smaksimovic, atanasyan, abeserminji
>
> Differential Revision: https://reviews.llvm.org/D46114
>

llvm-svn: 331259
2018-05-01 13:06:49 +00:00
Simon Pilgrim 57f2b185ac [X86] Split WriteVecLogic into XMM and YMM/ZMM scheduler classes
This removes all the WriteVecLogic InstRW overrides.

llvm-svn: 331258
2018-05-01 12:39:17 +00:00
Andrea Di Biagio d4c58400c5 [X86] Correct spill slot size.
This patch fixes a bug introduced by revision 330778 (originally reviewed at:
https://reviews.llvm.org/D44782), where function isFrameLoadOpcode returned
the wrong number of bytes read for opcodes VMOVSSrm and VMOVSDrm.

This corrects that mistake, and extends the regression test to catch cases where
the dead stores should be removed.

Patch by Jeremy Morse.

Differential Revision: https://reviews.llvm.org/D46256

llvm-svn: 331252
2018-05-01 10:29:38 +00:00
Gabor Buella c8ded04e85 [X86] movdiri and movdir64b instructions
Reviewers: spatel, craig.topper, RKSimon

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D45983

llvm-svn: 331248
2018-05-01 10:01:16 +00:00
Krzysztof Parzyszek 1cf329c933 [LivePhysRegs] Remove registers clobbered by regmasks from the live set
Dead defs were being removed from the live set (in stepForward), but
registers clobbered by regmasks weren't (more specifically, they were
actually removed by removeRegsInMask, but then they were added back in).

llvm-svn: 331219
2018-04-30 19:38:47 +00:00
Matt Arsenault 0084adc516 AMDGPU: Add Vega12 and Vega20
Changes by
  Matt Arsenault
  Konstantin Zhuravlyov

llvm-svn: 331215
2018-04-30 19:08:16 +00:00
Roman Tereshin 46f838f370 [MIR] Reset unique MBB numbering in MachineFunction::reset()
No need to waste space nor number MBBs differently if MF gets recreated.

Reviewers: qcolombet, stoklund, t.p.northover, bogner, javed.absar

Reviewed By: qcolombet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46078

llvm-svn: 331213
2018-04-30 18:58:57 +00:00
Sanjay Patel 1babf5ff32 [DAGCombiner] rename function attribute for disabling ftrunc transform
This is the matching name change for the Clang patch at:
D46236
rL331209

Differential Revision: https://reviews.llvm.org/D46237 

llvm-svn: 331210
2018-04-30 18:20:33 +00:00
Ulrich Weigand c3ec80fea1 [SystemZ] Handle SADDO et.al. and ADD/SUBCARRY
This provides an optimized implementation of SADDO/SSUBO/UADDO/USUBO
as well as ADDCARRY/SUBCARRY on top of the new CC implementation.

In particular, multi-word arithmetic now uses UADDO/ADDCARRY instead
of the old ADDC/ADDE logic, which means we no longer need to use
"glue" links for those instructions.  This also allows making full
use of the memory-based instructions like ALSI, which couldn't be
recognized due to limitations in the DAG matcher previously.

Also, the llvm.sadd.with.overflow et.al. intrinsincs now expand to
directly using the ADD instructions and checking for a CC 3 result.

llvm-svn: 331203
2018-04-30 17:54:28 +00:00
Ulrich Weigand b32f3656d2 [SystemZ] Do not use glue to represent condition code dependencies
Currently, an instruction setting the condition code is linked to
the instruction using the condition code via a "glue" link in the
SelectionDAG.  This has a number of drawbacks; in particular, it
means the same CC cannot be used by multiple users.  It also makes
it more difficult to efficiently implement SADDO et. al.

This patch changes the back-end to represent CC dependencies as
normal values during SelectionDAG matching, along the lines of
how this is handled in the X86 back-end already.

In addition to the core mechanics of updating all relevant patterns,
this requires a number of additional changes:

- We now need to be able to spill/restore a CC value into a GPR
  if necessary.  This means providing a copyPhysReg implementation
  for moves involving CC, and defining getCrossCopyRegClass.

- Since we still prefer to avoid such spills, we provide an override
  for IsProfitableToFold to avoid creating a merged LOAD / ICMP if
  this would result in multiple users of the CC.

- combineCCMask no longer requires a single CC user, and no longer
  need to be careful about preventing invalid glue/chain cycles.

- emitSelect needs to be more careful in marking CC live-in to
  the basic block it generates.  Also, we can now optimize the
  case of multiple subsequent selects with the same condition
  just like X86 does.

llvm-svn: 331202
2018-04-30 17:52:32 +00:00
Daniel Sanders 2de9d4ad5d Fix infinite loop after r331115
There are two separate fixes here:
* The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction.
* The target should not be requesting lowering of non-extending loads.

llvm-svn: 331201
2018-04-30 17:20:01 +00:00
Ulrich Weigand fb56686cd3 [SystemZ] Improve handling of Select pseudo-instructions
If we have LOCR instructions, select them directly from SelectionDAG
instead of first going through a pseudo instruction and then using
the custom inserter to emit the LOCR.

Provide Select pseudo-instructions for VR32/VR64 if we have vector
instructions, to avoid having to go through the first 16 FPRs
unnecessarily.

If we do not have LOCFHR, prefer using LOCR followed by a move
over a conditional branch.

llvm-svn: 331191
2018-04-30 15:49:27 +00:00
Simon Dardis 5a512d63c9 Revert "[mips] Fix the predicates of jump and branch and link instructions"
That commit broke one of the LLD builders, reverting while I investigate.

This patch reverts r331175.

llvm-svn: 331178
2018-04-30 14:03:35 +00:00
Simon Dardis cc95a9c557 [mips] Fix the predicates of jump and branch and link instructions
Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D46114

llvm-svn: 331175
2018-04-30 13:37:42 +00:00
Simon Dardis 57c2095d1b [mips] Fix microMIPS loads and stores.
Previously these instructions were unselectable and instead were generated
through the instruction mapping tables.

Reviewers: atanasyan, smaksimovic, abeserminji

Differential Revision: https://reviews.llvm.org/D46055

llvm-svn: 331165
2018-04-30 09:44:44 +00:00
Daniel Sanders 5eb9f581b6 [globalisel][legalizerinfo] Introduce dedicated extending loads and add lowerings for them
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch begins changing the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

This patch introduces the new generic instructions and new variation on
G_LOAD and adds lowering for them to convert back to the existing
representations.

Depends on D45466

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45540

llvm-svn: 331115
2018-04-28 18:14:50 +00:00
Jessica Paquette 0b6724917a [MachineOutliner] Add defs to calls + don't track liveness on outlined functions
This commit makes it so that if you outline a def of some register, then the
call instruction created by the outliner actually reflects that the register
is defined by the call. It also makes it so that outlined functions don't
have the TracksLiveness property.

Outlined calls shouldn't break liveness assumptions that someone might make.

This also un-XFAILs the noredzone test, and updates the calls test.

llvm-svn: 331095
2018-04-27 23:36:35 +00:00
Heejin Ahn d20d0648ed [DAGCombiner] Fix a case of 1 in non-splat vector pow2 divisor
Summary:
D42479 (rL329525) enabled SDIV combine for pow2 non-splat vector
dividers. But when there is a 1 in a vector, the instruction sequence to
be generated involves shifting a value by the number of its bit widths,
which is undefined
(c64f4dbfe3/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L6000-L6006)).

Especially, in architectures that do not support vector instructions,
each of element in a vector will be computed separately using scalar
operations, and then the resulting value will be undef for '1' values
in a vector.

(All 1's vector is fine; only vectors mixed with 1 and others will be
affected.)

Reviewers: RKSimon, jgravelle-google

Subscribers: jfb, dschuff, sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D46161

llvm-svn: 331092
2018-04-27 22:23:11 +00:00
Craig Topper d656410293 [X86] Make the STTNI flag intrinsics use the flags from pcmpestrm/pcmpistrm if the mask instrinsics are also used in the same basic block.
Summary:
Previously the flag intrinsics always used the index instructions even if a mask instruction also exists.

To fix fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a s ingle node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction.

Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs.

I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions.

Reviewers: chandlerc, RKSimon, spatel

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46202

llvm-svn: 331091
2018-04-27 22:15:33 +00:00
Jun Bum Lim 9e3e14b5f9 [PostRASink] extend the live-in check for all aliased registers
Extend the live-in check for all aliased registers so that we can
allow sinking Copy instructions when only implicit def is in successor's
live-in.

llvm-svn: 331072
2018-04-27 19:59:20 +00:00
Daniel Sanders 27fe8a5011 [globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand
Summary:
Currently only the memory size is supported but others can be added as
needed.

narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45466

llvm-svn: 331071
2018-04-27 19:48:53 +00:00
Mark Searles a6322924e6 [AMDGPU][Waitcnt] Update a few tests to use default waitcnt pass (si-insert-waitcnts) rather than old pass (si-insert-waits); this is a small step towards the overall goal of removing the old waitcnt pass, which is no longer maintained.
Differential Revision: https://reviews.llvm.org/D46154

llvm-svn: 331062
2018-04-27 17:59:15 +00:00
Simon Pilgrim b2aa89c909 [X86][AVX] Split WriteFLogic into XMM and YMM/ZMM scheduler classes
This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides.

llvm-svn: 331051
2018-04-27 15:50:33 +00:00
Simon Dardis e3c3c5a7a7 [mips] Analyze and provide selection patterns microMIPSR6 branches
These branches were previously unanalyzable and unselectable. Add them and
recognize how to generate their inverses.

Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D46113

llvm-svn: 331050
2018-04-27 15:49:49 +00:00
Francis Visoiu Mistrih c855e92ca9 [AArch64] Place the first ldp at the end when ReverseCSRRestoreSeq is true
Put the first ldp at the end, so that the load-store optimizer can run
and merge the ldp and the add into a post-index ldp.

This didn't work in case no frame was needed and resulted in code size
regressions.

llvm-svn: 331044
2018-04-27 15:30:54 +00:00
Oliver Stannard 76088a5929 [AArch64] Codegen for v8.2A dot product intrinsics
This adds IR intrinsics for the AArch64 dot-product instructions introduced in
v8.2-A.

Differential revisioon: https://reviews.llvm.org/D46107

llvm-svn: 331036
2018-04-27 13:45:32 +00:00
Aleksandar Beserminji 3546c1603a [mips] Fix how compiler fuse instructions to fmadd/fmsub
This patch makes compiler does not fuse fmul and fadd/fsub into
fmadd/fmsub by default. Instead, -fp-contract=fast option can
be used when such behavior is desired.

Differential Revision: https://reviews.llvm.org/D46057

llvm-svn: 331033
2018-04-27 13:30:27 +00:00
Oliver Stannard f3632143da [ARM] Codegen for v8.2A dot product intrinsics
This adds IR intrinsics for the ARM dot-product instructions introduced in
v8.2-A.

Differential revision: https://reviews.llvm.org/D46106

llvm-svn: 331032
2018-04-27 12:50:40 +00:00
Alex Bradbury f5800a2aa0 [RISCV] Add remat.ll test case
This test case demonstrates suboptimal codegen due to the fact that simple 
constants aren't recognised as rematerialisable.

llvm-svn: 331028
2018-04-27 11:50:30 +00:00
David Green c4cccea4c9 [ARM] Enable misched for R52.
Back when the R52 schedule was added in rL286949, there was no way
to enable machine schedules in ARM for specific cores. Since then a
target feature has been added. This enables the feature for R52,
removing the need to manually specify compiler flags.

llvm-svn: 331027
2018-04-27 11:29:49 +00:00
Eli Friedman da018e5687 [MachineOutliner] Don't outline from functions with a section marking.
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.

I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.

Differential Revision: https://reviews.llvm.org/D46091

llvm-svn: 331007
2018-04-27 00:21:34 +00:00
Chandler Carruth 16429acacb [x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics
The LLVM commit introduces a crash in LLVM's instruction selection.

I filed http://llvm.org/PR37260 with the test case.

llvm-svn: 330997
2018-04-26 21:46:01 +00:00
Matt Arsenault 540512c297 DAG: Fix not legalizing vector fcanonicalizes
If an fcanoncialize was done on a vector type that was legal,

llvm-svn: 330981
2018-04-26 19:21:37 +00:00
Matt Arsenault fcc5ba46b7 AMDGPU: Extend extract_vector_elt fneg combine to fabs
Fixes a regression in a future commit.

llvm-svn: 330980
2018-04-26 19:21:32 +00:00
Geoff Berry 08ab8c9544 [AArch64] Fix scavenged spill slot base when stack realignment required.
Summary:
Use the FP for scavenged spill slot accesses to prevent corruption of
the callee-save region when the SP is re-aligned.

Based on problem and patch reported by @paulwalker-arm

This is an alternative to solution proposed in D45770

Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar

Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46063

llvm-svn: 330976
2018-04-26 18:50:45 +00:00
Mark Searles 2a19af6e17 [AMDGPU][Waitcnt] As of gfx7, VMEM operations do not increment the export counter and the input registers are available in the next instruction; update the waitcnt pass to take this into account.
Differential Revision: https://reviews.llvm.org/D46067

llvm-svn: 330954
2018-04-26 16:11:19 +00:00
Sanjay Patel 5a90285bd9 [DAGCombiner] limit ftrunc optimizations with function attribute
As noted, the attribute name is subject to change once we have
the clang side implemented, but it's clear that we need some
kind of attribute-based predication here based on the discussion
for:
rL330437

llvm-svn: 330951
2018-04-26 16:04:44 +00:00
Sanjay Patel 99a5f396d4 [x86] add tests to show potential opt-out of ftrunc optimization; NFC
This is another preliminary step for disabling this transform as 
discussed in the post-commit thread for:
rL330437
I'm using one of the names suggested there for the attribute, but 
we can fix that up as needed once the clang side of this is sorted 
out. 

llvm-svn: 330950
2018-04-26 15:36:15 +00:00
Benjamin Kramer 7dd437710e [NVPTX] Make the legalizer expand shufflevector of <2 x half>
There's no direct instruction for this, but it's trivially implemented
with two movs. Without this the code generator just dies when
encountering a shufflevector.

Differential Revision: https://reviews.llvm.org/D46116

llvm-svn: 330948
2018-04-26 15:26:29 +00:00
Alex Bradbury 15e894baee [RISCV] Implement isZextFree
This returns true for 8-bit and 16-bit loads, allowing LBU/LHU to be selected
and avoiding unnecessary masks.

llvm-svn: 330943
2018-04-26 14:04:18 +00:00
Alex Bradbury e74f519241 [RISCV] Add test case showing suboptimal codegen when loading unsigned char/short
Implementing isZextFree will allow lbu or lhu to be selected rather than 
lb+mask and lh+mask.

llvm-svn: 330942
2018-04-26 14:00:35 +00:00
Lama Saba a331f91853 [X86] Fix Update Kill Register in Avoid SFB Pass - Bug 37153
Differential Revision: https://reviews.llvm.org/D45823

Change-Id: Icf6f34f6babc3cb2ff5292fde003472473037a71
llvm-svn: 330939
2018-04-26 13:16:11 +00:00
Alex Bradbury 5c41ecedf8 [RISCV] Implement isLegalAddImmediate
This causes a trivial improvement in the recently added lsr-legaladdimm.ll 
test case.

llvm-svn: 330937
2018-04-26 13:00:37 +00:00
Alex Bradbury c2f78f80da [RISCV] Add test/CodeGen/RISCV/lsr-legaladdimm.ll
Add a test case which will show a codegen difference upon the implementation 
of a target-specific isLegalAddImmediate.

llvm-svn: 330936
2018-04-26 12:57:29 +00:00
Chandler Carruth eb631ef51e [x86] Allow folding unaligned memory operands into pcmp[ei]str*
instructions.

These have special permission according to the x86 manual to read
unaligned memory, and this folding is done by ICC and GCC as well.

This corrects one of the issues identified in PR37246.

llvm-svn: 330896
2018-04-26 03:17:25 +00:00
Chandler Carruth 8cc8c0a87c [x86] NFC: Add tests for idiomatic usage patterns of SSE4.2 string
comparison instructions (pcmp[ei]stri*).

These will help show improvements from fixes to PR37246.

I've not really covered the mask forms of this intrinsic as I don't have
as good of an intuition about the likely usage patterns there. Happy for
someone to extend this with tests covering the mask form.

llvm-svn: 330895
2018-04-26 03:12:17 +00:00
Mark Searles ec58183e1b [AMDGPU] Waitcnt pass: add debug options
- Add "amdgpu-waitcnt-forcezero" to force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

- Add debug counters to control force emit of s_waitcnt instrs; debug counters:
si-insert-waitcnts-forceexp: force emit s_waitcnt expcnt(0) instrs
si-insert-waitcnts-forcevm: force emit s_waitcnt lgkmcnt(0) instrs
si-insert-waitcnts-forcelgkm: force emit s_waitcnt vmcnt(0) instrs

- Add some debug statements

Note that a variant of this patch was previously committed/reverted.

Differential Revision: https://reviews.llvm.org/D45888

llvm-svn: 330862
2018-04-25 19:21:26 +00:00
Francis Visoiu Mistrih 57fcd3454a [MIR] Add support for debug metadata for fixed stack objects
Debug var, expr and loc were only supported for non-fixed stack objects.

This patch adds the following fields to the "fixedStack:" entries, and
renames the ones from "stack:" to:

* debug-info-variable
* debug-info-expression
* debug-info-location

Differential Revision: https://reviews.llvm.org/D46032

llvm-svn: 330859
2018-04-25 18:58:06 +00:00
Craig Topper 300e20d61c [X86] Form MUL_IMM for multiplies with 3/5/9 to encourage LEA formation over load folding.
Previously we only formed MUL_IMM when we split a constant. This blocked load folding on those cases. We should also form MUL_IMM for 3/5/9 to favor LEA over load folding.

Differential Revision: https://reviews.llvm.org/D46040

llvm-svn: 330850
2018-04-25 17:35:03 +00:00
Amara Emerson 1f5d994119 [AArch64][GlobalISel] Implement selection for the llvm.trap intrinsic.
rdar://38674040

llvm-svn: 330831
2018-04-25 14:43:59 +00:00
Shiva Chen d58bd8dc4a [RISCV] Expand function call to "call" pseudoinstruction
To do this:
1. Change GlobalAddress SDNode to TargetGlobalAddress to avoid legalizer
   split the symbol.

2. Change ExternalSymbol SDNode to TargetExternalSymbol to avoid legalizer
   split the symbol.

3. Let PseudoCALL match direct call with target operand TargetGlobalAddress
   and TargetExternalSymbol.

Differential Revision: https://reviews.llvm.org/D44885

llvm-svn: 330827
2018-04-25 14:19:12 +00:00
Simon Dardis 0f2f5976d0 [mips] Teach the delay slot filler to transform 'jal' for microMIPS
ISel is currently picking 'JAL' over 'JAL_MM' for calling a function when
targeting microMIPS. A later patch will correct this behaviour.

This patch extends the mechanism for transforming instructions into their short
delay to recognise 'JAL_MM' for transforming into 'JALS_MM'.

llvm-svn: 330825
2018-04-25 14:12:57 +00:00
Roman Lebedev cfa9e58ccf [X86][AArch64][NFC] Finish adding 'bad' tests for masked merge unfolding with constants.
I have initially committed basic tests in, rL330771,
but then quickly discovered that there are a few more
interesting patterns.

llvm-svn: 330819
2018-04-25 12:48:23 +00:00
Alexander Timofeev b934728cd2 [AMDGPU] Revert b0efc4fd6 (https://reviews.llvm.org/D40556)
llvm-svn: 330818
2018-04-25 12:32:46 +00:00
Craig Topper bba52806b1 [X86] Auto-generate complete checks. NFC
llvm-svn: 330797
2018-04-25 03:40:45 +00:00
Jessica Paquette 4f56428de1 [MachineOutliner] Check for explicit uses of LR/W30 in MI operands
Before, the outliner would grab ADRPs that used LR/W30. This patch fixes
that by checking for explicit uses of those registers before the special-casing
for ADRPs.

This also adds a test that ensures that those sorts of ADRPs won't be outlined.

llvm-svn: 330783
2018-04-24 22:38:15 +00:00
Craig Topper f3cefad255 [DAGCombiner][X86] When promoting loads don't use ZEXTLOAD even its legal
We were previously prefering ZEXTLOAD over EXTLOAD if it is legal. This triggers during X86's promotion of i16->i32. Not sure about other targets.

Using ZEXTLOAD can prevent folding it to SEXTLOAD later if we were to promote a sign extended operand like we would need for SRA. However, X86 doesn't currently promote i16 SRA. I was looking into doing that which is how I found this issue.

This is also blocking our ability to fold 4 byte aligned EXTLOADs with "loadi32". This is what caused most of the test changes here.

Differential Revision: https://reviews.llvm.org/D45585#inline-402825

llvm-svn: 330781
2018-04-24 22:35:27 +00:00
Warren Ristow b960d2cb40 [X86] Account for partial stack slot spills (PR30821)
Previously, _any_ store or load instruction was considered to be
operating on a spill if it had a frameindex as an operand, and thus
was fair game for optimisations such as "StackSlotColoring". This
usually works, except on architectures where spills can be partially
restored, for example on X86 where a spilt vector can have a single
component loaded (zeroing the rest of the target register). This can be
mis-interpreted and the zero extension unsoundly eliminated, see
pr30821.

To avoid this, this commit optionally provides the caller to
isLoadFromStackSlot and isStoreToStackSlot with the number of bytes
spilt/loaded by the given instruction. Optimisations can then determine
that a full spill followed by a partial load (or vice versa), for
example, cannot necessarily be commuted.

Patch by Jeremy Morse!

Differential Revision: https://reviews.llvm.org/D44782

llvm-svn: 330778
2018-04-24 22:01:50 +00:00
Tom Stellard a2be8f4c35 AMDGPU: Remove deprecated llvm.AMDGPU.kilp intrinsic
Summary: This is no longer used by mesa since its 18.0.0 release.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D45988

llvm-svn: 330775
2018-04-24 21:37:57 +00:00
Tom Stellard 257882ff72 AMDGPU/GlobalISel: Fall-back to SelectionDAG for non-void functions
Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45843

llvm-svn: 330774
2018-04-24 21:29:36 +00:00
Roman Lebedev 54df09e8fe [X86][AArch64][NFC] Add tests for masked merge unfolding with %y = const
The fold was added in D45733.

This appears to be a regression.

llvm-svn: 330771
2018-04-24 21:23:22 +00:00
Tom Stellard c7709e1c29 AMDGPU/GlobalISel: Add support for amdgpu_ps calling convention
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45837

llvm-svn: 330767
2018-04-24 20:51:28 +00:00
Vedant Kumar 4ce143088c [test] Update llc checks for CodeGen/X86/avg.ll
The output of update_llc_test_checks.py on this test file has changed,
so the test file should be updated to minimize source changes in future
patches.

The test updates for this file appear to be limited to relaxations of
the form:

  -; SSE2-NEXT:    movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
  +; SSE2-NEXT:    movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill

This was suggested in https://reviews.llvm.org/D45995.

llvm-svn: 330758
2018-04-24 19:20:18 +00:00
Stanislav Mekhanoshin a4bfb3c446 [AMDGPU] Truncate packed inline constant
If a packed inline constant is sign extended it must be truncated
after the shift. I.e. a constant (0xH0000, 0xHBC00), will be represented
as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended
to 64 bit. After the value shifted right by 16 to use it in a low part
with op_sel_hi it becomes 0xFFFFFFFFBC00 and does not qualify as inline
constant any longer.

Fixed the error and added verification code. Without the fix and with
the verification bug is causing pk_max_f16_literal.ll to fail.

Differential Revision: https://reviews.llvm.org/D45987

llvm-svn: 330752
2018-04-24 18:17:55 +00:00
Simon Pilgrim 81cb67ad82 [XOP] v4i32 IFMA 'VPMACS' instructions should use the WritePMULLD schedule class
llvm-svn: 330751
2018-04-24 18:13:57 +00:00
Simon Pilgrim f0945aa0e0 [X86][F16C] Add WriteCvtF2FSt scheduling class
Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)

llvm-svn: 330737
2018-04-24 16:43:07 +00:00
Simon Pilgrim 828ef9e013 [X86][BtVer2] Fix VCVTPS2PHmr/VCVTPS2PHYmr latencies
These are stores, not loads, so don't need to account for load latency.

llvm-svn: 330735
2018-04-24 16:26:51 +00:00
Simon Pilgrim 23d29250ae [X86] Fix missing cfi from sitofp checks
llvm-svn: 330715
2018-04-24 13:24:56 +00:00
Simon Pilgrim f7d2a93d5f [X86] Add vector element insertion/extraction scheduler classes
Split off pinsr/pextr and extractps instructions.

(Mostly) fixes PR36887.

Note: It might be worth adding a WriteFInsertLd class as well in the future.

Differential Revision: https://reviews.llvm.org/D45929

llvm-svn: 330714
2018-04-24 13:21:41 +00:00
Alexander Ivchenko 5717fbaf4c [X86] Replace action Promote with Expand for operation ISD::SINT_TO_FP
Summary:
If attribute "use-soft-float"="true" is set then X86ISelLowering.cpp sets
'Promote' action for ISD::SINT_TO_FP operation on type i32.

But 'Promote' action is not proper in this case since lib function
__floatsidf is available for casting from signed int to float type.
Thus Expand action is more suitable here.

The Expand action should be set for ISD::UINT_TO_FP for soft float as well.

If function attribute "use-soft-float"="true" is set then infinite looping
can happen in DAG combining, function visitSINT_TO_FP() replaces SINT_TO_FP
node with UINT_TO_FP node and function combineUIntToFP() replace vice versa in cycle.
The fix prevents it.

Patch by vrybalov

Differential Revision: https://reviews.llvm.org/D45572

llvm-svn: 330711
2018-04-24 12:57:51 +00:00
Petar Jovanovic e2bfcd6394 Correct dwarf unwind information in function epilogue
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

* CFI instructions do not affect code generation (they are not counted as
  instructions when tail duplicating or tail merging)
* Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Added CFIInstrInserter pass:

* analyzes each basic block to determine cfa offset and register are valid
  at its entry and exit
* verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
* inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D42848

llvm-svn: 330706
2018-04-24 10:32:08 +00:00
Simon Dardis fce722e6f8 [mips] Correct the patterns for bswap
Guard the MIPS64 variant correctly for i64, mark the MIPS32 version as not
in microMIPS and provide the microMIPS version.

Additionally, remove a related stale XFAIL'd test as bswap has its own test
case providing coverage.

Reviewers: smaksimovic, abeserminji, atanasyan

Differential Revision: https://reviews.llvm.org/D45816

llvm-svn: 330705
2018-04-24 10:19:29 +00:00
Andrei Elovikov 822602a75e [CodeGen] Do not allow opt-bisect-limit to skip ScalarizeMaskedMemIntrin.
Summary:
The pass is supposed to scalarize such intrinsics if the target does not support
them natively, so if the scalarization does not happen instruction selection
crashes due to inability to lower these intrinsics.

Reviewers: andrew.w.kaylor, craig.topper

Reviewed By: andrew.w.kaylor

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45947

llvm-svn: 330700
2018-04-24 09:24:29 +00:00
Roman Tereshin 3c6ea7e28c [GlobalISel][Legalizer] Look thro copies while combining G_UNMERGE's
As we're becoming stricter w/ respect to not allowing vregs having LLTs
and regclasses assigned both mid-globalisel pipeline, the number of
extra copies grows, some of which separate G_UNMERGE's from their
corresponding G_MERGE's, becoming a performance concern.

It's worth mentioning that we're already looking through copies while
combining legalization artifacts for every kind of artifact but
G_UNMERGE.

Reviewed By: aditya_nandakumar

Reviewers: ab, t.p.northover, volkan, javed.absar

Subscribers: rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45644

llvm-svn: 330660
2018-04-23 22:28:36 +00:00
Roman Lebedev 95c6eaf530 [DAGCombiner] Unfold scalar masked merge if profitable
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).
So we need to make sure that they are still generated.

If the mask is constant, we do nothing. InstCombine should have unfolded it.
Else, i use `hasAndNot()` TLI hook.

For now, only handle scalars.

https://rise4fun.com/Alive/bO6

----

I *really* don't like the code i wrote in `DAGCombiner::unfoldMaskedMerge()`.
It is super fragile. Is there something like IR Pattern Matchers for this?

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: andreadb, courbet, kristof.beyls, javed.absar, rengolin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D45733

llvm-svn: 330646
2018-04-23 20:38:49 +00:00
Roman Lebedev bf18cc56d3 [X86][AArch64][NFC] Add tests for masked merge unfolding
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).
I'm guessing `llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp` should be able to unfold this.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: nemanjai, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45563

llvm-svn: 330645
2018-04-23 20:38:42 +00:00
Gabor Buella 1a2ce572bf [X86] Revert r330638 - accidental commit
llvm-svn: 330640
2018-04-23 20:05:51 +00:00
Gabor Buella 213a7cda1f [X86] movdiri and movdir64b instructions
Reviewers: craig.topper
llvm-svn: 330638
2018-04-23 20:00:59 +00:00
Peter Collingbourne 5ab4a4793e Reland r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses.", with a fix for the bot failure.
This reland includes a check to prevent the DAG combiner from folding an
offset that is smaller than the existing one. This can cause oscillations
between two possible DAGs, which was the cause of the hang and later assertion
failure observed on the lnt-ctmark-aarch64-O3-flto bot.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

Original commit message:
> This is a code size win in code that takes offseted addresses
> frequently, such as C++ constructors that typically need to compute
> an offseted address of a vtable. This reduces the size of Chromium
> for Android's .text section by 108KB.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 330630
2018-04-23 19:09:34 +00:00
Vedant Kumar f17720633b [SelectionDAG] Dump debug locs in SDNodes
This helps debug issues where selection-dag assigns the wrong location
to an instruction.

Differential Revision: https://reviews.llvm.org/D45913

llvm-svn: 330618
2018-04-23 17:18:24 +00:00
Matt Arsenault b21f9592be AMDGPU: Move a flawed assert when spilling SGPRs
It's possible to validly spill the frame offset register
in a call sequence to a VGPR. There are definitely issues
with SGPR spilling to memory, so move the assert later.

llvm-svn: 330612
2018-04-23 16:13:30 +00:00
Nicolai Haehnle 5a995664f0 AMDGPU: Fix a corner case crash in SIOptimizeExecMasking
Summary:
See the new test case; this is really unlikely to happen with real code,
but I ran into this while attempting to bugpoint-reduce a different issue.

Change-Id: I9ade1dc1aa8fd9c4d9fc83661d7b80e310b5c4a6

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45885

llvm-svn: 330585
2018-04-23 13:05:50 +00:00
Simon Pilgrim 06e16541ba [X86] Remove unnecessary WriteFBlend/WriteBlend InstRW overrides.
Fixed a lot of the default classes which were being completely overridden.

llvm-svn: 330554
2018-04-22 18:35:53 +00:00
Simon Pilgrim 091680b6e7 [X86] Remove unnecessary WriteFMul/WriteFRcp/WriteFRsqrt InstRW overrides.
llvm-svn: 330553
2018-04-22 18:09:50 +00:00
Simon Pilgrim ef8d3ae4b5 [X86] Fix (completely overridden) WriteFHAdd/WritePHAdd classes to allow us to remove unnecessary instrw overrides.
llvm-svn: 330546
2018-04-22 15:25:59 +00:00
Simon Pilgrim 2fd8269c6f [X86][MMX][SSE] Tag missed PHADD/PHSUB instructions with WritePHAdd
llvm-svn: 330545
2018-04-22 15:02:23 +00:00
Simon Pilgrim 96855ec39e [X86] Remove unnecessary WriteFVarBlend/WriteVarBlend InstRW overrides.
This also fixes some of the ReadAfterLd issues due to InstRW.

llvm-svn: 330544
2018-04-22 14:43:12 +00:00
Simon Pilgrim a41ae2f005 [X86] Fix WriteMPSAD/WritePSADBW values to allow us to remove unnecessary instrw overrides.
llvm-svn: 330542
2018-04-22 10:39:16 +00:00
Craig Topper fe59bea07b [X86] Add DAG combine to turn (trunc (srl (mul ext, ext), 16) into PMULHW/PMULHUW.
Ultimately I want to use this to remove the intrinsics for these instructions.

llvm-svn: 330520
2018-04-21 18:39:21 +00:00
Craig Topper 1b223e75da [X86] Add test cases that show the current codegen for (trunc (srl (mul ext, ext), 16)). NFC
A future patch will turn this into MULHU/MULHS.

llvm-svn: 330519
2018-04-21 18:39:20 +00:00
Simon Pilgrim 342cf58668 [X86][X87] Add missing fldlg2 schedule test
llvm-svn: 330498
2018-04-21 10:35:04 +00:00
Hiroshi Inoue 33486787cb [PowerPC] fix incorrect vectorization of abs() on POWER9
Vectorized loops with abs() returns incorrect results on POWER9. This patch fixes it.
For example the following code returns negative result if input values are negative though it sums up the absolute value of the inputs.

int vpx_satd_c(const int16_t *coeff, int length) {
  int satd = 0;
  for (int i = 0; i < length; ++i) satd += abs(coeff[i]);
  return satd;
}

This problem causes test failures for libvpx.
For vector absolute and vector absolute difference on POWER9, LLVM generates VABSDUW (Vector Absolute Difference Unsigned Word) instruction or variants.
Since these instructions are for unsigned integers, we need adjustment for signed integers.
For abs(sub(a, b)), we generate VABSDUW(a+0x80000000, b+0x80000000). Otherwise, abs(sub(-1, 0)) returns 0xFFFFFFFF(=-1) instead of 1. For abs(a), we generate VABSDUW(a+0x80000000, 0x80000000).

Differential Revision: https://reviews.llvm.org/D45522

llvm-svn: 330497
2018-04-21 09:32:17 +00:00
Eli Friedman 0644130612 [AArch64] Don't crash trying to resolve __stack_chk_guard.
In certain cases, the compiler might try to merge __stack_chk_guard with
another global variable.  (Or someone could theoretically define
__stack_chk_guard as an alias.)  In that case, make sure we don't crash.

Differential Revision: https://reviews.llvm.org/D45746

llvm-svn: 330495
2018-04-21 00:07:46 +00:00
Jessica Paquette e5d279e6d6 Fix typo in test (verify-machine-instrs -> verify-machineinstrs)
llvm-svn: 330494
2018-04-20 23:37:48 +00:00
Jessica Paquette d442c3a632 [MachineOutliner] XFAIL machine-outliner-noredzone.ll
The verifier began complaining about an undefined physical register in this
test. XFAILing for the purposes of getting a bot up while I look into it.

Failure:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/11385/

llvm-svn: 330493
2018-04-20 23:35:54 +00:00
Simon Pilgrim d14d2e7b18 [X86] Add WriteFSign/WriteFLogic scheduler classes
Split the fp and integer vector logical instruction scheduler classes - older CPUs especially often handled these on different pipes.

This unearthed a couple of things that are also handled in this patch:

(1) We were tagging avx512 fp logic ops as WriteFAdd, probably because of the lack of WriteFLogic
(2) SandyBridge had integer logic ops only using Port5, when afaict they can use Ports015.
(3) Cleaned up x86 FCHS/FABS scheduling as they are typically treated as fp logic ops.

Differential Revision: https://reviews.llvm.org/D45629

llvm-svn: 330480
2018-04-20 21:16:05 +00:00
Krzysztof Parzyszek 41a24b7b13 [Hexagon] Improve HVX instruction selection (bitcast, vsplat)
There was some unfortunate interaction between VSPLAT and BITCAST
related to the selection of constant vectors (coming from selecting
shuffles). Introduce VSPLATW that always splats a 32-bit word, and
can have arbitrary result type (to avoid BITCASTs of VSPLAT).
Clean up the previous selection of BITCAST/VSPLAT.

llvm-svn: 330471
2018-04-20 19:38:37 +00:00
Krzysztof Parzyszek 642120122c [Hexagon] Skip fixed-stack indexes in HexagonConstExtenders
Fixed slots have negative values, and TRI::stackSlot2Index and
TRI::index2StackSlot do not handle negative numbers.

llvm-svn: 330468
2018-04-20 19:06:46 +00:00
Gabor Buella 31fa8025ba [X86] WaitPKG instructions
Three new instructions:

umonitor - Sets up a linear address range to be
monitored by hardware and activates the monitor.
The address range should be a writeback memory
caching type.

umwait - A hint that allows the processor to
stop instruction execution and enter an
implementation-dependent optimized state
until occurrence of a class of events.

tpause - Directs the processor to enter an
implementation-dependent optimized state
until the TSC reaches the value in EDX:EAX.

Also modifying the description of the mfence
instruction, as the rep prefix (0xF3) was allowed
before, which would conflict with umonitor during
disassembly.

Before:
$ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
.text
mfence

After:
$ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble
.text
umonitor        %rax

Reviewers: craig.topper, zvi

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D45253

llvm-svn: 330462
2018-04-20 18:42:47 +00:00
Jessica Paquette 2e5ada5c81 [MachineOutliner] Change B instruction for tail calls to TCRETURNdi
First off, this is more correct than having the B. Second off, this was making
a bot upset. This fixes that.

Update the test to include -verify-machineinstrs as well to prevent stuff like
this slipping by non debug/assert builds in the future.

llvm-svn: 330459
2018-04-20 18:03:21 +00:00
Sanjay Patel f04ab64b25 [x86] auto-generate checks; NFC
There's a proposal to change/add to this file in D45653,
so we should know exactly what those differences would be. 

llvm-svn: 330445
2018-04-20 16:46:58 +00:00
Sanjay Patel 3d453ad711 [DAGCombine] (float)((int) f) --> ftrunc (PR36617)
This was originally committed at rL328921 and reverted at rL329920 to
investigate failures in Chrome. This time I've added to the ReleaseNotes
to warn users of the potential of exposing UB and let me repeat that
here for more exposure:

  Optimization of floating-point casts is improved. This may cause surprising
  results for code that is relying on undefined behavior. Code sanitizers can
  be used to detect affected patterns such as this:

    int main() {
      float x = 4294967296.0f;
      x = (float)((int)x);
      printf("junk in the ftrunc: %f\n", x);
      return 0;
    }

    $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out
    ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of 
                   representable values of type 'int'
    junk in the ftrunc: 0.000000


Original commit message:

fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.

Differential Revision: https://reviews.llvm.org/D44909

llvm-svn: 330437
2018-04-20 15:07:55 +00:00
Daniel Cederman 4557178061 Revert "This pass, fixing an erratum in some LEON 2 processors..."
Summary:
Reading Atmel's AT697E errata document this does not seem like a valid
workaround. While the text only mentions SDIV, it says that the ICC flags
can be wrong, and those are only generated by SDIVcc. Verification on
hardware shows that simply replacing SDIV with SDIVcc does not avoid
the bug with negative operands.

This reverts r283727.

Reviewers: lero_chris, jyknight

Reviewed By: jyknight

Subscribers: fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D45813

llvm-svn: 330397
2018-04-20 07:53:27 +00:00
Daniel Cederman c67b3ffba7 [Sparc] Use synthetic instruction clr to zero register instead of sethi
Using `clr reg`/`mov %g0, reg`/`or %g0, %g0, reg` to zero a register
looks much better than `sethi 0, reg`.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D45810

llvm-svn: 330396
2018-04-20 07:47:12 +00:00
Nicolai Haehnle 7a87977fb2 AMDGPU: Legalize the operand of SI_INIT_M0
Summary:
This fixes a case where the argument to a sendmsg intrinsic
ends up in a VGPR, for whatever reason.

The underlying performance issue is that a multiplication that
can be an s_mul_i32 is instead needlessly generated as
v_mul_u32_u24, but this is not addressed by this patch.

Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45826

llvm-svn: 330393
2018-04-20 07:14:25 +00:00
Daniel Cederman 793af3b9f0 [Sparc] Fix addressing mode when using 64-bit values in inline assembly
Summary:
If a 64-bit register is used as an operand in inline assembly together
with a memory reference, the memory addressing will be wrong. The
addressing will be a single reg, instead of reg+reg or reg+imm. This
will generate a bad offset value or an exception in printMemOperand().

For example:

```
long long int val = 5;
long long int mem;
__asm__ volatile ("std %1, %0":"=m"(mem):"r"(val));
```
becomes:

```
std %i0, [%i2+589833]
```

The problem is that SelectInlineAsmMemoryOperand() is never called for
the memory references if one of the operands is a 64-bit register.
By calling SelectInlineAsmMemoryOperands() in tryInlineAsm() the Sparc
version of  SelectInlineAsmMemoryOperand() gets called for each memory
reference.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D45761

llvm-svn: 330392
2018-04-20 06:57:49 +00:00
Stanislav Mekhanoshin 160f85794d [AMDGPU] Use packed literals with zero either lower or hi part
Differential Revision: https://reviews.llvm.org/D45790

llvm-svn: 330365
2018-04-19 21:16:50 +00:00
Craig Topper bc895a3afc [X86] Enable popcnt false dependency breaking on Silvermont and Goldmont.
Silvermont and Goldmont have the same issue on popcnt as Sandy Bridge, Haswell, Broadwell, and Skylake. Believe it is fixed in Goldmont Plus.

llvm-svn: 330358
2018-04-19 19:25:24 +00:00
Craig Topper b5f2659130 [X86] Correct the scheduling data for register forms of XCHG and XADD on Intel CPUs.
The XCHG16rr/XCHG32rr/XCHG64rr instructions should be 3 uops just like XCHG8rr. I believe they're just implemented as 3 move uops with a temporary register.

XADD is probably 2 moves and an add also using a temporary register.

Change the latency for both from 2 cycles to 3 cycles. Only 2 of the uops are serialized in their execution, the move into the temporary and the move out of the temporary. The move from one GPR to the other should be able to go in parallel with this if there are ALU resources available.

llvm-svn: 330349
2018-04-19 18:00:17 +00:00
Krzysztof Parzyszek fbee8574ab [if-converter] Handle BBs that terminate in ret during diamond conversion
This fixes https://llvm.org/PR36825.

Original patch by Valentin Churavy (D45218).

Differential Revision: https://reviews.llvm.org/D45731

llvm-svn: 330345
2018-04-19 17:26:46 +00:00
Krzysztof Parzyszek 2a9a83cd3f [Hexagon] Use legal types when lowering CONCAT_VECTORS via BUILD_VECTOR
llvm-svn: 330344
2018-04-19 17:11:58 +00:00
Mark Searles 1bc6e71f32 [AMDGPU] Do not only rely on BB number when finding bottom loop
We should also check that the "bottom" basic block of a loopis a successor of the "header" basic block, otherwise we don't propagate the information correctly when the CFG is complex. This fixes an important rendering problem with Wolfsentein 2, because of one vector-memory wait was missing.

Differential Revision: https://reviews.llvm.org/D43831

llvm-svn: 330337
2018-04-19 15:42:30 +00:00
Krzysztof Parzyszek d92c37e090 [Hexagon] Generate code for vector bswap intrinsics
llvm-svn: 330333
2018-04-19 14:46:44 +00:00
Krzysztof Parzyszek 23bcf06a15 [Hexagon] Add/fix patterns for 32/64-bit vector compares and logical ops
llvm-svn: 330330
2018-04-19 14:24:31 +00:00
Simon Dardis 5d61c8b225 [mips] Correct the definitions of the unaligned word memory operation instructions
These instructions lacked the correct predicates, were not marked
as loads and stores and lacked the proper instruction mapping information.

In the case of microMIPS sw(l|r)e (EVA) these instructions were using the load
EVA description.

Reviewers: abeserminji, smaksimovic, atanasyan

Differential Revision: https://reviews.llvm.org/D45626

llvm-svn: 330326
2018-04-19 13:33:51 +00:00
Alexander Ivchenko e8fed1546e Lowering x86 adds/addus/subs/subus intrinsics (llvm part)
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations. The patch also includes folding
of previously missing saturation patterns so that IR emits the same
machine instructions as the intrinsics.

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D44785

llvm-svn: 330322
2018-04-19 12:13:30 +00:00
Sjoerd Meijer a79ea80d7b [ARM] Add some missing FP16 VSEL test cases
Differential Revision: https://reviews.llvm.org/D45724

llvm-svn: 330313
2018-04-19 08:21:50 +00:00
Craig Topper f846e2d1b1 [X86] Scrub scheduling information for MUL/IMUL on Intel CPUs.
This removes a bunch of unnecessary InstRW overrides. It also cleans up the missing information from the Sandy Bridge model. Other fixes to other models.

llvm-svn: 330308
2018-04-19 05:34:05 +00:00
Artem Belevich 0ae8590354 [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.
The new instructions were added added for sm_70+ GPUs in CUDA-9.1.

Differential Revision: https://reviews.llvm.org/D45068

llvm-svn: 330296
2018-04-18 21:51:48 +00:00
Alex Bradbury 9891ba3476 [RISCV] Add test changes missed from rL330293
llvm-svn: 330294
2018-04-18 20:36:12 +00:00
Alex Bradbury 3ff2022bb9 [RISCV] Introduce pattern for materialising immediates with 0 for lower 12 bits
These immediates can be materialised with just an lui, rather than an lui+addi 
pair.

llvm-svn: 330293
2018-04-18 20:34:23 +00:00
Alex Bradbury 792547b348 [RISCV] Add imm-cse.ll test case
This test case demonstrates that common subexpression elimination takes place 
between code sequences for materialising constants. In particular, it 
demonstrates that redundant lui aren't generated. This would capture a 
regression if applying a patch such as D41949.

llvm-svn: 330291
2018-04-18 20:25:07 +00:00
Lei Huang 829cd8e263 [NFC] test case clean up
1. remove redundant tests
2. update XForm_tests to generated expected code gen

llvm-svn: 330290
2018-04-18 20:22:26 +00:00
Alex Bradbury c0464d9271 [RISCV] Expand codegen -> compression sanity checks and move to a single file
The objdump tests interfere with update_llc_test_checks.py and can't be
automatically update them. Put the sanitify check for compression on the
codegen codepath into a separate file, and expand it to also include tests of
integer materialisation. This would catch changes such as those triggered by 
D41949.

llvm-svn: 330288
2018-04-18 20:17:29 +00:00
Alex Bradbury 099c720426 Revert "[RISCV] implement li pseudo instruction"
Reverts rL330224, while issues with the C extension and missed common
subexpression elimination opportunities are addressed. Neither of these issues
are visible in current RISC-V backend unit tests, which clearly need
expanding.

llvm-svn: 330281
2018-04-18 19:02:31 +00:00
Lei Huang 192c6ccf6d [Power9]Legalize and emit code for converting Unsigned HWord/Char to Quad-Precision
Legalize and emit code for converting unsigned HWord/Char to QP:

xscvsdqp
xscvudqp

Only covering patterns for unsigned forms cause we don't have part-word
sign-extending integer loads into VSX registers.

Differential Revision: https://reviews.llvm.org/D45494

llvm-svn: 330278
2018-04-18 17:41:46 +00:00
Amara Emerson 9de072f8ae [AArch64] Add isel pattern for v8i8->v2f32 NVCASTs.
rdar://39454635

llvm-svn: 330276
2018-04-18 17:10:19 +00:00