Commit Graph

47731 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin 739174c4be [AMDGPU] Construct memory clauses before RA
Memory clauses are formed into bundles in the presence of xnack.
Their source operands are marked as early-clobber.

This allows allocating distinct source and destination registers
within a clause and prevents the clause from being broken by an s_nop
in the hazard recognizer.

Clauses are undone before the post-RA scheduler to allow some rescheduling,
which will not break the clause since artificial edges are created in
the DAG to keep memory operations together. This still allows better
ILP in some cases.

Differential Revision: https://reviews.llvm.org/D47511

llvm-svn: 333691
2018-05-31 20:13:51 +00:00
Sriraman Tallam d10c4e07f5 Relax GOTPCREL relocations for tail jmp instructions.
Differential Revision: https://reviews.llvm.org/D47563

llvm-svn: 333676
2018-05-31 18:12:33 +00:00
Francis Visoiu Mistrih 90aba024c5 [MC] Fallback on DWARF when generating compact unwind on AArch64
Instead of asserting when using the def_cfa directive with a register
different from fp, fall back on DWARF.

Easily triggered with:

.cfi_def_cfa x1, 32;

rdar://40249694

Differential Revision: https://reviews.llvm.org/D47593

llvm-svn: 333667
2018-05-31 16:33:26 +00:00
Roman Tereshin f34d7ecc15 [GlobalISel][Mips] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call for Mips
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333665
2018-05-31 16:16:49 +00:00
Roman Tereshin 76c29c68dc [GlobalISel][AMDGPU] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call for AMDGPU
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333664
2018-05-31 16:16:48 +00:00
Roman Tereshin 667c7581ed [GlobalISel][ARM] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call and fixing bugs exposed
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333663
2018-05-31 16:16:48 +00:00
Roman Tereshin cc1a16fdf9 [GlobalISel][X86] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call and fixing bugs exposed
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333662
2018-05-31 16:16:47 +00:00
Simon Pilgrim ff0623cd29 [X86][SSE] Recognise splat rotations and expand back to shift ops.
Noticed while fixing PR37426: for splat rotations (rotation by a uniform value) it's better to just expand back to shift ops than to perform them as general non-uniform rotations.

llvm-svn: 333661
2018-05-31 15:47:17 +00:00
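The expansion described in the commit above can be illustrated with a small scalar sketch (a minimal sketch only; the function name and the scalar form are assumptions, not the backend's actual lowering code):

    #include <cstdint>

    // A rotate where every lane uses the same (splat) amount can be expanded
    // into two shifts and an OR instead of the generic per-lane rotation path.
    uint32_t rotlUniform(uint32_t X, uint32_t Amt) {
      Amt &= 31;                             // keep the amount within the lane width
      if (Amt == 0)
        return X;                            // avoid an undefined shift by 32
      return (X << Amt) | (X >> (32 - Amt)); // rotl(x, c) == shl | lshr
    }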
Simon Pilgrim c34395d889 [X86][AVX] Add peekThroughEXTRACT_SUBVECTORs helper (NFCI)
We often need this for AVX1 128-bit integer ops as they may have been split from a 256-bit source.

llvm-svn: 333660
2018-05-31 15:15:49 +00:00
Clement Courbet 2e41c5a79c [X86] Introduce WriteFLDC for x87 constant loads.
Summary:
{FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI} were using WriteMicrocoded.

 - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
 - For ZnVer1 and Atom, values were transferred from InstRWs.
 - For SLM and BtVer2, I've guessed some values :(

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D47585

llvm-svn: 333656
2018-05-31 14:22:01 +00:00
Simon Dardis d9a453832d [mips] Guard all short instructions correctly.
Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D47533

llvm-svn: 333645
2018-05-31 12:47:01 +00:00
Clement Courbet b78ab5097d [X86] Extract latency of fldz/fld1 in separate classes.
Summary:
 - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
 - For ZnVer1 and Atom, values were transferred from `InstRW`s.
 - For SLM and BtVer2, values are from Agner.

This is split off from https://reviews.llvm.org/D47377

Reviewers: RKSimon, andreadb

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D47523

llvm-svn: 333642
2018-05-31 11:41:27 +00:00
Simon Pilgrim 346886bc0d [X86][SSE] Add support for detecting SUB(SPLAT_BV, SPLAT) cases for shift-rotate patterns.
This improves splat rotations (rotation by a uniform value) to avoid having to use the generic non-uniform shift code (an extension of PR37426).

llvm-svn: 333641
2018-05-31 11:25:16 +00:00
Luke Geeson 2e09995d42 [AArch64] Reverted rL333427 fixing Clang UnitTest Failure
llvm-svn: 333634
2018-05-31 08:27:53 +00:00
Stanislav Mekhanoshin d4b500cb08 [AMDGPU] Track occupancy in MFI
Keep track of achieved occupancy in SIMachineFunctionInfo.
At the moment we have a lot of duplicated or even missing code to
query and maintain occupancy info. Record it in the MFI and
query it in a single call. Interfaces:

- getOccupancy() - returns the currently recorded achieved occupancy.
- getMinAllowedOccupancy() - returns the lesser of the achieved occupancy
and the lowest occupancy we are ready to tolerate. For example, if
a kernel is memory bound we are ready to tolerate 4 waves.
- limitOccupancy() - records the occupancy level if we have to lower it.
- increaseOccupancy() - records the occupancy if the scheduler managed to
increase it.

The MFI takes care of integrating the different checks affecting occupancy,
including LDS use and the waves-per-eu attribute. Note that the scheduler
starts with the register pressure not yet known, so it has to record either
a limit or an increase in occupancy after it is done. Later passes can
just query the resulting value.

The new interface is used in the active scheduler and is NFC with respect
to its work. Changes are also made to the experimental schedulers to use it
and record an occupancy after they are done. Before this change, waves-per-eu
was ignored by the experimental schedulers and the tolerance window for
memory-bound kernels was not used.

Differential Revision: https://reviews.llvm.org/D47509

llvm-svn: 333629
2018-05-31 05:36:04 +00:00
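A minimal sketch of the occupancy-tracking interface described in the commit above (the class shape and signatures are assumptions for illustration, not the actual SIMachineFunctionInfo declaration):

    #include <algorithm>

    // Records the achieved occupancy and integrates the checks that can lower it.
    class OccupancyInfo {
      unsigned Occupancy;            // currently recorded achieved occupancy
      unsigned MemoryBoundTolerance; // e.g. tolerate 4 waves for memory-bound kernels
      bool MemoryBound;

    public:
      OccupancyInfo(unsigned MaxOccupancy, bool IsMemoryBound)
          : Occupancy(MaxOccupancy), MemoryBoundTolerance(4),
            MemoryBound(IsMemoryBound) {}

      // Current recorded achieved occupancy.
      unsigned getOccupancy() const { return Occupancy; }

      // Lesser of the achieved occupancy and the lowest level we tolerate.
      unsigned getMinAllowedOccupancy() const {
        return MemoryBound ? std::min(Occupancy, MemoryBoundTolerance) : Occupancy;
      }

      // Record a lower occupancy when a pass has to give some up.
      void limitOccupancy(unsigned Limit) { Occupancy = std::min(Occupancy, Limit); }

      // Record a higher occupancy when the scheduler manages to increase it.
      void increaseOccupancy(unsigned Occ) { Occupancy = std::max(Occupancy, Occ); }
    };

In this sketch a scheduler would call limitOccupancy() or increaseOccupancy() once register pressure is known, and later passes would only read getMinAllowedOccupancy().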
Jan Vesely f5016b79a6 AMDGPU/R600: Make sure functions are cacheline aligned
v2: use "ensureAlignment"
    make functions cache line aligned
Fixes GPU hangs since r333219:
"AMDGPU: Split R600 AsmPrinter code into its own class"

Differential Revision: https://reviews.llvm.org/D47516

llvm-svn: 333622
2018-05-31 04:08:08 +00:00
Roman Tereshin 5a65eb75c7 [GlobalISel][AArch64] LegalizerInfo verifier: Fixing bugs exposed by LegalizerInfo::verify(...)
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333618
2018-05-31 01:56:05 +00:00
Tom Stellard c7624317d7 AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47359

llvm-svn: 333605
2018-05-30 22:55:35 +00:00
Roman Tereshin 8f1753e994 [GlobalISel][AArch64] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call w/o fixing bugs
This is to make it clear what kind of bugs the LegalizerInfo verifier
is able to catch, and to test its output.

Reviewers: aemerson, qcolombet

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D46338

llvm-svn: 333597
2018-05-30 22:10:04 +00:00
Simon Pilgrim 5e9f459c62 [X86][SSE] Pulled out splat detection helper from LowerScalarVariableShift (NFCI)
Created the IsSplatValue helper from the splat detection code in LowerScalarVariableShift as a first NFC step towards improving support for splat rotations, which is an extension of PR37426.

llvm-svn: 333580
2018-05-30 19:16:59 +00:00
Mark Searles ed54ff1d51 [AMDGPU][Waitcnt] Fix build error: unused variable 'SWaitInst'
https://reviews.llvm.org/rL333556 caused a buildbot failure.

See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio

/Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
    auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(),

The unused variable was for debugging purposes; removing that piece of code
to fix the build.

llvm-svn: 333559
2018-05-30 16:27:57 +00:00
Matt Arsenault 7b4826e6ce AMDGPU: Use better alignment for kernarg lowering
This was just emitting loads with the ABI alignment
for the raw type. The true alignment is often better,
especially when an illegal vector type was scalarized.
The better alignment allows using a scalar load
more often.

llvm-svn: 333558
2018-05-30 16:17:51 +00:00
Mark Searles 1054541490 [AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks
During waitcnt insertion, the pass forces convergence for a loop if necessary.
Previously, that kicked in after more than two passes over a loop, which
doesn't account for loops with many bottom blocks. So, increase the threshold
to (n+1), where n is the number of bottom blocks. This gives the pass an
opportunity to consider the contribution of each bottom block to the overall
loop before the forced convergence potentially kicks in.

Differential Revision: https://reviews.llvm.org/D47488

llvm-svn: 333556
2018-05-30 15:47:45 +00:00
Gabor Buella 890e363e11 [X86] Lowering FMA intrinsics to native IR (LLVM part)
Support for Clang lowering of fused intrinsics. This patch:

1. Removes bindings to clang fma intrinsics.
2. Introduces new LLVM unmasked intrinsics with rounding mode:
     int_x86_avx512_vfmadd_pd_512
     int_x86_avx512_vfmadd_ps_512
     int_x86_avx512_vfmaddsub_pd_512
     int_x86_avx512_vfmaddsub_ps_512
     supported with a new intrinsic type (INTR_TYPE_3OP_RM).
3. Introduces new x86 fmaddsub/fmsubadd folding.
4. Introduces new tests for code emitted by the sequences introduced in the Clang part.

Patch by tkrupa

Reviewers: craig.topper, sroland, spatel, RKSimon

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D47443

llvm-svn: 333554
2018-05-30 15:25:16 +00:00
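As a hedged illustration of the fmaddsub folding in item 3 above, a scalar model of the usual x86 fmaddsub semantics (even lanes subtract, odd lanes add) might look like this; the names and the scalar loop are illustrative, not the patch's code:

    #include <cstddef>

    // Scalar model of fmaddsub: even-indexed elements compute a*b - c,
    // odd-indexed elements compute a*b + c (fmsubadd is the mirrored pairing).
    void fmaddsub(const double *A, const double *B, const double *C,
                  double *R, std::size_t N) {
      for (std::size_t I = 0; I < N; ++I)
        R[I] = (I % 2 == 0) ? A[I] * B[I] - C[I] : A[I] * B[I] + C[I];
    }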
Amaury Sechet f47d9f30b0 [ARM] Remove code handling ADDC/ADDE/SUBC/SUBE
Summary: This code is now dead as the ARM backend uses ADDCARRY/SUBCARRY/SETCCCARRY.

Reviewers: rogfer01, efriedma, rengolin, javed.absar

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D47413

llvm-svn: 333544
2018-05-30 13:45:43 +00:00
Krzysztof Parzyszek 8987174627 [Hexagon] Use vector align-left when shift amount fits in 3 bits
This saves an instruction because for align-right the shift amount
would need to be put in a register first.

llvm-svn: 333543
2018-05-30 13:45:34 +00:00
Simon Dardis f990bf1cd2 [mips] Correct the definition of CTC2/CFC2
llvm-svn: 333542
2018-05-30 13:21:13 +00:00
Simon Dardis a3aa926c09 [mips] Correct the predicates of microMIPS compact branch instructions
llvm-svn: 333541
2018-05-30 13:16:17 +00:00
Simon Dardis f909058ad4 [mips] Sink PredicateControl further down the class hierarchy.
Previously, PredicateControl was in some cases a member of <X>Inst classes
for some X (DSP, EVA) or sat in a more irregular place in the hierarchy
for any given instruction.

This patch moves PredicateControl down to the root so that it is consistently
available. It then corrects the base class of microMIPS instructions to use
EncodingPredicates instead of the general Predicates field of Instruction.

Reviewers: smaksimovic, abeserminji, atanasyan

Differential Revision: https://reviews.llvm.org/D47526

llvm-svn: 333536
2018-05-30 12:40:53 +00:00
Simon Dardis 39710e3555 [mips] Correct the predicates of arithmetic and logic instructions.
As part of this effort, duplicate and correct the predicates of some
aliases. Also disable code generation of some short form instructions
for FastISel, as it would otherwise reject them.

Reviewers: atanasyan, abeserminji, smaksimovic

Differential Revision: https://reviews.llvm.org/D47075

llvm-svn: 333530
2018-05-30 11:33:35 +00:00
Tim Northover d8949f5002 AArch64: print correct annotation for ADRP addresses.
The immediate on an ADRP MCInst needs to be multiplied by 0x1000 to obtain the
actual PC-offset that will be calculated.

llvm-svn: 333525
2018-05-30 09:54:59 +00:00
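A worked sketch of the scaling described above (the helper name and the direct computation are assumptions for illustration, not the printer's code):

    #include <cstdint>

    // ADRP forms a 4 KiB page address: the PC is rounded down to a page
    // boundary and the immediate, which counts pages, is scaled by 0x1000.
    uint64_t adrpTarget(uint64_t PC, int64_t PageImm) {
      uint64_t PageBase = PC & ~uint64_t(0xfff); // clear the low 12 bits
      return PageBase + PageImm * 0x1000;        // page-count immediate * 4096
    }

For example, an ADRP at PC 0x4000 with immediate 3 targets 0x7000, not 0x4003, which is what the annotation should print.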
Sander de Smalen bdf09fe7a2 [AArch64][AsmParser] Fix segfault on illegal fpimm.
A floating-point immediate combining a negative sign and
a hexadecimal number, e.g. #-0x0, caused the compiler to crash.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47483

llvm-svn: 333524
2018-05-30 09:54:19 +00:00
Daniel Cederman 248dae81dc [Sparc] Treat %fxx registers with value type Other as single precision
They get type Other when used in the clobber list in inline assembly.
This fixes tests fp128.ll and float.ll that failed after r333512.

llvm-svn: 333523
2018-05-30 09:52:18 +00:00
Daniel Cederman 60e6ce4155 [Sparc] Select correct register class for FP register constraints
Summary: The fX version of floating-point registers only supports
single precision. We need to map the name to dX for doubles and qX
for long doubles if we want getRegForInlineAsmConstraint() to be
able to pick the correct register class.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits

Differential Revision: https://reviews.llvm.org/D47258

llvm-svn: 333512
2018-05-30 06:07:55 +00:00
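A minimal sketch of the mapping described in the summary above, using a hypothetical helper (the enum and function are illustrative, not the backend's getRegForInlineAsmConstraint()):

    // Pick a floating-point register class for an "{fN}"-style constraint based
    // on the requested value width: fN only holds single precision, so doubles
    // map to the dN aliases and long doubles to the qN aliases.
    enum class SparcFPClass { Single /*fN*/, Double /*dN*/, Quad /*qN*/ };

    SparcFPClass classForFPConstraint(unsigned BitWidth) {
      if (BitWidth == 64)
        return SparcFPClass::Double; // e.g. f2 corresponds to d1 (f2/f3)
      if (BitWidth == 128)
        return SparcFPClass::Quad;   // e.g. f4 corresponds to q1 (f4-f7)
      return SparcFPClass::Single;   // 32-bit values stay in fN
    }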
Craig Topper cc0741e59f [X86] Add unmasked AVX512VNNI intrinsics. Use a select in IR instead.
A future patch will remove the old masked intrinsics.

llvm-svn: 333508
2018-05-30 05:25:59 +00:00
Shiva Chen c3d0e89284 [RISCV] Support resolving fixup_riscv_call and add to MCFixupKindInfo table
Resolve fixup_riscv_call in the assembler when linker relaxation is disabled
and the function and call site are within the same compile unit.

Also add a static_assert after the Infos array declaration
to avoid missing any new fixup in MCFixupKindInfo in the future.

Differential Revision: https://reviews.llvm.org/D47126

llvm-svn: 333487
2018-05-30 01:16:36 +00:00
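A minimal sketch of the guard the commit above adds, with made-up table contents (the entry values and most names are placeholders; only the static_assert idea is the point):

    // Keep the fixup-info table in sync with the number of fixup kinds so a
    // newly added fixup cannot silently lack an entry.
    enum Fixups { fixup_riscv_hi20, fixup_riscv_lo12_i, fixup_riscv_call,
                  NumTargetFixupKinds };

    struct FixupKindInfo { const char *Name; unsigned Offset, Bits, Flags; };

    static const FixupKindInfo Infos[] = {
        {"fixup_riscv_hi20", 12, 20, 0},
        {"fixup_riscv_lo12_i", 20, 12, 0},
        {"fixup_riscv_call", 0, 64, 0},
    };

    static_assert(sizeof(Infos) / sizeof(Infos[0]) == NumTargetFixupKinds,
                  "Not all fixup kinds added to Infos array");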
Craig Topper 5989db0fb4 [X86] Remove some of the extractelts from the new MOVSS+FMA patterns.
We only need the extractelt that corresponds to the register we're trying to insert back into. We can't guarantee the others haven't been optimized out depending on how those operands were produced.

So instead just look for an FR32/FR64 input and emit a COPY_TO_REGCLASS to VR128 in the output pattern. This matches what we do for ADD/SUB/MUL/DIV.

llvm-svn: 333473
2018-05-29 22:52:09 +00:00
Craig Topper dbd371e931 [X86] Use VR128X instead of VR128 in EVEX instruction patterns.
llvm-svn: 333464
2018-05-29 20:46:27 +00:00
Craig Topper aba57bfebd [X86] Rename the operands in the recently introduced MOVSS+FMA patterns so that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction.
The order should be controlled in the input pattern.

llvm-svn: 333463
2018-05-29 20:46:26 +00:00
Craig Topper 5439b3d1e5 [X86] Fix a potential crash that could occur after r333419.
The code could issue a truncate from a small type to a larger type. We need to extend in that case instead.

llvm-svn: 333460
2018-05-29 20:04:10 +00:00
Matt Arsenault 2e4d338d16 AMDGPU: Fix typo in option description
llvm-svn: 333457
2018-05-29 19:35:46 +00:00
Matt Arsenault 1ea0402e82 AMDGPU: Round up kernel argument allocation size
AFAIK the driver's allocation will actually have to round this
up anyway. It is useful to track the rounded-up size so that
the end of the kernel segment is known to be dereferenceable,
allowing a wider s_load_dword to be used for a short argument at
the end of the segment.

llvm-svn: 333456
2018-05-29 19:35:00 +00:00
Sameer AbuAsal 97684419e8 [RISCV] Add peepholes for Global Address lowering patterns
Summary:
  Base and offset are always separated when a GlobalAddress node is lowered
  (rL332641) as an optimization to reduce instruction count. However, this
  optimization is not profitable if the Global Address ends up being used in
  only one instruction.

  This patch adds peephole optimizations that merge an offset of
  an address calculation into the LUI %hi() and ADDI %lo() of the lowering
  sequence.

  The peephole handles three patterns:

 1) ADDI (ADDI (LUI %hi(global)) %lo(global)), offset
     --->
      ADDI (LUI %hi(global + offset)) %lo(global + offset).

   This generates:
   lui a0, hi (global + offset)
   add a0, a0, lo (global + offset)

   Instead of

   lui a0, hi (global)
   addi a0, lo (global)
   addi a0, offset

   This pattern is for cases when the offset is small enough to fit in the
   immediate field of ADDI (12 bits).

 2) ADD ((ADDI (LUI %hi(global)) %lo(global)), (LUI hi_offset))
     --->
      offset = hi_offset << 12
      ADDI (LUI %hi(global + offset)) %lo(global + offset)

   Which generates the ASM:

   lui  a0, hi(global + offset)
   addi a0, lo(global + offset)

   Instead of:

   lui  a0, hi(global)
   addi a0, lo(global)
   lui a1, (offset)
   add a0, a0, a1

   This pattern is for cases when the offset doesn't fit in an immediate field
   of ADDI but the lower 12 bits are all zeros.

 3) ADD ((ADDI (LUI %hi(global)) %lo(global)), (ADDI lo_offset, (LUI hi_offset)))
     --->
        offset = (offhi20 << 12) + offlo12
        ADDI (LUI %hi(global + offset)) %lo(global + offset)

   Which generates the ASM:

   lui  a1, %hi(global + offset)
   addi a1, %lo(global + offset)

   Instead of:

   lui  a0, hi(global)
   addi a0, lo(global)
   lui a1, (offhi20)
   addi a1, (offlo12)
   add a0, a0, a1

   This pattern is for cases when the offset doesn't fit in an immediate field
   of ADDI and both the lower 12 bits and the high 20 bits are non-zero.

    Reviewers: asb

    Reviewed By: asb

    Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos,
  niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang

llvm-svn: 333455
2018-05-29 19:34:54 +00:00
Konstantin Zhuravlyov 2ca6b1f2ba AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as
it is set by CP

Differential Revision: https://reviews.llvm.org/D47392

llvm-svn: 333451
2018-05-29 19:09:13 +00:00
Eli Friedman 63fead0f43 [ARM] Enable SETCCCARRY lowering for Thumb1.
We've had Thumb1 support for ARMISD::SUBE for a while now, so this just
works.  Reduces codesize a bit for 64-bit integer comparisons.

Differential Revision: https://reviews.llvm.org/D47387

llvm-svn: 333445
2018-05-29 18:17:16 +00:00
Matt Arsenault ceafc55e5a AMDGPU: Pass function directly instead of MachineFunction
These functions just query the underlying IR function,
so pass it directly.

llvm-svn: 333442
2018-05-29 17:42:50 +00:00
Matt Arsenault 2fb9ccf770 AMDGPU: Add nuw to add off of kernarg ptr
llvm-svn: 333441
2018-05-29 17:42:38 +00:00
Matt Arsenault ab2b79cb97 DAG: Remove redundant version of getRegisterTypeForCallingConv
There seems to be no real reason to have these separate copies.
The existing implementations just copy each other for x86.
For Mips there is a subtle difference, which is just a bug
since the result changes based on which one was called in a given context.
Dropping this version, all tests pass. If I try to merge them
to match the removed version, a test fails.

llvm-svn: 333440
2018-05-29 17:42:26 +00:00
Tom Stellard 57b9342c80 AMDGPU: Split R600 MCInst lowering into its own class
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47307

llvm-svn: 333439
2018-05-29 17:41:59 +00:00
Evandro Menezes f8425340e4 [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy
As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change
makes the inlining of `memset()` and `memcpy()` more aggressive when
compiling for speed.  The tuning remains the same when optimizing for size.

Patch by: Sebastian Pop <s.pop@samsung.com>
          Evandro Menezes <e.menezes@samsung.com>

Differential revision: https://reviews.llvm.org/D45098

llvm-svn: 333429
2018-05-29 15:58:50 +00:00