Commit Graph

2134 Commits

Author SHA1 Message Date
Lang Hames 60fbc7cc38 [MC] Thread unique_ptr<MCObjectWriter> through the create.*ObjectWriter
functions.

This makes the ownership of the resulting MCObjectWriter clear, and allows us
to remove one instance of MCObjectStreamer's bizarre "holding ownership via
someone else's reference" trick.

llvm-svn: 315327
2017-10-10 16:28:07 +00:00
Nicolai Haehnle 312b64f4d7 AMDGPU: Split MUBUF offset into aligned components
Summary:
Atomic buffer operations do not work (and trap on gfx9) when the
components are unaligned, even if their sum is aligned.

Previously, we generated an offset of 4156 without an SGPR by
splitting it as 4095 + 61 (immediate + inline constant). The
highest offset for which we can do this correctly is 4156 = 4092 + 64.

Fixes dEQP-GLES31.functional.ssbo.atomic.*

Reviewers: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D37850

llvm-svn: 315302
2017-10-10 12:22:23 +00:00
NAKAMURA Takumi aba2b3d1f3 SILoadStoreOptimizer.cpp: Fix build; Clang doesn't like "using anonymous struct" since rL315256.
llvm-svn: 315283
2017-10-10 08:30:53 +00:00
Lang Hames dcb312bdb9 [MC] Plumb unique_ptr<MCELFObjectTargetWriter> through createELFObjectWriter to
ELFObjectWriter's constructor.

Fixes the same ownership issue for ELF that r315245 did for MachO:
ELFObjectWriter takes ownership of its MCELFObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.

llvm-svn: 315254
2017-10-09 23:53:15 +00:00
Stanislav Mekhanoshin de42c29a68 [AMDGPU] New 64 bit div/rem expansion
Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions.
This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions.

Passes OpenCL conformance test_integer_ops quick_[u]long_math

Differential Revision: https://reviews.llvm.org/D38607

llvm-svn: 315081
2017-10-06 17:24:45 +00:00
Matt Arsenault 2d3f8f333d AMDGPU: Set v2i32 any_extend to expand
llvm-svn: 314993
2017-10-05 17:38:30 +00:00
Konstantin Zhuravlyov aa0835a7ab AMDGPU: Add and set AMDGPU-specific e_flags
Differential Revision: https://reviews.llvm.org/D38556

llvm-svn: 314987
2017-10-05 16:19:18 +00:00
Matt Arsenault f48e5c9ce5 AMDGPU: Add comment about clamps
llvm-svn: 314952
2017-10-05 00:13:20 +00:00
Matt Arsenault aafff87dda AMDGPU: Do not fold clamp instructions when sources are different
Patch by hakzsam (Samuel Pitoiset)

llvm-svn: 314951
2017-10-05 00:13:17 +00:00
Matt Arsenault 9ab1fa6803 AMDGPU: Fix not accounting for instruction size in bundles
These were counted as 0. Fixes branch limit exceeded errors
in some large programs.

llvm-svn: 314944
2017-10-04 22:59:12 +00:00
Konstantin Zhuravlyov 8684f7b4f9 AMDGPU: Correctly set EI_OSABI based on the os
Differential Revision: https://reviews.llvm.org/D38555

llvm-svn: 314943
2017-10-04 22:44:13 +00:00
Sanjay Patel 4c33d5213b [SimplifyCFG] put the optional assumption cache pointer in the options struct; NFCI
This is a follow-up to https://reviews.llvm.org/D38138. 

I fixed the capitalization of some functions because we're changing those
lines anyway and that helped verify that we weren't accidentally dropping 
any options by using default param values.

llvm-svn: 314930
2017-10-04 20:26:25 +00:00
Konstantin Zhuravlyov 22bc039c89 AMDGPU: Expand setcc for v2f32 and v4f32
llvm-svn: 314853
2017-10-03 21:45:01 +00:00
Konstantin Zhuravlyov 908fa90b51 AMDGPU: Expand setcc for v2i32 and v4i32
llvm-svn: 314852
2017-10-03 21:31:24 +00:00
Tim Renouf 72800f0436 [AMDGPU] implemented pal metadata
Summary:
For the amdpal OS type:

We write an AMDGPU_PAL_METADATA record in the .note section in the ELF
(or as an assembler directive). It contains key=value pairs of 32 bit
ints. It is a merge of metadata from codegen of the shaders, and
metadata provided by the frontend as _amdgpu_pal_metadata IR metadata.
Where both sources have a key=value with the same key, the two values
are ORed together.

This .note record is part of the amdpal ABI and will be documented in
docs/AMDGPUUsage.rst in a future commit.

Eventually the amdpal OS type will stop generating the .AMDGPU.config
section once the frontend has safely moved over to using the .note
records above instead of .AMDGPU.config.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D37753

llvm-svn: 314829
2017-10-03 19:03:52 +00:00
Alexander Timofeev 4651396584 [AMDGPU] Avoid predicated execution of the basic blocks containing scalar
instructions.

Differential revision: https://reviews.llvm.org/D38293

llvm-svn: 314828
2017-10-03 18:55:36 +00:00
Matt Arsenault 90c7593a75 AMDGPU: Remove global isGCN predicates
These are problematic because they apply to everything,
and can easily clobber whatever more specific predicate
you are trying to add to a function.

Currently instructions use SubtargetPredicate/PredicateControl
to apply this to patterns applied to an instruction definition,
but not to free standing Pats. Add a wrapper around Pat
so the special PredicateControls requirements can be appended
to the final predicate list like how Mips does it.

llvm-svn: 314742
2017-10-03 00:06:41 +00:00
Matt Arsenault c6baa85fc6 AMDGPU: Fix typos
llvm-svn: 314715
2017-10-02 20:31:18 +00:00
Stanislav Mekhanoshin 1d8cf2be89 [AMDGPU] Set fast-math flags on functions given the options
We have a single library build without relaxation options.
When inlined library functions remove fast math attributes
from the functions they are integrated into.

This patch sets relaxation attributes on the functions after
linking provided corresponding relaxation options are given.
Math instructions inside the inlined functions remain to have
no fast flags, but inlining does not prevent fast math
transformations of a surrounding caller code anymore.

Differential Revision: https://reviews.llvm.org/D38325

llvm-svn: 314568
2017-09-29 23:40:19 +00:00
Nicolai Haehnle ce4ddd06da AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC
The hardware will only forward EXEC_LO; the high 32 bits will be zero.

Additionally, inline constants do not work. At least,

   v_addc_u32_e64 v0, vcc, v0, v1, -1

which could conceivably be used to combine (v0 + v1 + 1) into a single
instruction, acts as if all carry-in bits are zero.

The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine

   s_mov_b64 s[0:1], exec
   v_cndmask_b32_e64 v0, v1, v2, s[0:1]

into

   v_mov_b32 v0, v3

but it's not particularly high priority.

Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*

llvm-svn: 314522
2017-09-29 15:37:31 +00:00
Jonas Paulsson c9e363ac69 [SystemZ] implement shouldCoalesce()
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.

If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).

This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D37899
https://bugs.llvm.org/show_bug.cgi?id=34610

llvm-svn: 314516
2017-09-29 14:31:39 +00:00
Tim Renouf ef1ae8ffac [AMDGPU] calling conventions for AMDPAL OS type
Summary:
This commit adds comments on how the AMDPAL OS type overloads the
existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
new ones.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37752

llvm-svn: 314502
2017-09-29 09:51:22 +00:00
Tim Renouf 132291589f [AMDGPU] AMDPAL scratch buffer support
Summary:
Added support for scratch (including spilling) for OS type amdpal:
generates code to set up the scratch descriptor if it is needed.

With amdpal, the scratch resource descriptor is loaded from offset 0 of
the global information table. The low 32 bits of the address of the
global information table is passed in s0.

Added amdgpu-git-ptr-high function attribute to hard-wire the high 32
bits of the address of the global information table. If the function
attribute is not specified, or is 0xffffffff, then the backend generates
code to use the high 32 bits of pc.

The documentation for the AMDPAL ABI will be added in a later commit.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye

Differential Revision: https://reviews.llvm.org/D37483

llvm-svn: 314501
2017-09-29 09:49:35 +00:00
Tim Renouf 9f7ead3334 [Triple] Add AMDPAL operating system type
Summary:
This operating system type represents the AMDGPU PAL runtime, and will
be required by the AMDGPU backend in order to generate correct code for
this runtime.

Currently it generates the same code as not specifying an OS at all.
That will change in future commits.

Patch from Tim Corringham.

Subscribers: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D37380

llvm-svn: 314500
2017-09-29 09:48:12 +00:00
Sanjay Patel 0f9b4773c1 [SimplifyCFG] add a struct to house optional folds (PR34603)
This was intended to be no-functional-change, but it's not - there's a test diff.

So I thought I should stop here and post it as-is to see if this looks like what was expected 
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

Notes:
 1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried 
    through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'. 
    The parameter isn't passed down, so we pick up the default value from the function signature 
    after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the 
    'SimplifyCFG' calls.

 2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops. 
    This would theoretically allow us to differentiate the transforms controlled by those params 
    independently.

 3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too. 
    I just stopped here to minimize the diffs.

 4. Similarly, I stopped short of messing with the pass manager layer. I have another question that 
    could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG 
    set to true no matter where in the pipeline it's creating SimplifyCFG passes?

    // Create an early function pass manager to cleanup the output of the
    // frontend.
    EarlyFPM.addPass(SimplifyCFGPass());

    -->

    /// \brief Construct a pass with the default thresholds
    /// and switch optimizations.
    SimplifyCFGPass::SimplifyCFGPass()
       : BonusInstThreshold(UserBonusInstThreshold),
         LateSimplifyCFG(true) {}   <-- switches get converted to lookup tables and loops may not be in canonical form

    If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG' 
    setting via recursion was masking this bug.

Differential Revision: https://reviews.llvm.org/D38138

llvm-svn: 314308
2017-09-27 14:54:16 +00:00
Matt Arsenault 1390af2dd2 AMDGPU: Add option to stress calls
This inverts the behavior of the AlwaysInline pass to mark
every function not already marked alwaysinline as noinline.

llvm-svn: 313865
2017-09-21 07:00:48 +00:00
Matt Arsenault fdcdd88d57 AMDGPU: Fix crash on immediate operand
We can have a v_mac with an immediate src0.
We can still fold if it's an inline immediate,
otherwise it already uses the constant bus.

llvm-svn: 313852
2017-09-21 00:45:59 +00:00
Matt Arsenault 8cbb4884a5 AMDGPU: Start selecting v_mad_mixhi_f16
llvm-svn: 313814
2017-09-20 21:01:24 +00:00
Matt Arsenault e135c4c6a6 AMDGPU: Add tied operands to v_mad_mix{lo|hi}_f16
These write to the low and high half of the destination
register and leave the other 16-bits unchanged. This is true
for most 16-bit instructions on gfx9, but we don't use that
now.

llvm-svn: 313812
2017-09-20 20:53:49 +00:00
Matt Arsenault 76935122cc AMDGPU: Start selecting v_mad_mixlo_f16
Also add some tests that should be able to use v_mad_mixhi_f16,
but do not yet. This is trickier because we don't really model
the partial update of the register done by 16-bit instructions.

llvm-svn: 313806
2017-09-20 20:28:39 +00:00
Matt Arsenault 644883ff07 AMDGPU: Fix encoding of op_sel for mad_mix* opcodes
llvm-svn: 313797
2017-09-20 19:09:28 +00:00
Stanislav Mekhanoshin 2e3bf37ec4 [AMDGPU] Fixed memory leak with inliner replaced
Delete inliner before replacing it.

llvm-svn: 313723
2017-09-20 06:34:28 +00:00
Matt Arsenault c8aea66627 AMDGPU: Move r600 only code into r600 only td file
llvm-svn: 313719
2017-09-20 06:11:25 +00:00
Stanislav Mekhanoshin 5641820141 [AMDGPU] Fix regression in test clang/test/CodeGen/backend-unsupported-error.ll
llvm-svn: 313718
2017-09-20 06:10:15 +00:00
Matt Arsenault b81495dccb AMDGPU: Match load d16 hi instructions
Also starts selecting global loads for constant address
in some cases. Some end up selecting to mubuf still, which
requires investigation.

We still get sub-optimal regalloc and extra waitcnts inserted
due to not really tracking the liveness of the separate register
halves.

llvm-svn: 313716
2017-09-20 05:01:53 +00:00
Stanislav Mekhanoshin 5670e6d482 [AMDGPU] Port of HSAIL inliner
Differential Revision: https://reviews.llvm.org/D36849

llvm-svn: 313714
2017-09-20 04:25:58 +00:00
Matt Arsenault bc68383166 AMDGPU: Cleanup load/store PatFrags
Try to use a consistent naming scheme.

llvm-svn: 313713
2017-09-20 03:43:35 +00:00
Matt Arsenault fcc213fab7 AMDGPU: Match store d16_hi instructions
llvm-svn: 313712
2017-09-20 03:20:09 +00:00
Stanislav Mekhanoshin d4ae470d2e [AMDGPU] Prevent post-RA scheduler from breaking memory clauses
The pre-RA scheduler does load/store clustering, but post-RA
scheduler undoes it. Add mutation to prevent it.

Differential Revision: https://reviews.llvm.org/D38014

llvm-svn: 313670
2017-09-19 20:54:38 +00:00
Matt Arsenault e745d9963e AMDGPU: Run internalize symbols at -O0
The relocations used for externally visible functions
aren't supported, so the direct call emitted ends
up hitting a linker error.

llvm-svn: 313616
2017-09-19 07:40:11 +00:00
Konstantin Zhuravlyov ca8946a376 AMDGPU: Start selecting s_xnor_{b32, b64}
Differential Revision: https://reviews.llvm.org/D37981

llvm-svn: 313565
2017-09-18 21:22:45 +00:00
Jan Sjodin 1f2f57a7ea Fix warnings in r313297.
llvm-svn: 313302
2017-09-14 21:49:52 +00:00
Matt Arsenault c317287fde AMDGPU: Fix violating constant bus restriction
You can't use madmk/madmk if it already uses an SGPR input.

llvm-svn: 313298
2017-09-14 20:54:29 +00:00
Jan Sjodin 312ccf761c Add AddresSpace to PseudoSourceValue.
Differential Revision: https://reviews.llvm.org/D35089

llvm-svn: 313297
2017-09-14 20:53:51 +00:00
Matt Arsenault 37ab4cf8b8 AMDGPU: Fix assert on alloca of array of struct
llvm-svn: 313282
2017-09-14 18:02:29 +00:00
Matt Arsenault defe371771 AMDGPU: Stop modifying SP in call sequences
Because the stack growth direction and addressing is done
in the same direction, modifying SP at the beginning of the
call sequence was incorrect. If we had a stack passed argument,
we would end up skipping that number of bytes before pushing
arguments, leaving unused/inconsistent space.

The callee creates fixed stack objects in its frame, so
the space necessary for these is already logically allocated
in the callee, so we just let the callee increment SP if
it really requires it.

llvm-svn: 313279
2017-09-14 17:37:40 +00:00
Matt Arsenault 6efd082c01 AMDGPU: Make frame register caller preserved
Using SplitCSR for the frame register was very broken. Often
the copies in the prolog and epilog were optimized out, in addition
to them being inserted after the true prolog where the FP
was clobbered.

I have a hacky solution which works that continues to use
split CSR, but for now this is simpler and will get to working
programs.

llvm-svn: 313274
2017-09-14 17:14:57 +00:00
Matt Arsenault ecb43ef1bc AMDGPU: Don't spill SP reg like a normal CSR
llvm-svn: 313217
2017-09-13 23:47:01 +00:00
Stanislav Mekhanoshin 7fe9a5d9b4 Allow target to decide when to cluster loads/stores in misched
MachineScheduler when clustering loads or stores checks if base
pointers point to the same memory. This check is done through
comparison of base registers of two memory instructions. This
works fine when instructions have separate offset operand. If
they require a full calculated pointer such instructions can
never be clustered according to such logic.

Changed shouldClusterMemOps to accept base registers as well and
let it decide what to do about it.

Differential Revision: https://reviews.llvm.org/D37698

llvm-svn: 313208
2017-09-13 22:20:47 +00:00
Matt Arsenault fb017ae155 AMDGPU: Handle coldcc in more places
Missed in r312936

llvm-svn: 313205
2017-09-13 21:55:52 +00:00