Commit Graph

5321 Commits

Author SHA1 Message Date
Jay Foad 866d9b03f2 [AMDGPU] Tiny cleanup in isLegalFLATOffset. NFC. 2020-10-01 14:26:03 +01:00
Stanislav Mekhanoshin 722d792499 [AMDGPU] Reorganize VOP3P encoding
This changes width of encoding and opcode fields to match the
documentation.

Differential Revision: https://reviews.llvm.org/D88619
2020-09-30 15:27:06 -07:00
Mirko Brkusanin 0249df33fe [AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer
Check if operand of mul is constant value of one for certain atomic
instructions in order to avoid making unnecessary instructions when
-amdgpu-atomic-optimizer is present.

Differential Revision: https://reviews.llvm.org/D88315
2020-09-30 11:09:18 +02:00
Stanislav Mekhanoshin 61b3106965 [AMDGPU] Remove SIEncodingFamily.GFX10_B
It turns to be not needed anymore.

Differential Revision: https://reviews.llvm.org/D88520
2020-09-29 15:59:49 -07:00
Mirko Brkusanin 8b08fa0103 Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"
This reverts commit f5cd7ec9f3.

Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.
2020-09-29 15:33:34 +02:00
Jay Foad 0e0a0c8d2c [AMDGPU] Reformat AMDGPUTargetLowering::isSDNodeAlwaysUniform. NFC. 2020-09-28 16:24:16 +01:00
Jay Foad d3a8e333ec [AMDGPU] Reformat SITargetLowering::isSDNodeSourceOfDivergence. NFC. 2020-09-28 14:42:05 +01:00
Jay Foad bab1a17ad7 [AMDGPU] Add bfi immediate pattern
Differential Revision: https://reviews.llvm.org/D88246
2020-09-28 10:16:51 +01:00
Jay Foad 2806f586dc [AMDGPU] Make bfi patterns divergence-aware
This tends to increase code size but more importantly it reduces vgpr
usage, and could avoid costly readfirstlanes if the result needs to be
in an sgpr.

Differential Revision: https://reviews.llvm.org/D88245
2020-09-28 10:16:51 +01:00
Jay Foad 286d3fc750 [AMDGPU] Split R600 and GCN bfi patterns
This is in preparation for making the GCN patterns divergence-aware.
NFC.

Differential Revision: https://reviews.llvm.org/D88244
2020-09-28 10:16:51 +01:00
Fangrui Song 20e9c36c01 Internalize functions from various tools. NFC
And internalize some classes if I noticed them:)
2020-09-26 15:57:13 -07:00
Jay Foad f11f382523 [AMDGPU] Fix declaration parameter names to match definition
This fixes the declaration of AMDGPULegalizerInfo::legalizeBufferLoad to
match the definition. It is still confusing that that parameter order is
different from legalizeBufferStore.

https://bugs.llvm.org/show_bug.cgi?id=47535
2020-09-25 11:38:17 +01:00
Stanislav Mekhanoshin 27a62f6317 [AMDGPU] global-isel support for RT
Differential Revision: https://reviews.llvm.org/D87847
2020-09-24 10:29:45 -07:00
Jay Foad c05cf1ca3c [AMDGPU] Use cast instead of dyn_cast 2020-09-24 15:20:49 +01:00
Sebastian Neubauer 6f7cd16d29 [AMDGPU] Fix v3f16 handling for getresinfo
v3f32 should not be expanded to v4f32. getresinfo with a dmask of 7
created an image sample with a v3f32 return value, which was bitcasted
to a v4f32 in constructRetValue.

Differential Revision: https://reviews.llvm.org/D88206
2020-09-24 16:03:02 +02:00
Pushpinder Singh 41d6669f1f [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH
Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D85653
2020-09-23 22:25:29 -04:00
Carl Ritson 1e0500d4f7 [AMDGPU] Consider all SGPR uses as unique in constant bus verify
Fix the verifier so that overlapping SGPR operands are counted
independently.  We cannot assume that overlapping SGPR accesses
only count as a single constant bus use.
The exception is implicit uses which do not add to constant bus
usage (only) when overlapping.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87748
2020-09-24 10:52:40 +09:00
Sebastian Neubauer a343b9b032 Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57.

According to michel.daenzer,
> This completely broke the Mesa radeonsi driver on Navi 14. Xorg +
> xterm come up with major corruption & psychedelic colours.
2020-09-23 17:16:39 +02:00
Matt Arsenault af0207f2ba AMDGPU: Check global FP atomics match default FP mode
We would always select global FP atomics from atomicrmw fadd, although
they have a hardcoded FP mode.
2020-09-23 09:07:50 -04:00
Sebastian Neubauer ca907bfb57 [AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the
caller or the callee can insert a waitcnt to ensure that all reads are
finished.
Calls need some time to be executed, so if the callee inserts the
waitcnt, filling the instruction buffer and waiting for memory will be
interleaved, hiding some latency. This comes at the cost of having a
waitcnt inside functions that may not be needed as no memory operations
are outstanding.

For function calls, this is already implemented. The same principal
applies to returns: If the caller inserts a waitcnt after the call, the
callee does not have to wait and the return and memory operation can be
run in parallel.

This commit implements waiting in the caller after returning from a
function call.

Differential Revision: https://reviews.llvm.org/D87674
2020-09-23 12:17:59 +02:00
Piotr Sobczak 8d7fd73c3a [AMDGPU] Fix merging m0 inits
Fix incorrect merges of m0 inits in loops.

It was assumed that if a clobbering instruction appears in
the same block as an init and the clobbering instruction
does not dominate the init then it does not interfere with
init.

This does not work in the presence of loops, where in this
scenario, the clobbering instruction does interfere with
the init in another iteration.

To fix this, do not check for block equality and defer the
decision to the predecessor check.

Differential Revision: https://reviews.llvm.org/D87882
2020-09-23 09:13:43 +02:00
Fangrui Song 49f2744931 Change LoopInfo::empty to isInnermost after D82895 2020-09-22 14:07:40 -07:00
Jay Foad 892ef2e3c0 [AMDGPU] More codegen patterns for v2i16/v2f16 build_vector
It's simpler to do this at codegen time than to do ad-hoc constant
folding of machine instructions in SIFoldOperands.

Differential Revision: https://reviews.llvm.org/D88028
2020-09-22 10:41:38 +01:00
Matt Arsenault 6daddc213f AMDGPU: Don't add frame register to frame pseudos
We no longer treat the frame register like a function argument, so the
problem this avoided is no longer relevant.
2020-09-21 16:18:47 -04:00
Alexander Belyaev 17dc729bd4 Revert "[NFC][ScheduleDAG] Remove unused EntrySU SUnit"
This reverts commit 0345d88de6.

Google internal backend uses EntrySU, we are looking into removing
dependency on it.

Differential Revision: https://reviews.llvm.org/D88018
2020-09-21 13:33:05 +02:00
Matt Arsenault 3105d0f84b CodeGen: Move split block utility to MachineBasicBlock
AMDGPU needs this in several places, so consolidate them here.
2020-09-18 14:05:18 -04:00
Matt Arsenault 0576f436e5 AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9, this would sometimes
not emit the or to exec at the beginning of the block, where it really
has to be. If there is an instruction that defines one of the source
operands, split the block and turn the si_end_cf into a terminator.

This avoids regressions when regalloc fast is switched to inserting
reloads at the beginning of the block, instead of spills at the end of
the block.

In a future change, this should always split the block.
2020-09-18 13:43:01 -04:00
Francis Visoiu Mistrih 0345d88de6 [NFC][ScheduleDAG] Remove unused EntrySU SUnit
EntrySU doesn't seem to be used at all when building the ScheduleDAG.

Differential Revision: https://reviews.llvm.org/D87867
2020-09-18 09:50:47 -07:00
Matt Arsenault 27df165270 Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel."
This reverts commit c3492a1aa1.

I think this is the wrong strategy and wrong place to do this
transform anyway. Also reverts follow up commit
7d593d0d69.
2020-09-18 09:48:33 -04:00
Mirko Brkusanin ae36c02ad0 [AMDGPU] Set DS alignment requirements to be more strict
Alignment requirements for ds_read/write_b96/b128 for gfx9 and onward are
now the same as for other GCN subtargets. This way we can avoid any
unintentional use of these instructions on systems that do not support dword
alignment and instead require natural alignment.
This also makes 'SH_MEM_CONFIG.alignment_mode == STRICT' the default.

Differential Revision: https://reviews.llvm.org/D87821
2020-09-18 15:26:24 +02:00
Bogdan Graur 7d593d0d69 [amdgpu] Compilation fix for Release
Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D87838
2020-09-17 18:04:53 +02:00
Michael Liao c3492a1aa1 [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.
- Need to lower COPY from SGPR to VGPR to a real instruction as the
  standard COPY is used where the source and destination are from the
  same register bank so that we potentially coalesc them together and
  save one COPY. Considering that, backend optimizations, such as CSE,
  won't handle them. However, the copy from SGPR to VGPR always needs
  materializing to a native instruction, it should be lowered into a
  real one before other backend optimizations.

Differential Revision: https://reviews.llvm.org/D87556
2020-09-17 11:04:17 -04:00
alex-t 0efbb70b71 [AMDGPU] should expand ROTL i16 to shifts.
Instruction combining pass turns library rotl implementation to llvm.fshl.i16.
In the selection dag the intrinsic is turned to ISD::ROTL node that cannot be selected.
Need to expand it to shifts again.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D87618
2020-09-17 17:34:33 +03:00
Simon Pilgrim f026812110 InstCombiner.h - remove unnecessary KnownBits.h include. NFCI.
Move the include down to cpp files with an implicit dependency.
2020-09-17 14:28:42 +01:00
Simon Pilgrim 8adf92e2d1 [AMDGPU] Remove orphan SITargetLowering::LowerINT_TO_FP declaration. NFCI.
Method implementation no longer exists.
2020-09-17 10:45:53 +01:00
Stanislav Mekhanoshin 91f503c3af [AMDGPU] gfx1030 RT support
Differential Revision: https://reviews.llvm.org/D87782
2020-09-16 11:40:58 -07:00
Matt Arsenault 367248956e AMDGPU: Clear offset register when using local stack area
eliminateFrameIndex won't fix up the offset register when the direct
frame index reference is moved to a separate move instruction. Switch
the offset to a base 0 (which it probably should be to begin with).
2020-09-16 12:56:40 -04:00
Jay Foad cb64455faa [AMDGPU] Remove obsolete comment
Obsoleted by e4464bf3d4 "AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR"
2020-09-16 17:03:55 +01:00
Dmitry Preobrazhensky 06d058afec [AMDGPU] Corrected directive to use for ELF weak refs
WeakRefDirective should specify a directive to declare "a global as being a weak undefined symbol".
The directive used by AMDGPU was incorrect - ".weakref" was intended for other purposes.
The correct directive is ".weak" and it is already defined as default for ELF.
So the redefinition was removed.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D87762
2020-09-16 18:51:26 +03:00
Mircea Trofin 6e85c3d5c7 [NFC][Regalloc] accessors for 'reg' and 'weight'
Also renamed the fields to follow style guidelines.

Accessors help with readability - weight mutation, in particular,
is easier to follow this way.

Differential Revision: https://reviews.llvm.org/D87725
2020-09-16 08:28:57 -07:00
Matt Arsenault 71131db689 AMDGPU: Improve <2 x i24> arguments and return value handling
This was asserting for GlobalISel. For SelectionDAG, this was
passing this on the stack. Instead, scalarize this as if it were a
32-bit vector.
2020-09-16 11:21:56 -04:00
Sebastian Neubauer 833b3b0d3a [AMDGPU] Add v3f16/v3i16 support to SDag
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.

Differential Revision: https://reviews.llvm.org/D84420
2020-09-16 17:20:27 +02:00
Jay Foad 90777e2924 [AMDGPU] Enable scheduling around FP MODE-setting instructions
Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is
marked as having unmodeled side effects, which makes the machine
scheduler treat it as a barrier. Now that we have proper implicit $mode
operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead
for setregs that only touch the FP MODE bits, to give the scheduler more
freedom.

Differential Revision: https://reviews.llvm.org/D87446
2020-09-16 16:10:47 +01:00
Stanislav Mekhanoshin 277de43d88 [AMDGPU] Unify intrinsic ret/nortn interface
We have a single noret intrinsic an a lot of special handling
around it. Declare it just as any other but do not define rtn
instructions itself instead.

Differential Revision: https://reviews.llvm.org/D87719
2020-09-15 15:26:42 -07:00
Simon Pilgrim 97a23ab28a AMDGPUPrintfRuntimeBinding.cpp - drop unnecessary casts/dyn_casts. NFCI.
GetElementPtrInst::Create returns a GetElementPtrInst* so we don't need to cast. Similarly IntegerType inherits from the Type base class.

Also, I've used auto* in a few places to cleanup the code.

Helps fix some clang-tidy warnings which saw the dyn_casts and warned that these can return null.
2020-09-15 14:49:04 +01:00
Craig Topper c193a689b4 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Austin Kerbow f859c30ecb [AMDGPU] Add XDL resource to scheduling model
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87621
2020-09-14 13:48:54 -07:00
Eric Astor 20201dc76a [ms] [llvm-ml] Add support for size queries in MASM
Add support for size inference, sizeof, typeof, and lengthof.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86947
2020-09-14 14:27:06 -04:00
Jay Foad c799f873cb [AMDGPU] Don't cluster stores
Clustering loads has caching benefits, but as far as I know there is no
advantage to clustering stores on any AMDGPU subtargets.

The disadvantage is that it tends to increase register pressure and
restricts scheduling freedom.

Differential Revision: https://reviews.llvm.org/D85530
2020-09-14 13:40:17 +01:00
Petar Avramovic 6e2a86ed5a AMDGPU/GlobalISel Check for NoNaNsFPMath in isKnownNeverSNaN
Check for NoNaNsFPMath function attribute in isKnownNeverSNaN.
Function attributes are in held in 'TargetMachine.Options'.
Among other things, this allows selection of some patterns imported
in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN
returns true in lowerFMinNumMaxNum.

However we notice some incorrect results since function attributes are
not correctly written in TargetMachine.Options when next function is
processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0,
it has "no-nans-fp-math"="false" but TargetMachine.Options still has it
set to true since first function in test file had this attribute set to
true. This will be fixed in D87511.

Differential Revision: https://reviews.llvm.org/D87456
2020-09-14 12:11:00 +02:00