Commit Graph

8 Commits

Author SHA1 Message Date
Carl Ritson 7a880ab388 [AMDGPU] Move WQM Pass after MI Scheduler
Exec mask manipulation inserted by SIWholeQuadMode barriers to
instruction scheduling.  Move the entire pass after the machine
instruction scheduler and make changes so pass is correct for
non-SSA operation.  These changes should leave the pass still
usable pre-scheduler, although tests have be updated to reflect
post-scheduler results.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D88081
2020-10-27 10:25:53 +09:00
Sebastian Neubauer a343b9b032 Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57.

According to michel.daenzer,
> This completely broke the Mesa radeonsi driver on Navi 14. Xorg +
> xterm come up with major corruption & psychedelic colours.
2020-09-23 17:16:39 +02:00
Sebastian Neubauer ca907bfb57 [AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the
caller or the callee can insert a waitcnt to ensure that all reads are
finished.
Calls need some time to be executed, so if the callee inserts the
waitcnt, filling the instruction buffer and waiting for memory will be
interleaved, hiding some latency. This comes at the cost of having a
waitcnt inside functions that may not be needed as no memory operations
are outstanding.

For function calls, this is already implemented. The same principal
applies to returns: If the caller inserts a waitcnt after the call, the
callee does not have to wait and the return and memory operation can be
run in parallel.

This commit implements waiting in the caller after returning from a
function call.

Differential Revision: https://reviews.llvm.org/D87674
2020-09-23 12:17:59 +02:00
Stanislav Mekhanoshin 9a9a092e61 [AMDGPU] Avoid sorting stalls in regbank-reassign
This is the slowest operation in the already slow pass.
Instead of sorting just put a stall list into an ordered
map.

Differential Revision: https://reviews.llvm.org/D86253
2020-08-21 11:49:41 -07:00
Sebastian Neubauer 8756869170 [AMDGPU] Add a16 feature to gfx10
Based on D72931

This adds a new feature called A16 which is enabled for gfx10.
gfx9 keeps the R128A16 feature so it can share all the instruction encodings
with gfx7/8.

Differential Revision: https://reviews.llvm.org/D73956
2020-02-10 09:04:23 +01:00
Sebastian Neubauer 163e33b290 [AMDGPU] Fix lowering a16 image intrinsics
scalar_to_vector takes only one argument, not two.
The a16 tests now also check the packing of coordinates into registers

Differential Revision: https://reviews.llvm.org/D73482
2020-02-05 10:54:34 +01:00
Sebastian Neubauer 3bc7ffdaab [AMDGPU] Use v3f32 type in image instructions
This should lower the amount of used registers for gfx9.

I updated some of the changed tests with the update script because
changing them by hand is tedious.

Differential Revision: https://reviews.llvm.org/D73884
2020-02-05 10:35:41 +01:00
Ryan Taylor 1f334d0062 [AMDGPU] Add support for a16 modifiear for gfx9
Summary:
Adding support for a16 for gfx9. A16 bit replaces r128 bit for gfx9.

Change-Id: Ie8b881e4e6d2f023fb5e0150420893513e5f4841

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D50575

llvm-svn: 340831
2018-08-28 15:07:30 +00:00