Commit Graph

146 Commits

Author SHA1 Message Date
Matt Arsenault d768737454 AMDGPU: Fix not encoding src2 of VOP3b instructions
Broken by r247074. Should include an assembler test,
but the assembler is currently broken for VOP3b apparently.

llvm-svn: 247123
2015-09-09 08:39:49 +00:00
Matt Arsenault acd68b58ae SelectionDAG: Support Expand of f16 extloads
Currently this hits an assert that extload should
always be supported, which assumes integer extloads.

This moves a hack out of SI's argument lowering and
is covered by existing tests.

llvm-svn: 247113
2015-09-09 01:12:27 +00:00
Matt Arsenault 86d336e91b AMDGPU/SI: Fix input vcc operand for VOP2b instructions
Adds vcc to output string input for e32. Allows option
of using e64 encoding with assembler.

Also fixes these instructions not implicitly reading exec.

llvm-svn: 247074
2015-09-08 21:15:00 +00:00
Matt Arsenault 8ac35cd031 AMDGPU: Mark s_barrier as a high latency instruction
These were marked as WriteSALU, which is low latency.
I'm guessing at the value to use, but it should probably
be considered the highest latency instruction.

I'm not sure this has any actual effect since hasSideEffects
probably is preventing any moving of these.

llvm-svn: 247060
2015-09-08 19:54:32 +00:00
Matt Arsenault 8fb810a1d2 AMDGPU: Fix s_barrier flags
This should be convergent. This is not a
barrier in the isBarrier sense, nor
hasCtrlDep.

llvm-svn: 247059
2015-09-08 19:54:25 +00:00
Matt Arsenault 966a94f861 AMDGPU: Handle sub of constant for DS offset folding
sub C, x - > add (sub 0, x), C for DS offsets.

This is mostly to fix regressions that show up when
SeparateConstOffsetFromGEP is enabled.

llvm-svn: 247054
2015-09-08 19:34:22 +00:00
Sanjay Patel ce74db9d8d check for fastness before merging in DAGCombiner::MergeConsecutiveStores()
Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time
we have a merged access candidate. Without this patch, we were generating unaligned 
16-byte (SSE) memops for x86 targets where those accesses are slow.

This change was mentioned in:
http://reviews.llvm.org/D10662 and
http://reviews.llvm.org/D10905

and will help solve PR21711.

Differential Revision: http://reviews.llvm.org/D12573

llvm-svn: 246771
2015-09-03 15:03:19 +00:00
Matt Arsenault 51d2d0f668 AMDGPU: Fix adding redundant implicit operands
These are already added during the MachineInstr construction,
so this was adding the implicit registers twice.

llvm-svn: 246525
2015-09-01 02:02:21 +00:00
Matt Arsenault e4d0c142e8 AMDGPU: Add sdst operand to VOP2b instructions
The VOP3 encoding of these allows any SGPR pair for the i1
output, but this was forced before to always use vcc.
This doesn't yet try to use this, but does add the operand
to the definitions so the main change is adding vcc to the
output of the VOP2 encoding.

llvm-svn: 246358
2015-08-29 07:16:50 +00:00
Matt Arsenault 9a32cd3d3b AMDGPU: Set mem operands for spill instructions
llvm-svn: 246357
2015-08-29 06:48:57 +00:00
Matt Arsenault 5c004a7c61 AMDGPU: Fix dropping mem operands when moving to VALU
Without a memory operand, mayLoad or mayStore instructions
are treated as hasUnorderedMemRef, which results in much worse
scheduling.

We really should have a verifier check that any
non-side effecting mayLoad or mayStore has a memory operand.
There are a few instructions (interp and images) which I'm
not sure what / where to add these.

llvm-svn: 246356
2015-08-29 06:48:46 +00:00
Tom Stellard eea72ccbf2 AMDGPU/SI: Fix some invaild assumptions when folding 64-bit immediates
Summary:
We were assuming tha if the use operand had a sub-register that
the immediate was 64-bits, but this was breaking the case of
folding a 64-bit immediate into another 64-bit instruction.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12255

llvm-svn: 246354
2015-08-29 01:58:21 +00:00
Tom Stellard b8ce14c4c3 AMDGPU/SI: Factor operand folding code into its own function
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12254

llvm-svn: 246353
2015-08-28 23:45:19 +00:00
Matt Arsenault 8a067121f8 AMDGPU: Delete dead code
There is no context where s_mov_b64 is emitted
and could potentially be moved to the VALU.
It is currently only emitted for materializing
immediates, which can't be dependent on vector sources.

The immediate splitting is already done when selecting
constants. I'm not sure what contexts if any the register
splitting would have been used before.

Also clean up using s_mov_b64 in place of v_mov_b64_pseudo,
although this isn't required and just skips the extra step
of eliminating the copy from the SReg_64.

llvm-svn: 246080
2015-08-26 20:48:08 +00:00
Matt Arsenault 5e7f95e567 AMDGPU: Don't reprocess instructions when splitting i64 bcnt
llvm-svn: 246079
2015-08-26 20:48:04 +00:00
Matt Arsenault 445833cc91 AMDGPU: Fix not moving users of s_bfe_i64 to VALU
This wouldn't propagate to users of the original BFE
and would hit a verifier error.

llvm-svn: 246078
2015-08-26 20:47:58 +00:00
Matt Arsenault f003c38e1e AMDGPU: Don't create intermediate SALU instructions
When splitting 64-bit operations, create the correct
VALU instructions immediately.

This was splitting things like s_or_b64 into the two
s_or_b32s and then pushing the new instructions
onto the worklist. There's no reason we need
to do this intermediate step.

llvm-svn: 246077
2015-08-26 20:47:50 +00:00
Matt Arsenault 602a16d3db AMDGPU/SI: Report SIFixSGPRLiveRanges changed function
llvm-svn: 246056
2015-08-26 19:12:03 +00:00
Matt Arsenault bd66061db7 AMDGPU: Make sure to reserve super registers
I think this could potentially have broken if
one of the super registers were allocated
that contain v254/v255.

llvm-svn: 246051
2015-08-26 18:54:50 +00:00
Matt Arsenault 19c5488015 AMDGPU: Produce error on dynamic_stackalloc
llvm-svn: 246048
2015-08-26 18:37:13 +00:00
Matt Arsenault 0a3ac1be43 AMDGPU: Allow specifying different opcode on VI for SMRD/SMEM
Although the basic s_load_* instructions happen to use the same
opcode, some of the special case SMRD instructions have
different opcodes.

llvm-svn: 245775
2015-08-22 00:54:31 +00:00
Matt Arsenault e8df879948 AMDGPU: Improve accuracy of instruction rates for some FP instructions
llvm-svn: 245774
2015-08-22 00:50:41 +00:00
Matt Arsenault 33010103b7 AMDGPU: Use DFS to avoid second loop over function
llvm-svn: 245772
2015-08-22 00:43:38 +00:00
Matt Arsenault c8d8e4ed76 AMDGPU: Make sure to run verifier after SIFixSGPRLiveRanges
llvm-svn: 245769
2015-08-22 00:19:34 +00:00
Matt Arsenault aba29d6ab1 AMDGPU: Improve debug printing in SIFixSGPRLiveRanges
llvm-svn: 245768
2015-08-22 00:19:25 +00:00
Matt Arsenault 6adf07a92e AMDGPU: Move CI instructions into CIInstructions.td
There are still a couple of CI patterns left in SIInstructions.

llvm-svn: 245767
2015-08-22 00:16:34 +00:00
Matt Arsenault f56872dc30 AMDGPU: Minor cleanups to help with f16 support
The main change is inverting the condition for the
operand class classes so that VT.Size == 16 uses VGPR_32
instead of 64.

llvm-svn: 245764
2015-08-21 23:49:51 +00:00
Tom Stellard bd8a0856e2 AMDGPU/SI: Better handle s_wait insertion
We can wait on either VM, EXP or LGKM.
The waits are independent.

Without this patch, a wait inserted because of one of them
would also wait for all the previous others.
This patch makes s_wait only wait for the ones we need for the next
instruction.

Here's an example of subtle perf reduction this patch solves:

This is without the patch:

buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is useless.
The reason it is added is because the last
buffer_load_format_xyzw needs s[44:47], which was issued
by the first s_load_dwordx4. It waits for all VM
before that call to have finished.

Internally after every instruction, 3 counters (for VM, EXP and LGTM)
are updated after every instruction. For example buffer_load_format_xyzw
will
increase the VM counter, and s_load_dwordx4 the LGKM one.

Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.

Because of that, the s[44:47] counter includes that to use the register
you need to wait for the previous buffer_load_format_xyzw.

Instead this patch stores only the counters that matter for the
register,
and puts zero for the other ones, since we don't need any wait for them.

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

llvm-svn: 245755
2015-08-21 22:47:27 +00:00
Michael Kuperstein dcdab4cd3a [TLI] Refactor "is integer division cheap" queries.
This removes the isPow2SDivCheap() query, as it is not currently used in
any meaningful way. isIntDivCheap() no longer relies on a state variable
(as all in-tree target set it to false), but the interface allows querying
based on the type optimization level.

NFC.

Differential Revision: http://reviews.llvm.org/D12082

llvm-svn: 245430
2015-08-19 11:17:59 +00:00
Matthias Braun d55bcf2646 MachineRegisterInfo: Introduce isPhysRegUsed()
This method checks whether a physical regiser or any of its aliases are
used in the function.

Using this function in SIRegisterInfo::findUnusedReg() should also fix
this reported failure:

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html
http://reviews.llvm.org/rL242173#inline-533

The report doesn't come with a testcase and I don't know enough about
AMDGPU to create one myself.

llvm-svn: 245329
2015-08-18 18:54:27 +00:00
Yaron Keren 178c465223 Add missing include guard.
llvm-svn: 245173
2015-08-16 07:55:08 +00:00
Matt Arsenault 588732bd6e AMDGPU/SI: Only look at live out SGPR defs
When trying to fix SGPR live ranges, skip defs that are
killed in the same block as the def. I don't think
we need to worry about these cases as long as the
live ranges of the SGPRs in dominating blocks are
correct.

This reduces the number of elements the second
loop over the function needs to look at, and makes
it generally easier to understand. The second loop
also only considers if the live range is live
in to a block, which logically means it
must have been live out from another.

llvm-svn: 245150
2015-08-15 02:58:49 +00:00
James Y Knight 5567bafe93 Remove redundant TargetFrameLowering::getFrameIndexOffset virtual
function.

This was the same as getFrameIndexReference, but without the FrameReg
output.

Differential Revision: http://reviews.llvm.org/D12042

llvm-svn: 245148
2015-08-15 02:32:35 +00:00
Matt Arsenault 297ae311ce AMDGPU/SI: Fix printing useless info with amdhsa
The comments at the bottom would all report 0 if
amdhsa was used.

llvm-svn: 245135
2015-08-15 00:12:39 +00:00
Matt Arsenault 0259a7aa41 AMDGPU/SI: Update LiveVariables
This is simple but won't work if/when this pass
is moved to be post-SSA.

llvm-svn: 245134
2015-08-15 00:12:37 +00:00
Matt Arsenault 670ba46efe AMDGPU/SI: Update LiveIntervals during SIFixSGPRLiveRanges
Does not mark SlotIndexes as reserved, although I think
that might be OK.

LiveVariables still need to be handled.

llvm-svn: 245133
2015-08-15 00:12:35 +00:00
Matt Arsenault b75233235c AMDGPU: Remove unnecessary assert
These shouldn't ever be null. The number of successors
was already asserted to be 2.

llvm-svn: 245132
2015-08-15 00:12:32 +00:00
Matt Arsenault 4275c29a02 AMDGPU/SI: Make comments more precise.
True branch instructions do behave as expected with liveness.

Avoid the phrasing "branch decision is based on a value in an SGPR"
because this could be misleading. A VALU compare instruction's
result is still based on an SGPR, even though that condition
may be divergent.

llvm-svn: 245131
2015-08-15 00:12:30 +00:00
Tom Stellard bef1094ee7 AMDGPU/SI: Add missing spill class
The compiler was failing to spill for some shaders.

Patch By: Axel Davy

llvm-svn: 245087
2015-08-14 19:46:05 +00:00
Simon Pilgrim 7218251861 [AMDGPU] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the AMDGPU implementation
D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect.

Differential Revision: http://reviews.llvm.org/D12007

llvm-svn: 244960
2015-08-13 21:40:02 +00:00
Yaron Keren 556b21aa10 Remove and forbid raw_svector_ostream::flush() calls.
After r244870 flush() will only compare two null pointers and return,
doing nothing but wasting run time. The call is not required any more
as the stream and its SmallString are always in sync.

Thanks to David Blaikie for reviewing.

llvm-svn: 244928
2015-08-13 18:12:56 +00:00
Matt Arsenault c574686529 AMDGPU: Fix assert on dbg_value instructions
llvm-svn: 244728
2015-08-12 09:04:44 +00:00
Alex Lorenz e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
Benjamin Kramer df005cbe19 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Tom Stellard 30cf77457d AMDGPU/SI: Another attempt to fix Windows bots broken by r244372
llvm-svn: 244383
2015-08-08 01:11:07 +00:00
Matt Arsenault cbd753761a AMDGPU: Implement AMDGPUOperand::print()
llvm-svn: 244381
2015-08-08 00:41:51 +00:00
Matt Arsenault 4635915504 AMDGPU/SI: Remove VCCReg
llvm-svn: 244380
2015-08-08 00:41:48 +00:00
Matt Arsenault 6942d1a034 AMDGPU/SI: Remove source uses of VCCReg
llvm-svn: 244379
2015-08-08 00:41:45 +00:00
Tom Stellard fc70950bf2 AMDGPU/SI: Attempt to fix Windows bots broken by r244372
llvm-svn: 244376
2015-08-08 00:17:59 +00:00
Tom Stellard fd25395c72 AMDGPU: Add pass to lower OpenCL image and sampler arguments.
The pass adds new kernel arguments for image attributes, and
resolves calls to dummy attribute and resource id getter functions.

Patch by: Zoltan Gilian

llvm-svn: 244372
2015-08-07 23:19:30 +00:00