Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.
llvm-svn: 260651
Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or in the case of samplers,
floating-point calculations. When this happens, we need to use
v_readfirstlane to copy these values back to sgprs.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17102
llvm-svn: 260599
Summary:
Also delete all the stub functions that are identical to the
implementations in TargetInstrInfo.cpp.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16609
llvm-svn: 259054
Summary:
It is off by default, but can be used
with --misched=si
Patch by: Axel Davy
Reviewers: arsenm, tstellarAMD, nhaehnle
Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D11885
llvm-svn: 257609
Summary:
The method insertNOPs expected the number of wait states to be passed as
parameter, while eliminateFrameIndex passed the immediate argument for the
S_NOP, leading to an off-by-one error. Rename the method to make the
meaning of its parameter clearer. The number of 4 / 5 wait states (which
is what the method has always _tried_ to do according to the comment) is
correct according to the hardware docs.
I stumbled upon this while trying to track down the cause of
https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed,
this patch unfortunately does not fix that bug...
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15542
llvm-svn: 255906
Don't use commuteInstruction, and don't commute if
doing so will not improve legality. Skip the more
complex checks for literal operands and constant bus restrictions,
which are not a concern for VOP2 instructions because src1
does not accept SGPRs or constants and few implicitly
read vcc.
This gets called quite a few times and the
attempts at commuting are a significant fraction
of the time spent in SIFixSGPRCopies, so it's
somewhat worthwhile to optimize. With this patch and others
leading up to it, this reduces the compile time of SIFixSGPRCopies
on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.
llvm-svn: 254452
v2: added more tests, moved the SALU->VALU conversion to a separate function
It looks like it's not possible to get subregisters in the S_ABS lowering
code, and I don't feel like guessing without testing what the correct code
would look like.
llvm-svn: 254095
The operand layout is slightly different for the atomic
opcodes from the usual MUBUF loads and stores.
This should only fix it on SI/CI. VI is still broken
because it still emits the addr64 replacement.
llvm-svn: 252140
This was checking for a variety of situations that should
never happen. This saves a tiny bit of compile time.
We should not be selecting instructions with invalid operands in the
first place. Most of the time for registers copys are inserted
to the correct operand register class.
For VOP3, since all operand types are supported and literal
constants never are, we just need to verify the constant bus
requirements (all immediates should be legal inline ones).
The only possibly tricky case to maybe worry about is if when
legalizing operands in moveToVALU with s_add_i32 and similar
instructions. If the original s_add_i32 had a literal constant
and we need to replace it with v_add_i32_e64 we would have an
unsupported literal operand. However, I don't think we should worry
about that because SIFoldOperands should handle folding literal
constant operands into the SALU instructions based on the uses.
At SIFoldOperands time, the legality and profitability of
operand types is a bit different.
llvm-svn: 250951
to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.
This is a candidate for stable llvm 3.7.
Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 248858
It's easier to understand creating a full instruction
than the current situation where sometimes a new
instruction is created and sometimes it is awkwardly
mutated in place.
llvm-svn: 248627
There is no context where s_mov_b64 is emitted
and could potentially be moved to the VALU.
It is currently only emitted for materializing
immediates, which can't be dependent on vector sources.
The immediate splitting is already done when selecting
constants. I'm not sure what contexts if any the register
splitting would have been used before.
Also clean up using s_mov_b64 in place of v_mov_b64_pseudo,
although this isn't required and just skips the extra step
of eliminating the copy from the SReg_64.
llvm-svn: 246080
When splitting 64-bit operations, create the correct
VALU instructions immediately.
This was splitting things like s_or_b64 into the two
s_or_b32s and then pushing the new instructions
onto the worklist. There's no reason we need
to do this intermediate step.
llvm-svn: 246077
This commit fixes a bug in the class 'SIInstrInfo' where the implicit register
machine operands were added to a machine instruction in an incorrect order -
the implicit uses were added before the implicit defs.
I found this bug while working on moving the implicit register operand
verification code from the MIR parser to the machine verifier.
This commit also makes the method 'addImplicitDefUseOperands' in the machine
instruction class public so that it can be reused in the 'SIInstrInfo' class.
Reviewers: Matt Arsenault
Differential Revision: http://reviews.llvm.org/D11689
llvm-svn: 243799
Summary:
This function is never called. isReallyTriviallyReMaterializable() is
the function that should be implemented instead.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11620
llvm-svn: 243651
The two-address instruction pass will convert these back to v_mad_f32
if necessary.
Differential Revision: http://reviews.llvm.org/D11060
llvm-svn: 242038
If pseudoToMCOpcode failed, we would return the original opcode, so operands
would be swapped, but the instruction would remain the same.
It resulted in LSHLREV a, b ---> LSHLREV b, a.
This fixes Glamor text rendering and
piglit/arb_sample_shading-builtin-gl-sample-mask on VI.
This is a candidate for stable branches.
v2: the test was simplified by Tom Stellard
llvm-svn: 240824
Summary:
TargetInstrInfo::getLdStBaseRegImmOfs to
TargetInstrInfo::getMemOpBaseRegImmOfs and implement for x86. The
implementation only handles a few easy cases now and will be made more
sophisticated in the future.
This is NFCI: the only user of `getLdStBaseRegImmOfs` (now
`getmemOpBaseRegImmOfs`) is `LoadClusterMotion` and `LoadClusterMotion`
is disabled for x86.
Reviewers: reames, ab, MatzeB, atrick
Reviewed By: MatzeB, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10199
llvm-svn: 239741