Commit Graph

584 Commits

Author SHA1 Message Date
Matt Arsenault 7b46a59b5a R600/SI: Relax a few tests to help enable scheduler
llvm-svn: 217320
2014-09-06 20:44:41 +00:00
Matt Arsenault a9fcf62a9c R600/SI: Fix broken check lines.
Fix missing check, and hardcoded register numbers.

llvm-svn: 217318
2014-09-06 20:37:56 +00:00
Matt Arsenault 8ae5961065 R600/SI: Use same complex patterns for DS atomics
This fixes hitting the same negative base offset problem
that was already fixed for regular loads and stores.

llvm-svn: 217256
2014-09-05 16:24:58 +00:00
Jan Vesely d1d1334064 R600: Fix FROUND
round halfway cases away from zero

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 217250
2014-09-05 14:26:54 +00:00
Tom Stellard 80942a1b50 R600/SI: Use S_ADD_U32 and S_SUB_U32 for low half of 64-bit operations
https://bugs.freedesktop.org/show_bug.cgi?id=83416

llvm-svn: 217248
2014-09-05 14:07:59 +00:00
Matt Arsenault 869cd07158 R600/SI: Try to keep i32 mul on SALU
Also fix bug this exposed where when legalizing an immediate
operand, a v_mov_b32 would be created with a VSrc dest register.

llvm-svn: 217108
2014-09-03 23:24:35 +00:00
Tom Stellard 102c68786c R600/SI: Add a pattern for i64 and in a branch
llvm-svn: 217041
2014-09-03 15:22:41 +00:00
Matt Arsenault 4c24d73709 R600/SI: Relax some ordering in tests.
This will help with enabling misched

llvm-svn: 216971
2014-09-02 21:45:50 +00:00
Matt Arsenault b78875e979 R600/SI: Fix hardcoded register numbers in test
llvm-svn: 216944
2014-09-02 20:43:07 +00:00
Matt Arsenault d1649db2fc R600/SI: Add failing testcase.
This is broken when 64-bit add is only partially
moved to the VALU.

llvm-svn: 216933
2014-09-02 19:12:31 +00:00
Matt Arsenault c1a71217b3 Fix interference caused by fmul 2, x -> fadd x, x
If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.

llvm-svn: 216932
2014-09-02 19:02:53 +00:00
Matt Arsenault 8675db15da R600/SI: Use mad for fsub + fmul
We can use a negate source modifier to match
this for fsub.

llvm-svn: 216735
2014-08-29 16:01:14 +00:00
Tom Stellard f3fc555e3b R600/SI: Use READ2/WRITE2 instructions for 64-bit mem ops with 32-bit alignment
llvm-svn: 216279
2014-08-22 18:49:35 +00:00
Tom Stellard 85e8b6d5f9 R600/SI: Use a ComplexPattern for DS loads and stores
llvm-svn: 216278
2014-08-22 18:49:33 +00:00
Tom Stellard 745f2eddef R600/SI: Teach moveToVALU how to handle more S_LOAD_* instructions
llvm-svn: 216220
2014-08-21 20:41:00 +00:00
Tom Stellard 162a947160 R600/SI: Make sure SCRATCH_WAVE_OFFSET is added as Live-In to the function
This fixes a crash in an ocl conformance test.

llvm-svn: 216219
2014-08-21 20:40:58 +00:00
Matt Arsenault fabf545299 R600/SI: Move all fabs / fneg handling to patterns
llvm-svn: 215749
2014-08-15 18:42:22 +00:00
Matt Arsenault 13623d0e28 R600/SI: Use source modifiers for f64 fneg
llvm-svn: 215748
2014-08-15 18:42:18 +00:00
Matt Arsenault a147438e37 R600/SI: Use source modifier for f64 fabs
llvm-svn: 215747
2014-08-15 18:42:15 +00:00
Matt Arsenault b2baffaffd R600/SI: Fix offset folding in some cases with shifted pointers.
Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
is only done if the add has one use. If the resulting constant
add can be folded into an addressing mode, force this to happen
for the pointer operand.

This ends up happening a lot because of how LDS objects are allocated.
Since the globals are allocated next to each other, acessing the first
element of the second object is directly indexed by a shifted pointer.

llvm-svn: 215739
2014-08-15 17:49:05 +00:00
Matt Arsenault 2e7cc48baa R600/SI: Add intrinsic for ldexp
llvm-svn: 215734
2014-08-15 17:30:25 +00:00
Matt Arsenault 5015a89aa5 R600/SI: Implement isLegalAddressingMode
The default assumes that a 16-bit signed offset is used.
LDS instruction use a 16-bit unsigned offset, so it wasn't
being used in some cases where it was assumed a negative offset
could be used.

More should be done here, but first isLegalAddressingMode needs
to gain an addressing mode argument. For now, copy most of the rest
of the default implementation with the immediate offset change.

llvm-svn: 215732
2014-08-15 17:17:07 +00:00
Matt Arsenault 74ef277774 R600: Correctly set the src value offset for scalarized kernel args
This for some reason fixes v1i64 kernel arguments on pre-SI. This
currently breaks some other cases in the kernel-args.ll test for R600,
but I'm not particularly confident in the new output. VTX_READ_* are not
used for some of the scalarized cases, and the code reading from the
constant buffer doesn't make much sense to me.

llvm-svn: 215564
2014-08-13 18:14:11 +00:00
Hal Finkel 415e344f29 Fix classof for ISD::INTRINSIC_W_CHAIN and INTRINSIC_VOID
Unfortunately, our use of the SDNode class hierarchy for INTRINSIC_W_CHAIN and
INTRINSIC_VOID nodes is somewhat broken right now. These nodes sometimes are
used for memory intrinsics (those with MachineMemOperands), and sometimes not.
When not, the nodes are not created as instances of MemIntrinsicSDNode, but
rather created as some other subclass of SDNode using DAG::getNode. When they
are memory intrinsics, they are created using DAG::getMemIntrinsicNode as
instances of MemIntrinsicSDNode. MemIntrinsicSDNode is a subclass of
MemSDNode, but prior to r214452, we had a non-self-consistent setup whereby
MemIntrinsicSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would
return true but MemSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID
would return false. In r214452, MemSDNode::classof was changed to return true
for INTRINSIC_W_CHAIN and INTRINSIC_VOID, which is now self-consistent. The
problem is that neither the pre-r214452 logic and the post-r214452 logic are
really right. The truth is that not all INTRINSIC_W_CHAIN and INTRINSIC_VOID
nodes are instances of MemIntrinsicSDNode (or MemSDNode for that matter), and
the return value from classof needs to reflect that. This was broken before
r214452 (because MemIntrinsicSDNode::classof always returned true), and was
broken afterward (because MemSDNode::classof also always returned true), and
will now be correct.

The minimal solution is to grab one of the SubclassData bits (there is one left
for MemIntrinsicSDNode nodes) and use it to store whether or not a particular
INTRINSIC_W_CHAIN or INTRINSIC_VOID is really an instance of
MemIntrinsicSDNode or not. Doing this allows both MemIntrinsicSDNode::classof
and MemSDNode::classof to return the correct answer for the underlying object
for both the memory-intrinsic and non-memory-intrinsic cases.

This fixes the problem that r214452 created in the SelectionDAGDumper (thanks
to Matt Arsenault for pointing it out).

Because PowerPC does not implement getTgtMemIntrinsic, this change breaks
test/CodeGen/PowerPC/unal-altivec-wint.ll. I've XFAILed it for now, and will
fix it in a follow-up commit.

llvm-svn: 215511
2014-08-13 01:15:37 +00:00
Jan Vesely e5ca27d716 R600: Use optimized 24bit path in udivrem
v2: drop enum keyword
    use correct extension mode
    don't bother computing the sign in unsinged case

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215462
2014-08-12 17:31:20 +00:00
Jan Vesely 4a33bc6206 R600: Use i24 optimized path for SREM
v2: add tests
    rename LowerSDIV24 to LowerSDIVREM24
    handle the rem part in this function

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215460
2014-08-12 17:31:17 +00:00
Tom Stellard 155bbb7713 R600/SI: Add a ComplexPattern for selecting MUBUF _OFFSET variant
This saves us from having to copy a 64-bit 0 value into VGPRs for
BUFFER_* instruction which only have a 12-bit immediate offset.

llvm-svn: 215399
2014-08-11 22:18:17 +00:00
Tom Stellard 48194cdd81 R600/SI: Add check for low 32 bits of encoding to mubuf tests
There are no variable values like registers encoded in the low 32 bits of MUBUF
instructions, so it is relatively easy to check these bits, and it will
help prevent us from introducing encoding bugs.

llvm-svn: 215397
2014-08-11 22:18:11 +00:00
Tom Stellard 93ba12f163 R600/SI: Clear lds bit on MUBUF instructions used for private stores
This bit was left uninitialized, which was causing some random failures
of piglit tests.

NOTE: This is a candidate for the 3.5 branch.
llvm-svn: 215396
2014-08-11 22:18:09 +00:00
Tom Stellard e04fd9d430 R600/SI: Fix broken test
llvm-svn: 215395
2014-08-11 22:18:05 +00:00
Tom Stellard c0503db9e2 R600/SI: Custom lower CONCAT_VECTORS
This will lower them using register copies rather than loads and stores
to the stack.

llvm-svn: 215270
2014-08-09 01:06:56 +00:00
Tom Stellard 4f575f7aaf R600/SI: Update concat_vectors.ll to check for scratch usage
These tests were using SI-NOT: MOVREL to make sure concat vectors
weren't being lowered to stack loads and stores, but we are using
scratch buffers for the stack now instead of registers, so we need
to add an additional SI-NOT check for scratch buffers.

With this change I was able to uncover one broken test which will
be fixed in a future commit.

llvm-svn: 215269
2014-08-09 01:06:53 +00:00
Matt Arsenault a6dc6c281c R600: Cleanup fadd and fsub tests
llvm-svn: 214991
2014-08-06 20:27:55 +00:00
Matt Arsenault d5f4de27b6 R600: Increase nearby load scheduling threshold.
This partially fixes weird looking load scheduling
in memcpy test. The load clustering doesn't seem
particularly smart, but this method seems to be partially
deprecated so it might not be worth trying to fix.

llvm-svn: 214943
2014-08-06 00:29:49 +00:00
Matt Arsenault c10853f29f R600/SI: Implement areLoadsFromSameBasePtr
This currently has a noticable effect on the kernel argument loads.
LDS and global loads are more problematic, I think because of how copies
are currently inserted to ensure that the address is a VGPR.

llvm-svn: 214942
2014-08-06 00:29:43 +00:00
Tom Stellard 229d5e669b R600/SI: Update MUBUF assembly string to match AMD proprietary compiler
llvm-svn: 214866
2014-08-05 14:48:12 +00:00
Tom Stellard b37f797678 R600/SI: Avoid generating REGISTER_LOAD instructions.
SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.

llvm-svn: 214865
2014-08-05 14:40:52 +00:00
Matt Arsenault 9215b17eb7 R600/SI: Fix extra whitespace in asm str
This slipped in in r214467, so something like

V_MOV_B32_e32  v0, ... is now printed with 2 spaces
between the instruction name and first operand.

llvm-svn: 214660
2014-08-03 05:27:14 +00:00
Matt Arsenault 4de324442b R600: Cleanup fneg tests
llvm-svn: 214612
2014-08-02 02:26:51 +00:00
Tom Stellard 4973a13680 Revert "R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp"
This reverts commit r214566.

I did not mean to commit this yet.

llvm-svn: 214572
2014-08-01 21:55:50 +00:00
Tom Stellard c16f73d7c5 R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp
SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.

llvm-svn: 214566
2014-08-01 21:50:47 +00:00
Matt Arsenault 06bd3933ba R600: Cleanup test
Remove -CHECKs, use multiple prefixes, name values,
also test the @llvm.fabs version

llvm-svn: 214525
2014-08-01 17:00:29 +00:00
Tom Stellard b4a313a76f R600/SI: Do abs/neg folding with ComplexPatterns
Abs/neg folding has moved out of foldOperands and into the instruction
selection phase using complex patterns.  As a consequence of this
change, we now prefer to select the 64-bit encoding for most
instructions and the modifier operands have been dropped from
integer VOP3 instructions.

llvm-svn: 214467
2014-08-01 00:32:39 +00:00
Tom Stellard 6407e1e632 R600/SI: Fold immediates when shrinking instructions
This will prevent us from using extra MOV instructions once we prefer
selecting 64-bit instructions.

llvm-svn: 214464
2014-08-01 00:32:33 +00:00
Tom Stellard 86d12ebdbd R600/SI: Fix incorrect commute operation in shrink instructions pass
We were commuting the instruction by still shrinking it using the
original opcode.

NOTE: This is a candidate for the 3.5 branch.
llvm-svn: 214463
2014-08-01 00:32:28 +00:00
Jan Vesely 3047950964 R600: Modernize work item intrinsics test
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 214451
2014-07-31 22:11:03 +00:00
Matt Arsenault 2b252ecf6b R600: Modernize test
llvm-svn: 214108
2014-07-28 18:06:08 +00:00
Matt Arsenault 46645fa102 R600/SI: Implement getOptimalMemOpType
The default guess uses i32. This needs an address space argument
to really do the right thing in all cases.

llvm-svn: 214104
2014-07-28 17:49:26 +00:00
Matt Arsenault 6f2a526101 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.

llvm-svn: 214055
2014-07-27 17:46:40 +00:00
Matt Arsenault 24aa028cfa R600/SI: Fix broken test.
There was no check prefix for the instruction lines.
Match what is emitted though, although I'm pretty sure it is
incorrect.

llvm-svn: 214035
2014-07-26 21:21:42 +00:00