Commit Graph

208 Commits

Author SHA1 Message Date
Michel Danzer 8522270d7e R600/SI: Add pattern for xor of i1
Fixes two recent piglit regressions with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188559
2013-08-16 16:19:31 +00:00
Michel Danzer 20680b1cc5 R600/SI: Fix broken encoding of DS_WRITE_B32
The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
it to corrupt the encoding of that by clobbering the first operand with
the second one.

Undo that damage and only apply the SMRD logic to that.

Fixes some derivates related piglit regressions with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188558
2013-08-16 16:19:24 +00:00
Tom Stellard dba25713a6 Revert "R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions"
This reverts commit a6a39ced095c2f453624ce62c4aead25db41a18f.
This is the wrong version of this fix.

llvm-svn: 188523
2013-08-16 01:18:43 +00:00
Tom Stellard 82bef57f20 R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions
The SIInsertWaits pass was overwriting the first operand (gds bit) of
DS_WRITE_B32 with the second operand (value to write).  This meant that
any time the value to write was stored in an odd number VGPR, the gds
bit would be set causing the instruction to write to GDS instead of LDS.

llvm-svn: 188522
2013-08-16 01:12:20 +00:00
Tom Stellard b03edeca67 R600: Add support for global vector loads with element types less than 32-bits
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188521
2013-08-16 01:12:16 +00:00
Tom Stellard fbab827e2a R600: Add support for global vector stores with elements less than 32-bits
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188520
2013-08-16 01:12:11 +00:00
Tom Stellard d3ee8c103a R600: Add support for i16 and i8 global stores
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188519
2013-08-16 01:12:06 +00:00
Tom Stellard 6d1379e180 R600: Add support for v4i32 stores on Cayman
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188518
2013-08-16 01:12:00 +00:00
Tom Stellard 16da74c205 R600: Enable folding of inline literals into REQ_SEQUENCE instructions
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188517
2013-08-16 01:11:55 +00:00
Tom Stellard ac00f9df79 R600: Change the RAT instruction assembly names so they match the docs
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188515
2013-08-16 01:11:46 +00:00
Daniel Dunbar 9efbedfd35 [tests] Cleanup initialization of test suffixes.
- Instead of setting the suffixes in a bunch of places, just set one master
   list in the top-level config. We now only modify the suffix list in a few
   suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).

 - Aside from removing the need for a bunch of lit.local.cfg files, this enables
   4 tests that were inadvertently being skipped (one in
   Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
   CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
   XFAILED).

 - This commit also fixes a bunch of config files to use config.root instead of
   older copy-pasted code.

llvm-svn: 188513
2013-08-16 00:37:11 +00:00
Tom Stellard d86003e31f R600/SI: Improve legalization of vector operations
This should fix hangs in the OpenCL piglit tests.

llvm-svn: 188431
2013-08-14 23:25:00 +00:00
Tom Stellard 6785065ace R600/SI: Replace v1i32 type with i32 in imageload and sample intrinsics
llvm-svn: 188430
2013-08-14 23:24:53 +00:00
Tom Stellard 9fa1791a1b R600/SI: Convert v16i8 resource descriptors to i128
Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.

This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:

https://bugs.freedesktop.org/show_bug.cgi?id=66805

llvm-svn: 188429
2013-08-14 23:24:45 +00:00
Tom Stellard b81df0c7ea R600/SI: Use i8 types for resource descriptors in tests
We switched from i32 to i8 types a while ago and the tests were never
updated.

llvm-svn: 188428
2013-08-14 23:24:37 +00:00
Tom Stellard 8e5da41374 R600/SI: Lower BUILD_VECTOR to REG_SEQUENCE v2
Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG
instructions should make it easier for the register allocator to coalasce
unnecessary copies.

v2:
  - Use an SGPR register class if all the operands of BUILD_VECTOR are
    SGPRs.

llvm-svn: 188427
2013-08-14 23:24:32 +00:00
Tom Stellard 16a9a205c8 R600/SI: Assign a register class to the $vaddr operand for MIMG instructions
The previous code declared the operand as unknown:$vaddr, which made
it possible for scalar registers to be used instead of vector registers.

llvm-svn: 188425
2013-08-14 23:24:17 +00:00
Tom Stellard 3494b7ee42 R600/SI: Handle MSAA texture targets
Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188421
2013-08-14 22:22:14 +00:00
Tom Stellard 20ee94f152 R600/SI: Allow conversion between v32i8 and v8i32
Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188420
2013-08-14 22:22:09 +00:00
Tom Stellard 73c31d541e R600/SI: Add pattern for fp_to_uint
This fixes the F2U opcode for the Mesa driver.

Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188418
2013-08-14 22:21:57 +00:00
Tom Stellard fc455471c3 R600: Set scheduling preference to Sched::Source
R600 doesn't need to do any scheduling on the SelectionDAG now that it
has a very good MachineScheduler.  Also, using the VLIW SelectionDAG
scheduler was having a major impact on compile times. For example with
the phatk kernel here are the LLVM IR to machine code compile times:

With Sched::VLIW

Total Compile Time:                  1.4890 Seconds (User + System)
SelectionDAG Instruction Scheduling: 1.1670 Seconds (User + System)

With Sched::Source

Total Compile Time:                  0.3330 Seconds (User + System)
SelectionDAG Instruction Scheduling: 0.0070 Seconds (User + System)

The code ouput was identical with both schedulers.  This may not be true
for all programs, but it gives me confidence that there won't be much
reduction, if any, in code quality by using Sched::Source.

llvm-svn: 188215
2013-08-12 22:33:21 +00:00
Niels Ole Salscheider d3a039fed2 R600/SI: FMA is faster than fmul and fadd for f64
llvm-svn: 188136
2013-08-10 10:38:54 +00:00
Niels Ole Salscheider 6509ac65a9 R600/SI: Add FMA pattern
llvm-svn: 188135
2013-08-10 10:38:47 +00:00
Niels Ole Salscheider 719fbc9ae7 R600/SI: Implement fp32<->fp64 conversions
llvm-svn: 187988
2013-08-08 16:06:15 +00:00
Niels Ole Salscheider 4715d886f8 R600/SI: Implement sint<->fp64 conversions
llvm-svn: 187987
2013-08-08 16:06:08 +00:00
Tom Stellard 2f7cdda57e R600/SI: Use VSrc_* register classes as the default classes for types
Since the VSrc_* register classes contain both VGPRs and SGPRs, copies
that used be emitted by isel like this:

SGPR = COPY VGPR

Will now be emitted like this:

VSrC = COPY VGPR

This patch also adds a pass that tries to identify and fix situations where
a VGPR to SGPR copy may occur.  Hopefully, these changes will make it
impossible for the compiler to generate illegal VGPR to SGPR copies.

llvm-svn: 187831
2013-08-06 23:08:28 +00:00
Tom Stellard 4c0ffccbbf R600/SI: Add more special cases for opcodes to ensureSRegLimit()
Also factor out the register class lookup to its own function.

llvm-svn: 187830
2013-08-06 23:08:18 +00:00
Tom Stellard aa664d9b92 Factor FlattenCFG out from SimplifyCFG
Patch by: Mei Ye

llvm-svn: 187764
2013-08-06 02:43:45 +00:00
Tom Stellard eef2ad92c7 R600/SI: Add missing test for r187749
llvm-svn: 187754
2013-08-05 22:45:56 +00:00
Tom Stellard 0344cdfe39 R600: Add 64-bit float load/store support
* Added R600_Reg64 class
* Added T#Index#.XY registers definition
* Added v2i32 register reads from parameter and global space
* Added f32 and i32 elements extraction from v2f32 and v2i32
* Added v2i32 -> v2f32 conversions

Tom Stellard:
  - Mark vec2 operations as expand.  The addition of a vec2 register
    class made them all legal.

Patch by: Dmitry Cherkassov

Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
llvm-svn: 187582
2013-08-01 15:23:42 +00:00
Tom Stellard 53698938a4 R600: Use 64-bit alignment for 64-bit kernel arguments
llvm-svn: 187581
2013-08-01 15:23:31 +00:00
Tom Stellard 98f675a994 R600/SI: Custom lower i64 ZERO_EXTEND
llvm-svn: 187580
2013-08-01 15:23:26 +00:00
Tom Stellard ca69a53bae Revert "R600: Non vector only instruction can be scheduled on trans unit"
This reverts commit 98ce62780ea7185ba710868bf83c8077e8d7f6d6.

llvm-svn: 187526
2013-07-31 20:43:27 +00:00
Vincent Lejeune bb3f931123 R600: Avoid more than 4 literals in the same instruction group at scheduling
llvm-svn: 187515
2013-07-31 19:32:07 +00:00
Vincent Lejeune df18804e26 R600: Non vector only instruction can be scheduled on trans unit
llvm-svn: 187514
2013-07-31 19:31:56 +00:00
Tom Stellard aa313d0a74 R600/SI: Expand vector fp <-> int conversions
llvm-svn: 187421
2013-07-30 14:31:03 +00:00
Quentin Colombet e2e0548d77 [R600] Replicate old DAGCombiner behavior in target specific DAG combine.
build_vector is lowered to REG_SEQUENCE, which is something the register
allocator does a good job at optimizing.

llvm-svn: 187397
2013-07-30 00:27:16 +00:00
Quentin Colombet 6bf4baa408 [DAGCombiner] insert_vector_elt: Avoid building a vector twice.
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN 

The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
  vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
  optimizations.

llvm-svn: 187396
2013-07-30 00:24:09 +00:00
Tom Stellard c54731aa9d DAGCombiner: Pass the correct type to TargetLowering::isF(Abs|Neg)Free
This commit also implements these functions for R600 and removes a test
case that was relying on the buggy behavior.

llvm-svn: 187007
2013-07-23 23:55:03 +00:00
Tom Stellard 8cb0e47c9e R600: Treat CONSTANT_ADDRESS loads like GLOBAL_ADDRESS loads when necessary
These are really the same address space in hardware.  The only
difference is that CONSTANT_ADDRESS uses a special cache for faster
access.  When we are unable to use the constant kcache for some reason
(e.g. smaller types or lack of indirect addressing) then the instruction
selector must use GLOBAL_ADDRESS loads instead.

llvm-svn: 187006
2013-07-23 23:54:56 +00:00
Tom Stellard 5263948a7b R600: Add support for 24-bit MAD instructions
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186923
2013-07-23 01:48:49 +00:00
Tom Stellard 41fc7853be R600: Add support for 24-bit MUL instructions
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186922
2013-07-23 01:48:42 +00:00
Tom Stellard 9f95033d33 R600: Improve support for < 32-bit loads
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186921
2013-07-23 01:48:35 +00:00
Tom Stellard 840214437b R600: Move CONST_ADDRESS folding into AMDGPUDAGToDAGISel::Select()
This increases the number of opportunites we have for folding.  With the
previous implementation we were unable to fold into any instructions
other than the first when multiple instructions were selected from a
single SDNode.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186919
2013-07-23 01:48:24 +00:00
Tom Stellard 1e80309ebe R600: Use KCache for kernel arguments
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186918
2013-07-23 01:48:18 +00:00
Tom Stellard acfeebf883 R600: Use the same compute kernel calling convention for all GPUs
A side-effect of this is that now the compiler expects kernel arguments
to be 4-byte aligned.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186916
2013-07-23 01:48:05 +00:00
Tom Stellard 78e012969c R600: Use correct LoadExtType when lowering kernel arguments
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186915
2013-07-23 01:47:58 +00:00
Tom Stellard 33dd04bfbe R600: Clean up extended load patterns
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186914
2013-07-23 01:47:52 +00:00
Tom Stellard beed74af48 R600: Expand vector FNEG
llvm-svn: 186913
2013-07-23 01:47:46 +00:00
Vincent Lejeune 8b8a7b5514 R600: Don't emit empty then clause and use alu_pop_after
llvm-svn: 186725
2013-07-19 21:45:15 +00:00