Commit Graph

1047 Commits

Author SHA1 Message Date
Tom Stellard 11624bc577 R600/SI: Add a MUBUF load pattern for Reg+Imm offsets
llvm-svn: 200933
2014-02-06 18:36:38 +00:00
Tom Stellard 044e418f15 R600/SI: Use immediates offsets for SMRD instructions whenever possible
There was a problem with the old pattern, so we were copying some
larger immediates into registers when we could have been encoding
them in the instruction.

llvm-svn: 200932
2014-02-06 18:36:34 +00:00
Matt Arsenault 25793a3f22 Add address space argument to allowsUnalignedMemoryAccess.
On R600, some address spaces have more strict alignment
requirements than others.

llvm-svn: 200887
2014-02-05 23:15:53 +00:00
Michel Danzer 5d26fdfcba R600/SI: Add pattern for zero-extending i1 to i32
Fixes opencl-example if_* tests with radeonsi.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74469

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200830
2014-02-05 09:48:05 +00:00
Duncan P. N. Exon Smith 8e661efc00 cleanup: scc_iterator consumers should use isAtEnd
No functional change.  Updated loops from:

    for (I = scc_begin(), E = scc_end(); I != E; ++I)

to:

    for (I = scc_begin(); !I.isAtEnd(); ++I)

for teh win.

llvm-svn: 200789
2014-02-04 19:19:07 +00:00
Rafael Espindola 7cbbd28c67 Every target uses .align. Simplify.
llvm-svn: 200782
2014-02-04 18:39:51 +00:00
Tom Stellard aeb456438c R600/SI: Expand i1 BR_CC
This fixes a crashes in the OpenCV test suite and also the scrypt
kernel in bfgminer.

I was unable to come up with a reduced test case for this.

https://bugs.freedesktop.org/show_bug.cgi?id=72785

llvm-svn: 200776
2014-02-04 17:18:43 +00:00
Tom Stellard b8725d84d6 R600/SI: Don't assume copies will be coalesced in SIFixSGPRCopies
There is no lit test for this, because it would be too big and
complicated, but it does fix a crash in the Arithm/Absdiff.* OpenCV test.

llvm-svn: 200775
2014-02-04 17:18:42 +00:00
Tom Stellard 0ec134f3d6 R600/SI: Custom lower i64 ISD::SELECT
llvm-svn: 200774
2014-02-04 17:18:40 +00:00
Tom Stellard bfebd1fc7e R600: Enable vector fpow.
The OpenCL specs say: "The vector versions of the math functions operate
component-wise. The description is per-component."

Patch by: Jan Vesely

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 200773
2014-02-04 17:18:37 +00:00
Michel Danzer 624b02aa67 R600/SI: Fix fneg for 0.0
V_ADD_F32 with source modifier does not produce -0.0 for this. Just
manipulate the sign bit directly instead.

Also add a pattern for (fneg (fabs ...)).

Fixes a bunch of bit encoding piglit tests with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200743
2014-02-04 07:12:38 +00:00
Matt Arsenault d5ab971b54 Add DEBUG_TYPE to SIAnnotateControlFlow
llvm-svn: 200720
2014-02-03 22:58:05 +00:00
Matt Arsenault f5958dded4 R600/SI: Fix insertelement with dynamic indices.
This didn't work for any integer vectors, and didn't
work with some sizes of float vectors. This should now
work with all sizes of float and i32 vectors.

llvm-svn: 200619
2014-02-02 00:05:35 +00:00
Rafael Espindola 277f9061fc Remove the last hasRawTextSupport call from R600.
There is nothing wrong with printing the disassembly section when printing
text. An hypothetical assembler would then produce a .o just like our
direct object emission produces.

llvm-svn: 200583
2014-01-31 22:14:06 +00:00
Rafael Espindola 887541fe27 Replace another use with hasRawTextSupport+EmitRawText with emitRawComment.
llvm-svn: 200582
2014-01-31 22:08:19 +00:00
Rafael Espindola 19656ba7ea Use emitRawComment to avoid a call to hasRawTextSupport.
llvm-svn: 200581
2014-01-31 21:54:49 +00:00
David Woodhouse d2cca113df Delete MCSubtargetInfo data members from target MCCodeEmitter classes
The subtarget info is explicitly passed to the EncodeInstruction
method and we should use that subtarget info to influence any
encoding decisions.

llvm-svn: 200350
2014-01-28 23:13:25 +00:00
David Woodhouse 3fa98a65e9 Propagate MCSubtargetInfo through TableGen's getBinaryCodeForInstr()
llvm-svn: 200349
2014-01-28 23:13:18 +00:00
David Woodhouse 9784cef38d Explictly pass MCSubtargetInfo to MCCodeEmitter::EncodeInstruction()
llvm-svn: 200348
2014-01-28 23:13:07 +00:00
David Woodhouse e6c13e4abd Change MCStreamer EmitInstruction interface to take subtarget info
llvm-svn: 200345
2014-01-28 23:12:42 +00:00
Michel Danzer bf1a641060 R600/SI: Add pattern for truncating i32 to i1
Fixes half a dozen piglit tests with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200283
2014-01-28 03:01:16 +00:00
Michel Danzer 13736221e3 R600/SI: Add intrinsic for BUFFER_LOAD_DWORD* instructions
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200196
2014-01-27 07:20:51 +00:00
Michel Danzer 6064f57ae8 R600/SI: Add intrinsic for S_SENDMSG instruction
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200195
2014-01-27 07:20:44 +00:00
Rafael Espindola 98f5b54f85 Add back spaces I missed in the conversion to emitRawComments.
Sorry about that.

llvm-svn: 200171
2014-01-27 00:19:41 +00:00
Rafael Espindola bcf890bf07 Use emitRawComment instead of EmitRawText.
llvm-svn: 200170
2014-01-27 00:16:00 +00:00
Rafael Espindola e41383f899 Pass a MCSubtargetInfo down to the TargetStreamer creation.
With this the target streamers will be able to know the target features that
are in use.

llvm-svn: 200135
2014-01-26 06:38:58 +00:00
Rafael Espindola 24ea09ef7d Construct the MCStreamer before constructing the MCTargetStreamer.
This has a few advantages:
* Only targets that use a MCTargetStreamer have to worry about it.
* There is never a MCTargetStreamer without a MCStreamer, so we can use a
  reference.
* A MCTargetStreamer can talk to the MCStreamer in its constructor.

llvm-svn: 200129
2014-01-26 06:06:37 +00:00
Juergen Ributzka 3e752e7af9 Add final and owerride keywords to TargetTransformInfo's subclasses.
llvm-svn: 200021
2014-01-24 18:22:59 +00:00
Alp Toker cb40291100 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Tom Stellard a64353e5bd R600: Remove successive JUMP in AnalyzeBranch when AllowModify is true
This fixes a crash in the OpenCV OpenCL test suite.

There is no lit test for this, because the test would be very large
and could easily be invalidated by changes to the scheduler
or other parts of the compiler.

Patch by:  Vincent Lejeune

llvm-svn: 199919
2014-01-23 18:49:34 +00:00
Tom Stellard a2a4b8ee2f R600: Disable the BFE pattern
This pattern uses an SDNodeXForm, which isn't being emitted for some
reason.  I can get it to work by attaching the PatLeaf that has the
XForm to the argument in the output pattern, but this results in an
immediate being used in a register operand, which the backend can't
handle yet.

llvm-svn: 199918
2014-01-23 18:49:33 +00:00
Tom Stellard 805890b252 R600: Correctly handle vertex fetch clauses the precede ENDIFs
The control flow finalizer would sometimes use an ALU_POP_AFTER
instruction before the vetex fetch clause instead of using a POP
instruction after it.

llvm-svn: 199917
2014-01-23 18:49:31 +00:00
Tom Stellard 8cce9bdf17 R600: Unconditionally unroll loops that contain GEPs with alloca pointers
Implement the getUnrollingPreferences() function for
AMDGPUTargetTransformInfo so that loops that do address calculations
on pointers derived from alloca are unconditionally unrolled.

Unrolling these loops makes it more likely that SROA will be able to
eliminate the allocas, which is a big win for R600 since memory
allocated by alloca (private memory) is really slow.

llvm-svn: 199916
2014-01-23 18:49:28 +00:00
Tom Stellard 348273df97 R600: Recommit 199842: Add work-around for the CF stack entry HW bug
The unit test is now disabled on non-asserts builds.

The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries module 8 is either 7 or 0)

We choose to be conservative and always apply the work-around when the
number of sub-enries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.

reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199905
2014-01-23 16:18:02 +00:00
Tom Stellard 31e16388d7 Revert "R600: Add work-around for the CF stack entry HW bug"
This reverts commit 35b8331cad6eb512a2506adbc394201181da94ba.

The -debug-only flag for llc doesn't appear to be available in
all build configurations.

llvm-svn: 199845
2014-01-22 22:20:54 +00:00
Tom Stellard e89373e062 R600: Add work-around for the CF stack entry HW bug
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries module 8 is either 7 or 0)

We choose to be conservative and always apply the work-around when the
number of sub-enries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.

reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199842
2014-01-22 21:55:46 +00:00
Tom Stellard 59ed4794c4 R600: Add some missing CF instruction definitions to the .td files.
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199841
2014-01-22 21:55:44 +00:00
Tom Stellard a40f97154b R600: Refactor stack size calculation
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199840
2014-01-22 21:55:43 +00:00
Tom Stellard afbb697e0b R600: CF_PUSH is the same on Evergreen and Cayman
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199839
2014-01-22 21:55:41 +00:00
Tom Stellard 8c347b024e R600: Add wavefront size property to the subtargets v2
v2:
  - Initialize wavefront size to 0

reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199838
2014-01-22 21:55:40 +00:00
Tom Stellard 08b6af91c3 R600: Add stack size to .AMDGPUcsdata section
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199837
2014-01-22 21:55:35 +00:00
Tom Stellard 476437cbbc R600: MOVA is vector only
llvm-svn: 199827
2014-01-22 19:24:24 +00:00
Tom Stellard 598f3945c0 R600: Take alignment into account when calculating the stack offset
llvm-svn: 199826
2014-01-22 19:24:23 +00:00
Tom Stellard 04c0e9851b R600: Add support for global addresses with constant initializers
llvm-svn: 199825
2014-01-22 19:24:21 +00:00
Tom Stellard 27982b1d4a R600: Begin private memory at the second GPR.
This way private memory does not over-write work group information
stored in GPRs 0 and 1.

llvm-svn: 199824
2014-01-22 19:24:19 +00:00
Tom Stellard e93736057f R600/SI: Add support for i8 and i16 private loads/stores
llvm-svn: 199823
2014-01-22 19:24:14 +00:00
Rafael Espindola f69b850d60 CommentColumn is always 40. Simplify.
llvm-svn: 199357
2014-01-16 07:04:11 +00:00
Chandler Carruth 73523021d0 [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.

llvm-svn: 199104
2014-01-13 13:07:17 +00:00
Chandler Carruth e509db410a [PM] Pull the generic graph algorithms and data structures for dominator
trees into the Support library.

These are all expressed in terms of the generic GraphTraits and CFG,
with no reliance on any concrete IR types. Putting them in support
clarifies that and makes the fact that the static analyzer in Clang uses
them much more sane. When moving the Dominators.h file into the IR
library I claimed that this was the right home for it but not something
I planned to work on. Oops.

So why am I doing this? It happens to be one step toward breaking the
requirement that IR verification can only be performed from inside of
a pass context, which completely blocks the implementation of
verification for the new pass manager infrastructure. Fixing it will
also allow removing the concept of the "preverify" step (WTF???) and
allow the verifier to cleanly flag functions which fail verification in
a way that precludes even computing dominance information. Currently,
that results in a fatal error even when you ask the verifier to not
fatally error. It's awesome like that.

The yak shaving will continue...

llvm-svn: 199095
2014-01-13 10:52:56 +00:00
Chandler Carruth 5ad5f15cff [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082
2014-01-13 09:26:24 +00:00
Matt Arsenault a64ee177a0 Move declaration of variables down to first use.
llvm-svn: 198794
2014-01-08 21:47:14 +00:00
Chandler Carruth 8a8cd2bab9 Re-sort all of the includes with ./utils/sort_includes.py so that
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.

Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.

llvm-svn: 198685
2014-01-07 11:48:04 +00:00
Andrew Trick d7f890edb0 Factor MI-Sched in preparation for post-ra scheduling support.
Factor the MachineFunctionPass into MachineSchedulerBase.

Split the DAG class into ScheduleDAGMI and SchedulerDAGMILive.

llvm-svn: 198119
2013-12-28 21:56:47 +00:00
Tom Stellard eddfa69465 R600: Allow ftrunc
v2: Add ftrunc->TRUNC pattern instead of replacing int_AMDGPU_trunc
v3: move ftrunc pattern next to TRUNC definition, it's available since R600

Patch By: Jan Vesely

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 197783
2013-12-20 05:11:55 +00:00
Rafael Espindola 4fa79758b7 Small simplification, p0 is the same as p.
llvm-svn: 197699
2013-12-19 16:51:03 +00:00
Matt Arsenault a98cd6a56e R600/SI: Make private pointers be 32-bit.
Different sized address spaces should theoretically work
most of the time now, and since 64-bit add is currently
disabled, using more 32-bit pointers fixes some cases.

llvm-svn: 197659
2013-12-19 05:32:55 +00:00
Andrew Trick e339828b90 Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies.
Without this, MachineCSE is powerless to handle redundant operations with truncated source operands.

This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled:

     %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
     %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
     %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>

Test case: cse-add-with-overflow.ll.

This exposed an existing bug in
PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case:
PowerPC/crash.ll.

llvm-svn: 197465
2013-12-17 04:50:45 +00:00
Matt Arsenault cb34f84e39 Fix typo in instruction name.
SI_KIL -> SI_KILL

llvm-svn: 197425
2013-12-16 20:58:33 +00:00
Rafael Espindola e89b41495a One last cleanup of LLVM's DataLayout strings.
Produce them in the same order on every target. The order is that of
getStringRepresentation: e|E-i*-f*-v*-a*-s*-n*-S*.

llvm-svn: 197411
2013-12-16 19:31:14 +00:00
Rafael Espindola 0eb1ebeaac Structure R600's computeDataLayout more like every other target.
While there, simplify "p3:32:32:32" to "p3:32:32".

llvm-svn: 197407
2013-12-16 19:18:57 +00:00
Rafael Espindola bccb9d45ad The preferred alignment defaults to the abi alignment. Omit if it is the same.
llvm-svn: 197400
2013-12-16 18:01:51 +00:00
Rafael Espindola f057093fdc Don't duplicate the DataLayout defaults for integer, floats and vectors.
llvm-svn: 197398
2013-12-16 17:41:15 +00:00
Rafael Espindola 8afbb28cea On DataLayout, omit the default of p:64:64:64.
llvm-svn: 197397
2013-12-16 17:15:29 +00:00
Matt Arsenault 52226f9a8e Don't manually calculate size in bytes
llvm-svn: 197327
2013-12-14 18:21:59 +00:00
Rafael Espindola ceb0c4962a Turn AMDGPUSubtarget::getDataLayout into a static function.
No functionality change.

llvm-svn: 197310
2013-12-14 06:13:44 +00:00
Rafael Espindola 009e758628 Don't set unused variable.
llvm-svn: 197064
2013-12-11 20:40:57 +00:00
Tom Stellard d7e146ede6 R600: Re-format Processors.td
This makes it a little easier to read.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 197058
2013-12-11 17:51:51 +00:00
Tom Stellard f2ba972af6 R600: Register AMDGPUCFGStructurizer pass
This enables -print-before-all to dump MachineInstrs after it is run.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 197057
2013-12-11 17:51:47 +00:00
Tom Stellard 1de5582d06 R600: Register R600EmitClauseMarkers pass
This enables -print-before-all to dump MachineInstrs after it is run.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 197056
2013-12-11 17:51:41 +00:00
NAKAMURA Takumi 8bc9bfaa5a Prune redundant dependencies in LLVMBuild.txt.
llvm-svn: 196988
2013-12-11 00:30:57 +00:00
Matt Arsenault eaa3a7efab Use llvm_unreachable instead of assert(0)
llvm-svn: 196971
2013-12-10 21:37:42 +00:00
Vincent Lejeune cc0ea74c7b R600: Fix an infinite loop when trying to reorganize export/tex vector input
llvm-svn: 196923
2013-12-10 14:43:31 +00:00
Vincent Lejeune f92d64d160 R600: Fix input modifiers lost for Cayman
llvm-svn: 196922
2013-12-10 14:43:27 +00:00
NAKAMURA Takumi 396d4d3c7e Add proper dependencies to LLVMBuild.txt in llvm/lib.
I'll prune redundant deps in LLVMBuild.txt, later.

llvm-svn: 196881
2013-12-10 05:39:34 +00:00
NAKAMURA Takumi e3afe2ef62 Whitespaces.
llvm-svn: 196880
2013-12-10 05:39:12 +00:00
Rafael Espindola e2a1418e68 Don't set a variable to its default value.
llvm-svn: 196807
2013-12-09 19:36:11 +00:00
Vincent Lejeune 92b0a64906 Add a RequireStructuredCFG Field to TargetMachine.
llvm-svn: 196634
2013-12-07 01:49:19 +00:00
Vincent Lejeune ae7e96062c R600: Remove orphaned declarations
llvm-svn: 196633
2013-12-07 01:49:10 +00:00
Eric Christopher 99952a0823 Fix an index array check.
Patch by Marius Wachtler.

llvm-svn: 196561
2013-12-06 02:45:24 +00:00
Rafael Espindola 4cc2b87375 Add a default constructor to get deterministic behavior.
Should fix the msan and valgrind bots.

llvm-svn: 196509
2013-12-05 16:21:17 +00:00
Alp Toker f907b891da Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Matt Arsenault 89cc49fe5d R600/SI: Add comments for number of used registers.
llvm-svn: 196467
2013-12-05 05:15:35 +00:00
Rafael Espindola 20a8621e5f Don't set PrivateGlobalPrefix for NVPTX and R600.
These targets have special asm printers that don't use these.

llvm-svn: 196187
2013-12-03 01:03:35 +00:00
Rafael Espindola 04867ce9b0 Convert two char* that are only ever used as booleans to bool.
llvm-svn: 196168
2013-12-02 23:04:51 +00:00
Vincent Lejeune 4b8d9e303c R600: Workaround for cayman loop bug
llvm-svn: 196121
2013-12-02 17:29:37 +00:00
Rafael Espindola 50712a456d Change the default of AsmWriterClassName and isMCAsmWriter.
llvm-svn: 196065
2013-12-02 04:55:42 +00:00
NAKAMURA Takumi 226e10edff [CMake] Let add_public_tablegen_target() provide intrinsics_gen, too.
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTaleGen and intrinsics_gen.

Explicit add_dependencies() have been pruned under lib/Target.

llvm-svn: 195929
2013-11-28 17:04:31 +00:00
NAKAMURA Takumi ce746c6c49 [CMake] Let add_public_tablegen_target responsible to provide dependency to CommonTableGen.
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.

llvm-svn: 195927
2013-11-28 17:04:04 +00:00
NAKAMURA Takumi b2abd160b3 [CMake] Prune include_directories() in llvm/lib/Target, take #2.
I forgot to commit them. They were staging in my local repo.

llvm-svn: 195924
2013-11-28 15:30:37 +00:00
Rafael Espindola 429e3fb068 The R600 has its own asm printer which doesn't use GlobalPrefix. Drop it.
llvm-svn: 195883
2013-11-27 21:52:37 +00:00
Tom Stellard 175e7a8c97 R600: Expand vector FABS
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195881
2013-11-27 21:23:39 +00:00
Tom Stellard c149dc02d3 R600/SI: Implement spilling of SGPRs v5
SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions.

v2:
  - Fix encoding of Lane Mask
  - Use correct register flags, so we don't overwrite the low dword
    when restoring multi-dword registers.

v3:
  - Register spilling seems to hang the GPU, so replace all shaders
    that need spilling with a dummy shader.

v4:
  - Fix *LANE definitions
  - Change destination reg class for 32-bit SMRD instructions

v5:
  - Remove small optimization that was crashing Serious Sam 3.

https://bugs.freedesktop.org/show_bug.cgi?id=68224
https://bugs.freedesktop.org/show_bug.cgi?id=71285

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195880
2013-11-27 21:23:35 +00:00
Tom Stellard 859199dad8 R600/SI: Use SGPR_32 register class for 32-bit SMRD outputs
Writing to the M0 register from an SMRD instruction hangs the GPU, so
we need to use the SGPR_32 register class, which does not include M0.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195879
2013-11-27 21:23:29 +00:00
Tom Stellard 4d566b2edf R600: Add support for ISD::FROUND
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195878
2013-11-27 21:23:20 +00:00
Tom Stellard c0845334da R600/SI: Fixing handling of condition codes
We were ignoring the ordered/onordered bits and also the signed/unsigned
bits of condition codes when lowering the DAG to MachineInstrs.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195514
2013-11-22 23:07:58 +00:00
Tom Stellard cd6b0a658a R600: Implement TargetInstrInfo::isLegalToSplitMBBAt()
Splitting a basic block will create a new ALU clause, so we need to make
sure we aren't moving uses of registers that are local to their
current clause into a new one.

I had a test case for this, but unfortunately unrelated schedule changes
invalidated it, and I wasn't been able to come up with another one.

NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195399
2013-11-22 00:41:08 +00:00
Juergen Ributzka d12ccbd343 [weak vtables] Remove a bunch of weak vtables
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file. The memory leaks in this version have been fixed. Thanks
Alexey for pointing them out.

Differential Revision: http://llvm-reviews.chandlerc.com/D2068

Reviewed by Andy

llvm-svn: 195064
2013-11-19 00:57:56 +00:00
Matt Arsenault 3a4d86a1a4 R600/SI: Fix moveToVALU when the first operand is VSrc.
Moving into a VSrc doesn't always work, since it could be
replaced with an SGPR later.

llvm-svn: 195042
2013-11-18 20:09:55 +00:00
Matt Arsenault 08f7e37aa9 R600/SI: Fix multiple SGPR reads when using VCC.
No other SGPR operands are allowed, so if VCC is
used, move the other to a VGPR.

llvm-svn: 195041
2013-11-18 20:09:50 +00:00
Matt Arsenault fb826fa6e1 R600/SI: Implement add i64, but do not yet enable.
Test doesn't actually check the output. I need
to fix add i64 being matched for the addressing
calculations.

llvm-svn: 195040
2013-11-18 20:09:47 +00:00
Matt Arsenault bf6e1e7ff7 R600/SI: Specify SSrc operands
llvm-svn: 195039
2013-11-18 20:09:43 +00:00
Matt Arsenault e8d214662a R600/SI: addc / adde i32 are legal
llvm-svn: 195038
2013-11-18 20:09:40 +00:00
Matt Arsenault 04fca446b1 R600/SI: Match addc to S_ADD_U32.
The carry always goes to SCC.

llvm-svn: 195037
2013-11-18 20:09:37 +00:00
Matt Arsenault f8c089ac25 R600/SI: Match adde/sube to S_ADDC_U32/S_SUBB_U32
llvm-svn: 195036
2013-11-18 20:09:34 +00:00
Matt Arsenault e27a41b5a4 R600/SI: Specify S_ADD/S_SUB set SCC and add is commutable
llvm-svn: 195035
2013-11-18 20:09:32 +00:00
Matt Arsenault 43b8e4ed3b R600/SI: Move patterns to match add / sub to scalar instructions
llvm-svn: 195034
2013-11-18 20:09:29 +00:00
Matt Arsenault f0b1e3a776 R600/SI: Fix extra defs of VCC / SCC.
When replacing scalar operations with vector,
the wrong implicit output register was used.

llvm-svn: 195033
2013-11-18 20:09:21 +00:00
Tom Stellard 66df8a2c0a R600: Enable the IR structurizer by default
llvm-svn: 195031
2013-11-18 19:43:44 +00:00
Tom Stellard 827ec9b630 R600: Fix a crash in the AMDILCFGStrucurizer
The ifPatternMatch() function was not correctly reporting the number
of matches in some cases.

llvm-svn: 195030
2013-11-18 19:43:38 +00:00
Tom Stellard 783893a893 R600: Add a SubtargetFeatture for disabling the ifcvt pass.
This is useful when writing test cases for the AMDIL structurizer.

llvm-svn: 195029
2013-11-18 19:43:33 +00:00
Tom Stellard f1e3f77507 R600: Use lower-case for EnableIRStructurizer feature
llc converts all values passed to -mattr= to lowercase, so this
enables us to toggle this feature when using llc.

llvm-svn: 195028
2013-11-18 19:43:29 +00:00
Tom Stellard f340787d79 R600/SI: Fix illegal VGPR->SGPR copy inside of loop
llvm-svn: 195026
2013-11-18 18:50:20 +00:00
Tom Stellard 13de545693 R600/SI: Fix another case of illegal VGPR->SGPR copy
llvm-svn: 195025
2013-11-18 18:50:15 +00:00
Alexey Samsonov 49109a279c Revert r194865 and r194874.
This change is incorrect. If you delete virtual destructor of both a base class
and a subclass, then the following code:
  Base *foo = new Child();
  delete foo;
will not cause the destructor for members of Child class. As a result, I observe
plently of memory leaks. Notable examples I investigated are:
ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl.

llvm-svn: 194997
2013-11-18 09:31:53 +00:00
Vincent Lejeune 745d4298b1 R600: Make dot_4 instructions predicable
llvm-svn: 194927
2013-11-16 16:24:41 +00:00
Juergen Ributzka dbedae89b9 [weak vtables] Remove a bunch of weak vtables
This patch removes most of the trivial cases of weak vtables by pinning them to
a single object file.

Differential Revision: http://llvm-reviews.chandlerc.com/D2068

Reviewed by Andy

llvm-svn: 194865
2013-11-15 22:34:48 +00:00
Matt Arsenault f14032af0e Make method static
llvm-svn: 194858
2013-11-15 22:02:28 +00:00
Tom Stellard 519ae39c45 R600/SI: Add VReg_96 register class to SIRegisterInfo::hasVGPRs()
This fixes a crash with GNOME settings manager.

llvm-svn: 194836
2013-11-15 18:26:45 +00:00
Matt Arsenault c5559bb14b Add target hook to prevent folding some bitcasted loads.
This is to avoid this transformation in some cases:
fold (conv (load x)) -> (load (conv*)x)

On architectures that don't natively support some vector
loads efficiently casting the load to a smaller vector of
larger types and loading is more efficient.

Patch by Micah Villmow.

llvm-svn: 194783
2013-11-15 04:42:23 +00:00
Tom Stellard 8f9fc20751 R600: Fix scheduling of instructions that use the LDS output queue
The LDS output queue is accessed via the OQAP register.  The OQAP
register cannot be live across clauses, so if value is written to the
output queue, it must be retrieved before the end of the clause.
With the machine scheduler, we cannot statisfy this constraint, because
it lacks proper alias analysis and it will mark some LDS accesses as
having a chain dependency on vertex fetches.  Since vertex fetches
require a new clauses, the dependency may end up spiltting OQAP uses and
defs so the end up in different clauses.  See the lds-output-queue.ll
test for a more detailed explanation.

To work around this issue, we now combine the LDS read and the OQAP
copy into one instruction and expand it after register allocation.

This patch also adds some checks to the EmitClauseMarker pass, so that
it doesn't end a clause with a value still in the output queue and
removes AR.X and OQAP handling from the scheduler (AR.X uses and defs
were already being expanded post-RA, so the scheduler will never see
them).

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 194755
2013-11-15 00:12:45 +00:00
Tom Stellard 81229a14a3 R600/SI: Add processor type for Hawaii
Patch by: Alex Deucher

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
llvm-svn: 194752
2013-11-14 23:46:00 +00:00
Matt Arsenault 855e0b71d4 R600/SI: Remove redundant legalizeOperands call
llvm-svn: 194749
2013-11-14 23:44:25 +00:00
Hans Wennborg a74fd70ac7 Add #include raw_ostream.h in lib/Target/R600/SIFixSGPRCopies.cpp
This was casuing my release+asserts build on Windows to fail.

llvm-svn: 194747
2013-11-14 23:24:09 +00:00
Matt Arsenault 3383eecd68 R600/SI: Specify S_ADDK/S_MULK set SCC and are commutable
llvm-svn: 194738
2013-11-14 22:32:49 +00:00
Matt Arsenault 671a005e4a Indentation fixes
llvm-svn: 194688
2013-11-14 10:08:50 +00:00
Matt Arsenault f4760455e8 Add a comment
llvm-svn: 194684
2013-11-14 08:06:38 +00:00
Matt Arsenault 269092d747 Fix trailing whitespace in debug printing
llvm-svn: 194683
2013-11-14 08:06:35 +00:00
NAKAMURA Takumi b88288f64b R600/SIFixSGPRCopies.cpp: Fix \param to \return. [-Wdocumentation]
llvm-svn: 194662
2013-11-14 04:05:28 +00:00
NAKAMURA Takumi 78e80cd17d Whitespace.
llvm-svn: 194661
2013-11-14 04:05:22 +00:00
Tom Stellard 415ef6db68 R600: Fix uninitialized variable usage
llvm-svn: 194632
2013-11-13 23:58:51 +00:00
Tom Stellard 81d871dee3 R600/SI: Add support for private address space load/store
Private address space is emulated using the register file with
MOVRELS and MOVRELD instructions.

llvm-svn: 194626
2013-11-13 23:36:50 +00:00
Tom Stellard 8216602a0b R600/SI: Prefer SALU instructions for bit shift operations
All shift operations will be selected as SALU instructions and then
if necessary lowered to VALU instructions in the SIFixSGPRCopies pass.

This allows us to do more operations on the SALU which will improve
performance and is also required for implementing private memory
using indirect addressing, since the private memory pointers must stay
in the scalar registers.

This patch includes some fixes from Matt Arsenault.

llvm-svn: 194625
2013-11-13 23:36:37 +00:00
Rafael Espindola fdc88137f4 Remove AllowQuotesInName and friends from MCAsmInfo.
Accepting quotes is a property of an assembler, not of an object file. For
example, ELF can support any names for sections and symbols, but the gnu
assembler only accepts quotes in some contexts and llvm-mc in a few more.

LLVM should not produce different symbols based on a guess about which assembler
will be reading the code it is printing.

llvm-svn: 194575
2013-11-13 14:01:59 +00:00
Matt Arsenault 00a0d6f672 R600: Fix selection failure on EXTLOAD
llvm-svn: 194547
2013-11-13 02:39:07 +00:00
Vincent Lejeune aee3a10440 R600: Reenable llvm.R600.load.input/interp.input for compatibility
llvm-svn: 194484
2013-11-12 16:26:47 +00:00
Matt Arsenault 72b31eee0b R600/SI: Change formatting of printed registers.
Print the range of registers used with a single letter prefix.
This better matches what the shader compiler produces and
is overall less obnoxious than concatenating all of the
subregister names together.

Instead of SGPR0, it will print s0. Instead of SGPR0_SGPR1,
it will print s[0:1] and so on.

There doesn't appear to be a straightforward way
to get the actual register info in the InstPrinter,
so this parses the generated name to print with the
new syntax.

The required test changes are pretty nasty, and register
matching regexes are now worse. Since there isn't a way to
add to a variable in FileCheck, some of the tests now don't
check the exact number of registers used, but I don't think that
will be a real problem.

llvm-svn: 194443
2013-11-12 02:35:51 +00:00
Vincent Lejeune f143af3fe9 R600: Use function inputs to represent data stored in gpr
llvm-svn: 194425
2013-11-11 22:10:24 +00:00
Matt Arsenault c9ad7c9fcb Make method static
llvm-svn: 194340
2013-11-10 01:04:02 +00:00
Matt Arsenault d82c183d70 Fix missing C++ mode comment
llvm-svn: 194339
2013-11-10 01:03:59 +00:00
Vincent Lejeune 4f3751f2af R600: Fix LowerUDIVREM
llvm-svn: 194153
2013-11-06 17:36:04 +00:00
Matt Arsenault ef1a950b48 Use isa<> instead of dyn_cast<> with unused value
llvm-svn: 193869
2013-11-01 17:39:26 +00:00
Rafael Espindola 4b102d0ead Remove another unused flag.
llvm-svn: 193756
2013-10-31 15:58:33 +00:00
Rafael Espindola 74e1d0a0a0 Remove unused flag.
llvm-svn: 193752
2013-10-31 15:49:39 +00:00
Matt Arsenault 909d0c063f Fix a few typos
llvm-svn: 193723
2013-10-30 23:43:29 +00:00
Tom Stellard c947d8ca64 R600: Custom lower f32 = uint_to_fp i64
llvm-svn: 193701
2013-10-30 17:22:05 +00:00
Aaron Ballman 9ab670fb54 Removing a switch statement that contains only a default label. This resolves an MSVC warning. No functional change intended.
llvm-svn: 193649
2013-10-29 20:40:52 +00:00
Tom Stellard 6e1ee476ab R600/SI: Add compute support for CI v2
v2:
  - Fix LDS size calculation

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 193621
2013-10-29 16:37:28 +00:00
Tom Stellard e118b8becd R600: Expand vector FSQRT ops
llvm-svn: 193620
2013-10-29 16:37:20 +00:00
NAKAMURA Takumi 8a0464393f Prune utf8 chars in comments.
llvm-svn: 193512
2013-10-28 04:07:38 +00:00
NAKAMURA Takumi 4bb85f90fd Target/R600: Un-tab-ify.
llvm-svn: 193510
2013-10-28 04:07:23 +00:00
Tom Stellard 03a5c08de6 R600/SI: Replace ffs(x) - 1 with countTrailingZeros(x)
ffs(x) broke the mingw buildbot.

llvm-svn: 193225
2013-10-23 03:50:25 +00:00
Tom Stellard 54774e5681 R600/SI: fix MIMG writemask adjustement
This fixes piglit:
- shaders/glsl-fs-texture2d-masked
- shaders/glsl-fs-texture2d-masked-4

Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 193222
2013-10-23 02:53:47 +00:00
Tom Stellard af77543244 R600: Fix handling of vector kernel arguments
The SelectionDAGBuilder was promoting vector kernel arguments to legal
types, but this won't work for R600 and SI since kernel arguments are
stored in memory and can't be promoted.  In order to handle vector
arguments correctly we need to look at the original types from the LLVM IR
function.

llvm-svn: 193215
2013-10-23 00:44:32 +00:00
Tom Stellard fb9616905a R600/SI: Add support for i64 bitwise or
llvm-svn: 193213
2013-10-23 00:44:19 +00:00
Tom Stellard a66cafa096 R600/SI: Use S_LOAD_DWORD instructions for v8i32 and v16i32
llvm-svn: 193212
2013-10-23 00:44:12 +00:00
Matt Arsenault 65864e3182 R600/SI: Don't assert on SCC usage
llvm-svn: 193198
2013-10-22 21:11:31 +00:00
Tom Stellard debb4cf5ea R600/SI: Use llvm_unreachable() for an always false assert
llvm-svn: 193183
2013-10-22 18:42:03 +00:00
Tom Stellard 8be4dd234a R600/SI: Fix warning on non-asserts build
llvm-svn: 193180
2013-10-22 18:31:45 +00:00
Tom Stellard 26a3b67b3b R600: Simplify handling of private address space
The AMDGPUIndirectAddressing pass was previously responsible for
lowering private loads and stores to indirect addressing instructions.
However, this pass was buggy and way too complicated.  The only
advantage it had over the new simplified code was that it saved one
instruction per direct write to private memory.  This optimization
likely has a minimal impact on performance, and we may be able
to duplicate it using some other transformation.

For the private address space, we now:
1. Lower private loads/store to Register(Load|Store) instructions
2. Reserve part of the register file as 'private memory'
3. After regalloc lower the Register(Load|Store) instructions to
   MOV instructions that use indirect addressing.

llvm-svn: 193179
2013-10-22 18:19:10 +00:00
Tom Stellard c460b0dcf1 R600: Remove unused InstrInfo::getMovImmInstr() function
llvm-svn: 193178
2013-10-22 18:19:01 +00:00
Benjamin Kramer a9fe95b6c2 R600: Remove \ at EOL from ascii art comments.
Completely harmless, but GCC likes to warn about it even when the next line is
a comment.

llvm-svn: 192974
2013-10-18 14:12:50 +00:00
Tom Stellard b34186ae38 R600: Fix a crash in the AMDILCFGStructurizer
We were calling llvm_unreachable() when failing to optimize the
branch into if case.  However, it is still possible for us
to structurize the CFG by duplicating blocks even if this optimization
fails.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 192813
2013-10-16 17:06:02 +00:00
Tom Stellard 69f86d199a R600: Remove some dead code from the AMDILCFGStructurizer
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 192812
2013-10-16 17:05:56 +00:00
Matt Arsenault 226580656b Fix typo
llvm-svn: 192752
2013-10-15 23:44:48 +00:00
Matt Arsenault df90c02e68 Fix missing C++ mode thing in header
llvm-svn: 192751
2013-10-15 23:44:45 +00:00
Vincent Lejeune 5d6c2c318b R600/SI: Remove some leftover MI dump call
llvm-svn: 192743
2013-10-15 22:48:51 +00:00
Vincent Lejeune d6cbede9c5 R600: improve dump of S_WAITCNT
llvm-svn: 192557
2013-10-13 17:56:28 +00:00
Vincent Lejeune 4ee6dd6136 R600/SI: Add SinkingPass before ISel
llvm-svn: 192556
2013-10-13 17:56:21 +00:00
Vincent Lejeune d623644d17 R600/SI: Support byval arguments
llvm-svn: 192555
2013-10-13 17:56:16 +00:00
Vincent Lejeune fa58a5fb60 R600: Use masked read sel for texture instructions
llvm-svn: 192554
2013-10-13 17:56:10 +00:00
Vincent Lejeune 301beb80d4 R600: fix swizzle export
llvm-svn: 192553
2013-10-13 17:56:04 +00:00
Vincent Lejeune 533352f696 R600: Clear the VPM bit of export instructions.
It makes apparently no change it to set this bit or not but the
docs recommand to left it cleared.

llvm-svn: 192552
2013-10-13 17:55:57 +00:00
Tom Stellard ed69925998 R600: Store disassembly in a special ELF section when feature +DumpCode is enabled.
Patch by: Jay Cornwall

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 192523
2013-10-12 05:02:51 +00:00
Matt Arsenault 8fb373891f Fix typo
llvm-svn: 192499
2013-10-11 21:03:36 +00:00
Matt Arsenault 1408b60291 Fix typo
llvm-svn: 192406
2013-10-10 23:05:37 +00:00
Matt Arsenault 204cfa6e43 R600: Fix trunc i64 to i32 on SI
llvm-svn: 192375
2013-10-10 18:04:16 +00:00
Tom Stellard 93fabcebf1 R600/SI: Implement SIInstrInfo::verifyInstruction() for VOP*
The function is used by the machine verifier and checks that VOP*
instructions have legal operands.

llvm-svn: 192367
2013-10-10 17:11:55 +00:00
Tom Stellard 682bfbc43d R600/SI: Define a separate MIMG instruction for each possible output value type
During instruction selection, we rewrite the destination register
class for MIMG instructions based on their writemasks.  This creates
machine verifier errors since the new register class does not match
the register class in the MIMG instruction definition.

We can avoid this by defining different MIMG instructions for each
possible destination type and then switching to the correct instruction
when we change the register class.

llvm-svn: 192365
2013-10-10 17:11:24 +00:00
Tom Stellard 1b99ed8290 R600/SI: Mark the EXEC register as reserved
This prevents the machine verifier from complaining about uses of
an undefined physical register.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 192364
2013-10-10 17:11:19 +00:00
Tom Stellard ed0ceec1c1 R600: Use StructurizeCFGPass for non SI targets
StructurizeCFG pass allows to make complex cfg reducible ; it allows a lot of
shader from shadertoy (which exhibits complex control flow constructs) to works
correctly with respect to CFG handling (and allow us to detect potential bug in
other part of the backend).

We provide a cmd line argument to disable the pass for debug purpose.

Patch by: Vincent Lejeune

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 192363
2013-10-10 17:11:12 +00:00
Rafael Espindola a17151ad5a Add a MCTargetStreamer interface.
This patch fixes an old FIXME by creating a MCTargetStreamer interface
and moving the target specific functions for ARM, Mips and PPC to it.

The ARM streamer is still declared in a common place because it is
used from lib/CodeGen/ARMException.cpp, but the Mips and PPC are
completely hidden in the corresponding Target directories.

I will send an email to llvmdev with instructions on how to use this.

llvm-svn: 192181
2013-10-08 13:08:17 +00:00
Vincent Lejeune 6df39438af R600: Add a ldptr intrinsic to support MSAA.
llvm-svn: 191838
2013-10-02 16:00:33 +00:00
Vincent Lejeune a4da6fb535 R600: add a pass that merges clauses.
llvm-svn: 191790
2013-10-01 19:32:58 +00:00
Vincent Lejeune 0b342d6f74 R600: Put PRED_X instruction in its own clause
llvm-svn: 191789
2013-10-01 19:32:49 +00:00
Vincent Lejeune 269708b98d R600: Enable -verify-machineinstrs in some tests.
llvm-svn: 191788
2013-10-01 19:32:38 +00:00
Arnold Schwaighofer d2f96b91ca IfConverter: Use TargetSchedule for instruction latencies
For targets that have instruction itineraries this means no change. Targets
that move over to the new schedule model will use be able the new schedule
module for instruction latencies in the if-converter (the logic is such that if
there is no itineary we will use the new sched model for the latencies).

Before, we queried "TTI->getInstructionLatency()" for the instruction latency
and the extra prediction cost. Now, we query the TargetSchedule abstraction for
the instruction latency and TargetInstrInfo for the extra predictation cost. The
TargetSchedule abstraction will internally call "TTI->getInstructionLatency" if
an itinerary exists, otherwise it will use the new schedule model.

ATTENTION: Out of tree targets!

(I will also send out an email later to LLVMDev)

This means, if your target implements

 unsigned getInstrLatency(const InstrItineraryData *ItinData,
                          const MachineInstr *MI,
                          unsigned *PredCost);

and returns a value for "PredCost", you now also need to implement

 unsigned getPredictationCost(const MachineInstr *MI);

(if your target uses the IfConversion.cpp pass)

radar://15077010

llvm-svn: 191671
2013-09-30 15:28:56 +00:00
Robert Wilhelm 2788d3ec99 Even more spelling fixes for "instruction".
llvm-svn: 191611
2013-09-28 13:42:22 +00:00
Tom Stellard 0351ea2010 R600: Fix handling of NAN in comparison instructions
We were completely ignoring the unorder/ordered attributes of condition
codes and also incorrectly lowering seto and setuo.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 191603
2013-09-28 02:50:50 +00:00
Tom Stellard 5694d3090a SelectionDAG: Improve legalization of SELECT_CC with illegal condition codes
SelectionDAG will now attempt to inverse an illegal conditon in order to
find a legal one and if that doesn't work, it will attempt to swap the
operands using the inverted condition.

There are no new test cases for this, but a nubmer of the existing R600
tests hit this path.

llvm-svn: 191602
2013-09-28 02:50:43 +00:00
Tom Stellard cd42818d86 SelectionDAG: Try to expand all condition codes using getCCSwappedOperands()
This is useful for targets like R600, which only support GT, GE, NE, and EQ
condition codes as it removes the need to handle unsupported condition
codes in target specific code.

There are no tests with this commit, but R600 has been updated to take
advantage of this new feature, so its existing selectcc tests are now
testing the swapped operands path.

llvm-svn: 191601
2013-09-28 02:50:38 +00:00
David Majnemer 1ccd2f2aee MC: Remove vestigial PCSymbol field from AsmInfo
llvm-svn: 191362
2013-09-25 09:36:11 +00:00
Tim Northover 31d093c705 ISelDAG: spot chain cycles involving MachineNodes
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.

Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.

This should fix PR15840.

llvm-svn: 191165
2013-09-22 08:21:56 +00:00
Andrew Trick 978674b2bc Allow subtarget selection of the default MachineScheduler and document the interface.
The global registry is used to allow command line override of the
scheduler selection, but does not work well as the normal selection
API. For example, the same LLVM process should be able to target
multiple targets or subtargets.

llvm-svn: 191071
2013-09-20 05:14:41 +00:00
Vincent Lejeune 0167a313da R600: Move clamp handling code to R600IselLowering.cpp
llvm-svn: 190645
2013-09-12 23:45:00 +00:00
Vincent Lejeune 9a248e5c2d R600: Move code handling literal folding into R600ISelLowering.
llvm-svn: 190644
2013-09-12 23:44:53 +00:00
Vincent Lejeune ab3baf80a8 R600: Move fabs/fneg/sel folding logic into PostProcessIsel
This move makes possible to correctly handle multiples instructions
from a single pattern.

llvm-svn: 190643
2013-09-12 23:44:44 +00:00
Tom Stellard afcf12f33a R600/SI: expose TBUFFER_STORE_FORMAT_* for OpenGL transform feedback
For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.

The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.

The maximum number of input SGPRs is bumped to 17.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190575
2013-09-12 02:55:14 +00:00
Tom Stellard 7f6fa4c4c5 R600: Don't use trans slot for instructions that read LDS source registers
This fixes some regressions in the piglit local memory store tests
introduced by recent commits which made the scheduler aware of the trans
slot.

It's not possible to test this using lit, because there is no way to
determine from the assembly dumps whether or not an instruction is in
the trans slot.

Even if this were possible, the test would be highly sensitive to
changes in the scheduler and might generate confusing false negatives.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 190574
2013-09-12 02:55:06 +00:00
Bill Wendling 58e2d3d856 Generate compact unwind encoding from CFI directives.
We used to generate the compact unwind encoding from the machine
instructions. However, this had the problem that if the user used `-save-temps'
or compiled their hand-written `.s' file (with CFI directives), we wouldn't
generate the compact unwind encoding.

Move the algorithm that generates the compact unwind encoding into the
MCAsmBackend. This way we can generate the encoding whether the code is from a
`.ll' or `.s' file.

<rdar://problem/13623355>

llvm-svn: 190290
2013-09-09 02:37:14 +00:00
Aaron Watry 372cecf642 R600: Add support for LDS atomic subtract
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190200
2013-09-06 20:17:42 +00:00
Tom Stellard 8bc633ac09 R600: Coding style
llvm-svn: 190110
2013-09-05 23:55:13 +00:00
Matt Arsenault 6f24379974 R600: Fix i64 to i32 trunc on SI
llvm-svn: 190091
2013-09-05 19:41:10 +00:00
Tom Stellard 13c68ef88b R600: Add support for local memory atomic add
llvm-svn: 190080
2013-09-05 18:38:09 +00:00
Tom Stellard 53f2f90eb4 R600: Expand SELECT nodes rather than custom lowering them
llvm-svn: 190079
2013-09-05 18:38:03 +00:00
Tom Stellard de60e25278 R600: Fix incorrect LDS size calculation
GlobalAdderss nodes that appeared in more than one basic block were
being counted twice.

llvm-svn: 190078
2013-09-05 18:37:57 +00:00
Tom Stellard d50bb3c8d4 R600/SI: Don't emit S_WQM_B64 instruction for compute shaders
llvm-svn: 190077
2013-09-05 18:37:52 +00:00
Tom Stellard 624741fded R600: Fix segfault in R600TextureIntrinsicReplacer
This pass was segfaulting when it ran into a non-intrinsic function
call.  Function calls are not supported, so now instead of segfaulting,
we will get an assertion failure with a nice error message.

I'm not sure how to test this using lit.

llvm-svn: 190076
2013-09-05 18:37:45 +00:00
Vincent Lejeune 744efa4dca R600: Use shared op optimization when checking cycle compatibility
llvm-svn: 189981
2013-09-04 19:53:54 +00:00
Vincent Lejeune 7e2c83256b R600: Non vector only instruction can be scheduled on trans unit
llvm-svn: 189980
2013-09-04 19:53:46 +00:00
Vincent Lejeune 4d5c5e53d0 R600: Use SchedModel enum for is{Trans,Vector}Only functions
llvm-svn: 189979
2013-09-04 19:53:30 +00:00
Michael Gottesman c9f5859f81 Add llvm namespace to llvm::next.
llvm-svn: 189912
2013-09-04 04:26:09 +00:00
Michael Gottesman 114ac1a230 Use llvm::next() instead of incrementing begin iterators of std::vector.
Iterator of std::vector may be implemented as a raw pointer. In
this case begin iterators are rvalues and cannot be incremented.
For example, this is the case with STDCXX implementation of vector.

Patch by Konstantin Tokarev <annulen@yandex.ru>.

llvm-svn: 189911
2013-09-04 04:19:01 +00:00
Benjamin Kramer bda73fff49 Mark an unreachable code path with llvm_unreachable. Pacifies GCC.
llvm-svn: 189726
2013-08-31 21:20:04 +00:00
Tom Stellard 35bb18c2a7 R600: Add support for vector local memory loads
llvm-svn: 189226
2013-08-26 15:06:04 +00:00
Tom Stellard c6f4a29ed5 R600: Add support for i8 and i16 local memory loads
llvm-svn: 189225
2013-08-26 15:05:59 +00:00
Tom Stellard f3d166aa1e R600: Add support for i8 and i16 local memory stores
llvm-svn: 189223
2013-08-26 15:05:49 +00:00
Tom Stellard 2ffc330673 R600: Add support for v4i32 and v2i32 local stores
llvm-svn: 189222
2013-08-26 15:05:44 +00:00
Tom Stellard fd155828ed SelectionDAG: Use correct pointer size when lowering function arguments v2
This adds minimal support to the SelectionDAG for handling address spaces
with different pointer sizes.  The SelectionDAG should now correctly
lower pointer function arguments to the correct size as well as generate
the correct code when lowering getelementptr.

This patch also updates the R600 DataLayout to use 32-bit pointers for
the local address space.

v2:
  - Add more helper functions to TargetLoweringBase
  - Use CHECK-LABEL for tests

llvm-svn: 189221
2013-08-26 15:05:36 +00:00
Tom Stellard 15e4811455 R600/SI: Fix another case of illegal VGPR to SGPR copy
This fixes a crash in Unigine Tropics.

https://bugs.freedesktop.org/show_bug.cgi?id=68389

llvm-svn: 189057
2013-08-22 20:21:02 +00:00
Tom Stellard f6d8023ca4 R600: Remove unnecessary casts
Spotted by Bill Wendling.

llvm-svn: 188942
2013-08-21 22:14:17 +00:00
Dmitri Gribenko 8b2a3d1fea Remove unused stdio.h includes
llvm-svn: 188626
2013-08-18 08:29:51 +00:00
Tom Stellard 59ed08b238 R600: Fix possible use of an uninitialized variable
Spotted by Nick Lewycky!

llvm-svn: 188599
2013-08-17 00:06:51 +00:00
Tom Stellard b249b75726 R600: Expand vector FRINT ops
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 188598
2013-08-16 23:51:33 +00:00
Tom Stellard ad3aff246c R600: Expand vector FFLOOR ops
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 188597
2013-08-16 23:51:29 +00:00
Tom Stellard a92ff87929 R600: Expand vector float operations for both SI and R600
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 188596
2013-08-16 23:51:24 +00:00
Michel Danzer 8522270d7e R600/SI: Add pattern for xor of i1
Fixes two recent piglit regressions with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188559
2013-08-16 16:19:31 +00:00
Michel Danzer 20680b1cc5 R600/SI: Fix broken encoding of DS_WRITE_B32
The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
it to corrupt the encoding of that by clobbering the first operand with
the second one.

Undo that damage and only apply the SMRD logic to that.

Fixes some derivates related piglit regressions with radeonsi.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188558
2013-08-16 16:19:24 +00:00
Benjamin Kramer a8eecee121 R600: Allocate memoperand in the MachienFunction so it doesn't leak.
llvm-svn: 188555
2013-08-16 14:48:09 +00:00
Tom Stellard dba25713a6 Revert "R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions"
This reverts commit a6a39ced095c2f453624ce62c4aead25db41a18f.
This is the wrong version of this fix.

llvm-svn: 188523
2013-08-16 01:18:43 +00:00
Tom Stellard 82bef57f20 R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions
The SIInsertWaits pass was overwriting the first operand (gds bit) of
DS_WRITE_B32 with the second operand (value to write).  This meant that
any time the value to write was stored in an odd number VGPR, the gds
bit would be set causing the instruction to write to GDS instead of LDS.

llvm-svn: 188522
2013-08-16 01:12:20 +00:00
Tom Stellard b03edeca67 R600: Add support for global vector loads with element types less than 32-bits
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188521
2013-08-16 01:12:16 +00:00
Tom Stellard fbab827e2a R600: Add support for global vector stores with elements less than 32-bits
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188520
2013-08-16 01:12:11 +00:00
Tom Stellard d3ee8c103a R600: Add support for i16 and i8 global stores
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188519
2013-08-16 01:12:06 +00:00
Tom Stellard 6d1379e180 R600: Add support for v4i32 stores on Cayman
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188518
2013-08-16 01:12:00 +00:00
Tom Stellard 16da74c205 R600: Enable folding of inline literals into REQ_SEQUENCE instructions
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188517
2013-08-16 01:11:55 +00:00
Tom Stellard 676c16d088 R600: Add IsExport bit to TableGen instruction definitions
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188516
2013-08-16 01:11:51 +00:00
Tom Stellard ac00f9df79 R600: Change the RAT instruction assembly names so they match the docs
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 188515
2013-08-16 01:11:46 +00:00
Matt Arsenault 5cae894a13 Fix spelling
llvm-svn: 188506
2013-08-15 23:11:03 +00:00
Alexey Samsonov 3186eb3efd Tentative fix for global-buffer-overflow caused by r188426. Found by AddressSanitizer
llvm-svn: 188448
2013-08-15 07:11:34 +00:00
Tom Stellard d86003e31f R600/SI: Improve legalization of vector operations
This should fix hangs in the OpenCL piglit tests.

llvm-svn: 188431
2013-08-14 23:25:00 +00:00
Tom Stellard 6785065ace R600/SI: Replace v1i32 type with i32 in imageload and sample intrinsics
llvm-svn: 188430
2013-08-14 23:24:53 +00:00
Tom Stellard 9fa1791a1b R600/SI: Convert v16i8 resource descriptors to i128
Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.

This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:

https://bugs.freedesktop.org/show_bug.cgi?id=66805

llvm-svn: 188429
2013-08-14 23:24:45 +00:00
Tom Stellard 8e5da41374 R600/SI: Lower BUILD_VECTOR to REG_SEQUENCE v2
Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG
instructions should make it easier for the register allocator to coalasce
unnecessary copies.

v2:
  - Use an SGPR register class if all the operands of BUILD_VECTOR are
    SGPRs.

llvm-svn: 188427
2013-08-14 23:24:32 +00:00
Tom Stellard df94dc3917 R600/SI: Choose the correct MOV instruction for copying immediates
The instruction selector will now try to infer the destination register
so it can decided whether to use V_MOV_B32 or S_MOV_B32 when copying
immediates.

llvm-svn: 188426
2013-08-14 23:24:24 +00:00
Tom Stellard 16a9a205c8 R600/SI: Assign a register class to the $vaddr operand for MIMG instructions
The previous code declared the operand as unknown:$vaddr, which made
it possible for scalar registers to be used instead of vector registers.

llvm-svn: 188425
2013-08-14 23:24:17 +00:00
Tom Stellard 3494b7ee42 R600/SI: Handle MSAA texture targets
Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188421
2013-08-14 22:22:14 +00:00
Tom Stellard 20ee94f152 R600/SI: Allow conversion between v32i8 and v8i32
Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188420
2013-08-14 22:22:09 +00:00
Tom Stellard a36f077159 R600/SI: Fix an obvious typo
Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188419
2013-08-14 22:22:03 +00:00
Tom Stellard 73c31d541e R600/SI: Add pattern for fp_to_uint
This fixes the F2U opcode for the Mesa driver.

Patch by: Marek Olšák

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 188418
2013-08-14 22:21:57 +00:00
Tom Stellard fc455471c3 R600: Set scheduling preference to Sched::Source
R600 doesn't need to do any scheduling on the SelectionDAG now that it
has a very good MachineScheduler.  Also, using the VLIW SelectionDAG
scheduler was having a major impact on compile times. For example with
the phatk kernel here are the LLVM IR to machine code compile times:

With Sched::VLIW

Total Compile Time:                  1.4890 Seconds (User + System)
SelectionDAG Instruction Scheduling: 1.1670 Seconds (User + System)

With Sched::Source

Total Compile Time:                  0.3330 Seconds (User + System)
SelectionDAG Instruction Scheduling: 0.0070 Seconds (User + System)

The code ouput was identical with both schedulers.  This may not be true
for all programs, but it gives me confidence that there won't be much
reduction, if any, in code quality by using Sched::Source.

llvm-svn: 188215
2013-08-12 22:33:21 +00:00
Niels Ole Salscheider d3a039fed2 R600/SI: FMA is faster than fmul and fadd for f64
llvm-svn: 188136
2013-08-10 10:38:54 +00:00
Niels Ole Salscheider 6509ac65a9 R600/SI: Add FMA pattern
llvm-svn: 188135
2013-08-10 10:38:47 +00:00
Niels Ole Salscheider 719fbc9ae7 R600/SI: Implement fp32<->fp64 conversions
llvm-svn: 187988
2013-08-08 16:06:15 +00:00
Niels Ole Salscheider 4715d886f8 R600/SI: Implement sint<->fp64 conversions
llvm-svn: 187987
2013-08-08 16:06:08 +00:00
Evgeniy Stepanov bc8808ce4a Initialize SIInsertWaits::ExpInstrTypesSeen in the pass constructor.
This value may be used uninitialized in SIInsertWaits::insertWait.
Found with MemorySanitizer.

llvm-svn: 187869
2013-08-07 07:47:41 +00:00
Tom Stellard f5a988b35f R600: Add new file from r187831 to CMakeLists.txt
llvm-svn: 187834
2013-08-06 23:12:34 +00:00
Tom Stellard 2f7cdda57e R600/SI: Use VSrc_* register classes as the default classes for types
Since the VSrc_* register classes contain both VGPRs and SGPRs, copies
that used be emitted by isel like this:

SGPR = COPY VGPR

Will now be emitted like this:

VSrC = COPY VGPR

This patch also adds a pass that tries to identify and fix situations where
a VGPR to SGPR copy may occur.  Hopefully, these changes will make it
impossible for the compiler to generate illegal VGPR to SGPR copies.

llvm-svn: 187831
2013-08-06 23:08:28 +00:00
Tom Stellard 4c0ffccbbf R600/SI: Add more special cases for opcodes to ensureSRegLimit()
Also factor out the register class lookup to its own function.

llvm-svn: 187830
2013-08-06 23:08:18 +00:00
NAKAMURA Takumi aaf66c7357 Target/*/CMakeLists.txt: Add the dependency to CommonTableGen explicitly for each corresponding CodeGen.
Without explicit dependencies, both per-file action and in-CommonTableGen action could run in parallel.
It races to emit *.inc files simultaneously.

llvm-svn: 187780
2013-08-06 06:38:37 +00:00
Tom Stellard aa664d9b92 Factor FlattenCFG out from SimplifyCFG
Patch by: Mei Ye

llvm-svn: 187764
2013-08-06 02:43:45 +00:00
Tom Stellard 28d06de6f6 R600: Implement TargetLowering::getVectorIdxTy()
We use MVT::i32 for the vector index type, because we use 32-bit
operations to caculate offsets when dynamically indexing vectors.

llvm-svn: 187749
2013-08-05 22:22:07 +00:00
Tom Stellard 0344cdfe39 R600: Add 64-bit float load/store support
* Added R600_Reg64 class
* Added T#Index#.XY registers definition
* Added v2i32 register reads from parameter and global space
* Added f32 and i32 elements extraction from v2f32 and v2i32
* Added v2i32 -> v2f32 conversions

Tom Stellard:
  - Mark vec2 operations as expand.  The addition of a vec2 register
    class made them all legal.

Patch by: Dmitry Cherkassov

Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
llvm-svn: 187582
2013-08-01 15:23:42 +00:00
Tom Stellard 53698938a4 R600: Use 64-bit alignment for 64-bit kernel arguments
llvm-svn: 187581
2013-08-01 15:23:31 +00:00
Tom Stellard 98f675a994 R600/SI: Custom lower i64 ZERO_EXTEND
llvm-svn: 187580
2013-08-01 15:23:26 +00:00
Tom Stellard ca69a53bae Revert "R600: Non vector only instruction can be scheduled on trans unit"
This reverts commit 98ce62780ea7185ba710868bf83c8077e8d7f6d6.

llvm-svn: 187526
2013-07-31 20:43:27 +00:00
Tom Stellard 4dd41845ec Revert "R600: Use SchedModel enum for is{Trans,Vector}Only functions"
This reverts commit 3f1de26cb5cc0543a6a1d71259a7a39d97139051.

llvm-svn: 187524
2013-07-31 20:43:03 +00:00
Vincent Lejeune 220db748b0 R600: Do not mergevector after a vector reg is used
If we merge vector when a vector is used, it will generate an artificial
antidependency that can prevent 2 tex/vtx instructions to use the same
clause and thus generate extra clauses that reduce performance.

There is no test case as such situation is really hard to predict.

llvm-svn: 187516
2013-07-31 19:32:12 +00:00
Vincent Lejeune bb3f931123 R600: Avoid more than 4 literals in the same instruction group at scheduling
llvm-svn: 187515
2013-07-31 19:32:07 +00:00
Vincent Lejeune df18804e26 R600: Non vector only instruction can be scheduled on trans unit
llvm-svn: 187514
2013-07-31 19:31:56 +00:00
Vincent Lejeune 21de8baa15 R600: Don't mix LDS and non-LDS instructions in the same group
There are a lot of restrictions on instruction groups that contain
LDS instructions, so for now we will be conservative and not packetize
anything else with them.

llvm-svn: 187513
2013-07-31 19:31:41 +00:00
Vincent Lejeune 79afe17e99 R600: Use SchedModel enum for is{Trans,Vector}Only functions
llvm-svn: 187512
2013-07-31 19:31:35 +00:00
Vincent Lejeune 0c5ed2b437 R600: Remove predicated_break inst
We were using two instructions for similar purpose : break and
predicated break. Only predicated_break was emitted and it was
lowered at R600ControlFlowFinalizer to JUMP;CF_BREAK;POP.
This commit simplify the situation by making AMDILCFGStructurizer
emit IF_PREDICATE;BREAK;ENDIF; instead of predicated_break (which
is now removed).

There is no functionality change.

llvm-svn: 187510
2013-07-31 19:31:14 +00:00
Tom Stellard aa313d0a74 R600/SI: Expand vector fp <-> int conversions
llvm-svn: 187421
2013-07-30 14:31:03 +00:00
Quentin Colombet e2e0548d77 [R600] Replicate old DAGCombiner behavior in target specific DAG combine.
build_vector is lowered to REG_SEQUENCE, which is something the register
allocator does a good job at optimizing.

llvm-svn: 187397
2013-07-30 00:27:16 +00:00
Tom Stellard 8b1e021e85 SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches.  The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.

Patch by: Mei Ye

llvm-svn: 187278
2013-07-27 00:01:07 +00:00
Tom Stellard c54731aa9d DAGCombiner: Pass the correct type to TargetLowering::isF(Abs|Neg)Free
This commit also implements these functions for R600 and removes a test
case that was relying on the buggy behavior.

llvm-svn: 187007
2013-07-23 23:55:03 +00:00
Tom Stellard 8cb0e47c9e R600: Treat CONSTANT_ADDRESS loads like GLOBAL_ADDRESS loads when necessary
These are really the same address space in hardware.  The only
difference is that CONSTANT_ADDRESS uses a special cache for faster
access.  When we are unable to use the constant kcache for some reason
(e.g. smaller types or lack of indirect addressing) then the instruction
selector must use GLOBAL_ADDRESS loads instead.

llvm-svn: 187006
2013-07-23 23:54:56 +00:00
Tom Stellard 5263948a7b R600: Add support for 24-bit MAD instructions
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186923
2013-07-23 01:48:49 +00:00
Tom Stellard 41fc7853be R600: Add support for 24-bit MUL instructions
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186922
2013-07-23 01:48:42 +00:00
Tom Stellard 9f95033d33 R600: Improve support for < 32-bit loads
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186921
2013-07-23 01:48:35 +00:00
Tom Stellard ba30932908 R600: Rename AMDILISelDAGToDAG.cpp -> AMDGPUISelDAGToDAG.cpp
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186920
2013-07-23 01:48:29 +00:00
Tom Stellard 840214437b R600: Move CONST_ADDRESS folding into AMDGPUDAGToDAGISel::Select()
This increases the number of opportunites we have for folding.  With the
previous implementation we were unable to fold into any instructions
other than the first when multiple instructions were selected from a
single SDNode.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186919
2013-07-23 01:48:24 +00:00
Tom Stellard 1e80309ebe R600: Use KCache for kernel arguments
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186918
2013-07-23 01:48:18 +00:00
Tom Stellard 34ed721af4 R600: Simplify assembly for KCache registers using the TableGen !add operator
Before:

MOV * T0.W, KC0[131-128].Y

After:

MOV * T0.W, KC0[3].Y

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186917
2013-07-23 01:48:08 +00:00
Tom Stellard acfeebf883 R600: Use the same compute kernel calling convention for all GPUs
A side-effect of this is that now the compiler expects kernel arguments
to be 4-byte aligned.

Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186916
2013-07-23 01:48:05 +00:00
Tom Stellard 78e012969c R600: Use correct LoadExtType when lowering kernel arguments
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186915
2013-07-23 01:47:58 +00:00
Tom Stellard 33dd04bfbe R600: Clean up extended load patterns
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186914
2013-07-23 01:47:52 +00:00
Tom Stellard beed74af48 R600: Expand vector FNEG
llvm-svn: 186913
2013-07-23 01:47:46 +00:00
Vincent Lejeune 8b8a7b5514 R600: Don't emit empty then clause and use alu_pop_after
llvm-svn: 186725
2013-07-19 21:45:15 +00:00
Vincent Lejeune 960a622ca6 R600: Simplify AMDILCFGStructurize by removing templates and assuming single exit
llvm-svn: 186724
2013-07-19 21:45:06 +00:00
Vincent Lejeune a8c38fedd6 R600: Replace legacy debug code in AMDILCFGStructurizer.cpp
llvm-svn: 186723
2013-07-19 21:44:56 +00:00
Tom Stellard 8374720aad R600/SI: Fix crash with VSELECT
https://bugs.freedesktop.org/show_bug.cgi?id=66175

llvm-svn: 186616
2013-07-18 21:43:53 +00:00
Tom Stellard adf732cfbc R600/SI: Add support for v2f32 loads
llvm-svn: 186615
2013-07-18 21:43:48 +00:00
Tom Stellard ed2f6149f3 R600/SI: Add support for v2f32 stores
llvm-svn: 186614
2013-07-18 21:43:42 +00:00
Tom Stellard 67ae4762ef R600: Expand VSELECT for all types
llvm-svn: 186613
2013-07-18 21:43:35 +00:00
Craig Topper 8fc4096fab Move string pointer from being a static class member to just a static global in the one file its needed in.
llvm-svn: 186476
2013-07-17 00:31:35 +00:00
Craig Topper d3a34f81f8 Add 'const' qualifiers to static const char* variables.
llvm-svn: 186371
2013-07-16 01:17:10 +00:00
Tom Stellard 31209cc8eb R600/SI: Add support for 64-bit loads
https://bugs.freedesktop.org/show_bug.cgi?id=65873

llvm-svn: 186339
2013-07-15 19:00:09 +00:00
Craig Topper 0afd0ab749 Make some arrays 'static const'
llvm-svn: 186307
2013-07-15 06:39:13 +00:00
Craig Topper 5871321e49 Use llvm::array_lengthof to replace sizeof(array)/sizeof(array[0]).
llvm-svn: 186301
2013-07-15 04:27:47 +00:00
Craig Topper b94011fd28 Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size.
llvm-svn: 186274
2013-07-14 04:42:23 +00:00
Benjamin Kramer c22c790f89 R600: Remove unsafe type punning. No intended functionality change.
llvm-svn: 186196
2013-07-12 20:18:05 +00:00
Tom Stellard ccae60acc3 R600/SI: Add support for f64 kernel arguments
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186182
2013-07-12 18:15:26 +00:00
Tom Stellard 4e1100ab75 R600/SI: Implement select and compares for SI
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186181
2013-07-12 18:15:19 +00:00
Tom Stellard 8ed7b45da3 R600/SI: Add fsqrt pattern for SI
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186180
2013-07-12 18:15:13 +00:00
Tom Stellard 2a6a610516 R600/SI: Add double precision fsub pattern for SI
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186179
2013-07-12 18:15:08 +00:00
Tom Stellard ab8a8c84d4 R600/SI: SI support for 64bit ConstantFP
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186178
2013-07-12 18:15:02 +00:00
Tom Stellard 7512c0803c R600/SI: Add initial double precision support for SI
Patch by: Niels Ole Salscheider

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186177
2013-07-12 18:14:56 +00:00
Aaron Ballman f04bbd8b7f Replacing an empty switch with its moral equivalent. No functional changes intended.
llvm-svn: 186017
2013-07-10 17:19:22 +00:00
Michel Danzer 49812b5bbd R600/SI: Initial local memory support
Enough for the radeonsi driver to use it for calculating derivatives.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186012
2013-07-10 16:37:07 +00:00
Michel Danzer 1f87df365f R600/SI: Add pattern for the AMDGPU.barrier.local intrinsic
lit test coverage to follow in the next commit.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186011
2013-07-10 16:36:57 +00:00
Michel Danzer 8d69617b27 R600/SI: Add intrinsic for retrieving the current thread ID
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186010
2013-07-10 16:36:52 +00:00
Michel Danzer 1c45430e76 R600/SI: Initial support for LDS/GDS instructions
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186009
2013-07-10 16:36:43 +00:00
Michel Danzer 83f87c4c2e R600/SI: Add intrinsics for texture sampling with user derivatives
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186008
2013-07-10 16:36:36 +00:00
Vincent Lejeune ce499744b3 R600: Do not predicated basic block with multiple alu clause
Test is not included as it is several 1000 lines long.
To test this functionnality, a test case must generate at least 2 ALU clauses,
where an ALU clause is ~110 instructions long.

NOTE: This is a candidate for the stable branch.
llvm-svn: 185943
2013-07-09 15:03:33 +00:00
Vincent Lejeune b8aac8d720 R600: Fix a rare bug where swizzle optimization returns wrong values
llvm-svn: 185942
2013-07-09 15:03:25 +00:00
Vincent Lejeune a4d8d2ef2b R600: Fix wrong export reswizzling
llvm-svn: 185941
2013-07-09 15:03:19 +00:00
Vincent Lejeune b55940cc7d R600: Use DAG lowering pass to handle fcos/fsin
NOTE: This is a candidate for the stable branch.
llvm-svn: 185940
2013-07-09 15:03:11 +00:00
Vincent Lejeune f10d1cd2a3 R600: Print Export Swizzle
llvm-svn: 185939
2013-07-09 15:03:03 +00:00
Craig Topper 31ee5866de Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size.
llvm-svn: 185540
2013-07-03 15:07:05 +00:00
Rafael Espindola 64e1af8eb9 Remove address spaces from MC.
This is dead code since PIC16 was removed in 2010. The result was an odd mix,
where some parts would carefully pass it along and others would assert it was
zero (most of the object streamer for example).

llvm-svn: 185436
2013-07-02 15:49:13 +00:00
Chad Rosier 797ee3e3c6 Add a newline.
llvm-svn: 185385
2013-07-01 21:31:10 +00:00
Vincent Lejeune a8a50248d8 R600: Fix an unitialized variable in R600InstrInfo.cpp
llvm-svn: 185294
2013-06-30 21:44:06 +00:00
Benjamin Kramer 396906456f R600: Unbreak GCC build.
operator++ on an enum is not legal. clang happens to accept it anyways, I think
that's a known bug.

llvm-svn: 185269
2013-06-29 20:04:19 +00:00
Vincent Lejeune 77a8352476 R600: Support schedule and packetization of trans-only inst
llvm-svn: 185268
2013-06-29 19:32:43 +00:00
Vincent Lejeune bb8a872158 R600: Bank Swizzle now display SCL equivalent
llvm-svn: 185267
2013-06-29 19:32:29 +00:00
Tom Stellard c46e56721e R600/SI: Add processor types for each CIK variant
Patch By: Alex Deucher

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
llvm-svn: 185209
2013-06-28 20:23:29 +00:00
Tom Stellard c026e8bc8e R600: Add local memory support via LDS
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 185162
2013-06-28 15:47:08 +00:00
Tom Stellard ce540330df R600: Add support for GROUP_BARRIER instruction
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 185161
2013-06-28 15:46:59 +00:00
Tom Stellard 5eb903d9c5 R600: Add ALUInst bit to tablegen definitions v2
v2:
  - Remove functions left over from a previous rebase.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 185160
2013-06-28 15:46:53 +00:00
Tom Stellard 02661d9605 R600: Use new getNamedOperandIdx function generated by TableGen
llvm-svn: 184880
2013-06-25 21:22:18 +00:00
Aaron Watry 0a794a4612 R600: Consolidate expansion of v2i32/v4i32 ops for EG/SI
By default, we expand these operations for both EG and SI. Move the
duplicated code into a common space for now. If the targets ever actually
implement these operations as instructions, we can override that in the relevant
target.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184848
2013-06-25 13:55:57 +00:00
Aaron Watry daabb20e1b R600/SI: Expand xor v2i32/v4i32
Add test cases for both vector sizes on SI and also add v2i32 test for EG.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184846
2013-06-25 13:55:52 +00:00
Aaron Watry 83fa6006bc R600/SI: Expand urem of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Note: I followed the guidance of the v4i32 EG check... UREM produces really
complex code, so let's just check that the instruction was lowered
successfully.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184844
2013-06-25 13:55:46 +00:00
Aaron Watry 5527b6c6b6 R600/SI: Expand udiv v[24]i32 for SI and v2i32 for EG
Also add lit test for both cases on SI, and v2i32 for evergreen.

Note: I followed the guidance of the v4i32 EG check... UDIV produces really
complex code, so let's just check that the instruction was lowered
successfully.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184843
2013-06-25 13:55:43 +00:00
Aaron Watry 16d80c0529 R600/SI: Expand ashr of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184842
2013-06-25 13:55:40 +00:00
Aaron Watry f63791e778 R600/SI: Expand srl of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184841
2013-06-25 13:55:37 +00:00
Aaron Watry 5584553984 R600/SI: Expand shl of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184840
2013-06-25 13:55:32 +00:00
Aaron Watry 2fa162e88e R600/SI: Expand or of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184839
2013-06-25 13:55:29 +00:00
Aaron Watry 265eef5efe R600/SI: Expand mul of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184838
2013-06-25 13:55:26 +00:00
Aaron Watry 00aeb119db R600/SI: Expand and of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184837
2013-06-25 13:55:23 +00:00
Tom Stellard 0125f2a6e4 R600/SI: Report unaligned memory accesses as legal for > 32-bit types
In reality, some unaligned memory accesses are legal for 32-bit types and
smaller too, but it all depends on the address space.  Allowing
unaligned loads/stores for > 32-bit types is mainly to prevent the
legalizer from splitting one load into multiple loads of smaller types.

https://bugs.freedesktop.org/show_bug.cgi?id=65873

llvm-svn: 184822
2013-06-25 02:39:35 +00:00
Tom Stellard 9810ec613c R600: Add support for i32 loads from the constant address space on Cayman
Tested-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 184821
2013-06-25 02:39:30 +00:00
Tom Stellard b06f3fc1be R600/SI: Add support for v4i32 and v4f32 kernel args
Tested-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 184820
2013-06-25 02:39:25 +00:00
Tom Stellard 9d2e1500b4 R600: Fix typo in R600Schedule.td
This should only make a difference in programs that use a lot of the
vector ALU instructions like BFI_INT and BIT_ALIGN.  There is a slight
improvement in the phatk bitcoin mining kernel with this patch on
Evergreen (vector size == 1):

Before:
1173 Instruction Groups / 9520 dwords

After:
1167 Instruction Groups / 9510 dwords

Reviewed-by: Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184819
2013-06-25 02:39:20 +00:00
Aaron Watry 52a72c926c R600: Fix spelling error in comment
our -> or

llvm-svn: 184756
2013-06-24 16:57:57 +00:00
Tom Stellard 96d38760fc R600/SI: Expand sub for v2i32 and v4i32 for SI
Also add a v2i32 test to the existing v4i32 test.

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
llvm-svn: 184482
2013-06-20 21:55:37 +00:00
Tom Stellard 043795e818 R600/SI: Expand add for v2i32 and v4i32
Also add SI tests to existing file and a v2i32 test for both
R600 and SI.

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 184481
2013-06-20 21:55:30 +00:00
Tom Stellard 6ec9e8043c R600: Expand v2i32 load/store instead of custom lowering
The custom lowering causes llc to crash with a segfault.

Ideally, the custom lowering can be fixed, but this allows
programs which load/store v2i32 to work without crashing.

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
llvm-svn: 184480
2013-06-20 21:55:23 +00:00
Bill Wendling a3cd350249 Access the TargetLoweringInfo from the TargetMachine object instead of caching it. The TLI may change between functions. No functionality change.
llvm-svn: 184360
2013-06-19 21:36:55 +00:00
Matt Arsenault d46fce1141 Move StructurizeCFG out of R600 to generic Transforms.
Register it with PassManager

llvm-svn: 184343
2013-06-19 20:18:24 +00:00
Matt Arsenault 2aabb06175 Use GetUnderlyingObject instead of custom function
llvm-svn: 184261
2013-06-18 23:37:58 +00:00
Bill Wendling b7b1681157 Remove dead prototype.
llvm-svn: 184173
2013-06-18 06:24:14 +00:00
Vincent Lejeune 41d4cf26b4 R600: PV stores Reg id, not index
llvm-svn: 184117
2013-06-17 20:16:40 +00:00
Vincent Lejeune 8bd10421ec R600: Properly set COUNT_3 bit in TEX clause initiating inst for pre EG gen.
Fixes rv7x0 bug in Heaven reported here:
https://bugs.freedesktop.org/show_bug.cgi?id=64257

llvm-svn: 184116
2013-06-17 20:16:26 +00:00
Tom Stellard 371573448c R600: Add SI load support for v[24]i32 and store for v2i32
Also add a seperate vector lit test file, since r600 doesn't seem to handle
v2i32 load/store yet, but we can test both for SI.

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 184021
2013-06-15 00:09:31 +00:00
Tom Stellard ecf9d86404 R600: Use correct encoding for Vertex Fetch instructions on Cayman
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184016
2013-06-14 22:12:30 +00:00
Tom Stellard 6aa0d5578d R600: Use EXPORT_RAT_INST_STORE_DWORD for stores on Cayman
We were using RAT_INST_STORE_RAW, which seemed to work, but the docs
say this instruction doesn't exist for Cayman, so it's probably safer
to use a documented instruction instead.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184015
2013-06-14 22:12:24 +00:00
Tom Stellard d99b7932ae R600: Factor the instruction encoding out the RAT_WRITE_CACHELESS_eg class
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184014
2013-06-14 22:12:19 +00:00
Tom Stellard 3d0823f1cd R600: Move instruction encoding definitions into a separate .td file
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184013
2013-06-14 22:12:09 +00:00
Tom Stellard adba083bc2 R600: Don't try to fix reg class when copying IMPLICIT_DEF to a register
The test case for this is way too complex to be useful as a lit test,
and I was unable to reduce it.

https://bugs.freedesktop.org/show_bug.cgi?id=65438

llvm-svn: 183937
2013-06-13 20:14:00 +00:00
Benjamin Kramer 193960c822 R600: Make helper functions static.
llvm-svn: 183744
2013-06-11 13:32:25 +00:00
Vincent Lejeune d1a9d18120 R600: Use a refined heuristic to choose when switching clause
This is using a hint from AMD APP OpenCL Programming Guide with
empirically tweaked parameters.
I used Unigine Heaven 3.0 to determine best parameters on my system
(i7 2600/Radeon 6950/Kernel 3.9.4) the benchmark :
it went from 38.8 average fps to 39.6, which is ~3% gain.
(Lightmark 2008.2 gain is much more marginal: from 537 to 539)

There is no lit test provided as the parameter were determined
empirically and it it would be nearly impossiblet to find a test
program that check for optimal behavior.

llvm-svn: 183593
2013-06-07 23:30:34 +00:00
Vincent Lejeune 4d143328df R600: Anti dep better handled in tex clause
llvm-svn: 183592
2013-06-07 23:30:26 +00:00
Tom Stellard d74583777f R600: Fix calculation of stack offset in AMDGPUFrameLowering
We weren't computing structure size correctly and we were relying on
the original alloca instruction to compute the offset, which isn't
always reliable.

Reviewed-by: Vincent Lejeune <vljn@ovi.com>
llvm-svn: 183568
2013-06-07 20:52:05 +00:00
Tom Stellard a6c6e1bfc2 R600: Rework subtarget info and remove AMDILDevice classes
This should simplify the subtarget definitions and make it easier to
add new ones.

Reviewed-by: Vincent Lejeune <vljn@ovi.com>
llvm-svn: 183566
2013-06-07 20:37:48 +00:00
Bill Wendling 37e9adb091 Don't cache the instruction and register info from the TargetMachine, because
the internals of TargetMachine could change.

No functionality change intended.

llvm-svn: 183561
2013-06-07 20:28:55 +00:00
Tom Stellard 3498e4ff1d R600: Fix the fetch limits for R600 generation GPUs
Reviewed-by: Vincent Lejeune <vljn@ovi.com>

https://bugs.freedesktop.org/show_bug.cgi?id=64257

llvm-svn: 183560
2013-06-07 20:28:55 +00:00
Tom Stellard 99792774a4 R600: Move Subtarget feature definitions into AMDGPU.td
This is the convention used by the other targets.

Reviewed-by: Vincent Lejeune <vljn@ovi.com>
llvm-svn: 183559
2013-06-07 20:28:49 +00:00
Tom Stellard b0804ec2ad R600: Remove unnecessary include
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
llvm-svn: 183558
2013-06-07 20:28:43 +00:00
Benjamin Kramer 705d841bb6 R600: Don't compare iterators of different maps.
Found be libstdc's debug mode.

llvm-svn: 183549
2013-06-07 19:59:34 +00:00
Benjamin Kramer ebe0be9ca4 Vincent says the element is at most once in the vector, so we don't need a full std::remove.
llvm-svn: 183541
2013-06-07 18:18:12 +00:00
Benjamin Kramer a857fe115b R600: Fix a potential iterator invalidation issue.
As a bonus this reduces the loop from O(n^2) to O(n).

llvm-svn: 183532
2013-06-07 16:13:49 +00:00
Vincent Lejeune 931bb768fd R600: Remove an extra break in R600OptimizeVectorRegisters.cpp
llvm-svn: 183528
2013-06-07 15:44:53 +00:00
Vincent Lejeune 0030362ed9 R600: Rewrite an awkward loop in R600MachineScheduler
llvm-svn: 183458
2013-06-06 23:08:32 +00:00
Vincent Lejeune 54476a1503 R600: Remove leftover code in R600MachineScheduler.cpp
Spotted by Benjamin Kramer.

llvm-svn: 183413
2013-06-06 14:18:29 +00:00
Bill Wendling b91216817f Cast to the correct type. Pointer, not reference.
llvm-svn: 183385
2013-06-06 05:39:29 +00:00
NAKAMURA Takumi 4a8f079371 R600OptimizeVectorRegisters.cpp: Tweak a warning. [-Wsometimes-uninitialized]
FIXME: Is it false alarm?
llvm-svn: 183371
2013-06-06 02:15:12 +00:00
NAKAMURA Takumi e5555fc238 R600OptimizeVectorRegisters.cpp: Suppress a warning. [-Wunused-variable]
llvm-svn: 183370
2013-06-06 02:15:06 +00:00
NAKAMURA Takumi 372574d447 Trailing linefeed.
llvm-svn: 183369
2013-06-06 02:15:00 +00:00
Bill Wendling e410576865 Cast to the proper type.
llvm-svn: 183365
2013-06-06 01:04:21 +00:00
Tom Stellard acec99c948 R600: Replace predicate loop with predicate function
llvm-svn: 183351
2013-06-05 23:39:50 +00:00
Vincent Lejeune dec1875207 R600: Add a pass that merge Vector Register
Previously commited @183279 but tests were failing, reverted @183286
It was broken because @183336 was missing, now it's there.

llvm-svn: 183343
2013-06-05 21:38:04 +00:00
Vincent Lejeune 4b5b849753 R600: Schedule copy from phys register at beginning of block
It allows regalloc pass to remove them by trivially assigning associated reg

llvm-svn: 183336
2013-06-05 20:27:35 +00:00
Tom Stellard aad5376fb6 R600: Make sure to schedule AR register uses and defs in the same clause
Reviewed-by: vljn at ovi.com
llvm-svn: 183294
2013-06-05 03:43:06 +00:00
Rafael Espindola beef23fe21 Revert "R600: Add a pass that merge Vector Register"
This reverts commit r183279. CodeGen/R600/texture-input-merge.ll was failing.

llvm-svn: 183286
2013-06-05 01:48:30 +00:00
Vincent Lejeune a45aafabfe R600: Add a pass that merge Vector Register
llvm-svn: 183279
2013-06-04 23:17:26 +00:00
Vincent Lejeune c689679173 R600: Const/Neg/Abs can be folded to dot4
llvm-svn: 183278
2013-06-04 23:17:15 +00:00
Vincent Lejeune 276ceb8d5f R600: Swizzle texture/export instructions
llvm-svn: 183229
2013-06-04 15:04:53 +00:00
Aaron Ballman 19978553d4 Silencing an MSVC warning about mixing bool and unsigned int.
llvm-svn: 183176
2013-06-04 01:03:03 +00:00
Tom Stellard 94593ee8c3 R600/SI: Add support for work item and work group intrinsics
llvm-svn: 183138
2013-06-03 17:40:18 +00:00
Tom Stellard ed882c2f1b R600/SI: Add a calling convention for compute shaders
llvm-svn: 183137
2013-06-03 17:40:11 +00:00
Tom Stellard 046039e81b R600/SI: Custom lower i64 sign_extend
llvm-svn: 183136
2013-06-03 17:40:03 +00:00
Tom Stellard 0518ff89ba R600/SI: Adjust some instructions' out register class after ISel
This is necessary to avoid generating VGPR to SGPR copies in some
cases.

llvm-svn: 183135
2013-06-03 17:39:58 +00:00
Tom Stellard bad1f59212 R600/SI: Handle REG_SEQUENCE in fitsRegClass()
llvm-svn: 183134
2013-06-03 17:39:54 +00:00
Tom Stellard b5a97004fb R600/SI: Handle nodes with glue results correctly SITargetLowering::foldOperands()
llvm-svn: 183133
2013-06-03 17:39:50 +00:00
Tom Stellard 2183b70523 R600/SI: Fixup CopyToReg register class in PostprocessISelDAG()
The CopyToReg nodes will sometimes try to copy a value from a VGPR to an
SGPR.  This kind of copy is not possible, so we need to detect
VGPR->SGPR copies and do something else.  The current strategy is to
replace these copies with VGPR->VGPR copies and hope that all the users
of CopyToReg can accept VGPRs as arguments.

llvm-svn: 183132
2013-06-03 17:39:46 +00:00
Tom Stellard 07a10a3d3f R600/SI: Add support for global loads
llvm-svn: 183131
2013-06-03 17:39:43 +00:00
Tom Stellard 556d9aa841 R600/SI: Rework MUBUF store instructions
The lowering of stores is now mostly handled in the tablegen files.  No
more BUFFER_STORE nodes I generated during legalization.

llvm-svn: 183130
2013-06-03 17:39:37 +00:00
Vincent Lejeune 91a942b93e R600: 3 op instructions have no write bit but the result are store in PV
llvm-svn: 183111
2013-06-03 15:56:12 +00:00