Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.
This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:
https://bugs.freedesktop.org/show_bug.cgi?id=66805
llvm-svn: 188429
Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG
instructions should make it easier for the register allocator to coalasce
unnecessary copies.
v2:
- Use an SGPR register class if all the operands of BUILD_VECTOR are
SGPRs.
llvm-svn: 188427
The instruction selector will now try to infer the destination register
so it can decided whether to use V_MOV_B32 or S_MOV_B32 when copying
immediates.
llvm-svn: 188426
The previous code declared the operand as unknown:$vaddr, which made
it possible for scalar registers to be used instead of vector registers.
llvm-svn: 188425
R600 doesn't need to do any scheduling on the SelectionDAG now that it
has a very good MachineScheduler. Also, using the VLIW SelectionDAG
scheduler was having a major impact on compile times. For example with
the phatk kernel here are the LLVM IR to machine code compile times:
With Sched::VLIW
Total Compile Time: 1.4890 Seconds (User + System)
SelectionDAG Instruction Scheduling: 1.1670 Seconds (User + System)
With Sched::Source
Total Compile Time: 0.3330 Seconds (User + System)
SelectionDAG Instruction Scheduling: 0.0070 Seconds (User + System)
The code ouput was identical with both schedulers. This may not be true
for all programs, but it gives me confidence that there won't be much
reduction, if any, in code quality by using Sched::Source.
llvm-svn: 188215
Since the VSrc_* register classes contain both VGPRs and SGPRs, copies
that used be emitted by isel like this:
SGPR = COPY VGPR
Will now be emitted like this:
VSrC = COPY VGPR
This patch also adds a pass that tries to identify and fix situations where
a VGPR to SGPR copy may occur. Hopefully, these changes will make it
impossible for the compiler to generate illegal VGPR to SGPR copies.
llvm-svn: 187831
Without explicit dependencies, both per-file action and in-CommonTableGen action could run in parallel.
It races to emit *.inc files simultaneously.
llvm-svn: 187780
* Added R600_Reg64 class
* Added T#Index#.XY registers definition
* Added v2i32 register reads from parameter and global space
* Added f32 and i32 elements extraction from v2f32 and v2i32
* Added v2i32 -> v2f32 conversions
Tom Stellard:
- Mark vec2 operations as expand. The addition of a vec2 register
class made them all legal.
Patch by: Dmitry Cherkassov
Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
llvm-svn: 187582
If we merge vector when a vector is used, it will generate an artificial
antidependency that can prevent 2 tex/vtx instructions to use the same
clause and thus generate extra clauses that reduce performance.
There is no test case as such situation is really hard to predict.
llvm-svn: 187516
There are a lot of restrictions on instruction groups that contain
LDS instructions, so for now we will be conservative and not packetize
anything else with them.
llvm-svn: 187513
We were using two instructions for similar purpose : break and
predicated break. Only predicated_break was emitted and it was
lowered at R600ControlFlowFinalizer to JUMP;CF_BREAK;POP.
This commit simplify the situation by making AMDILCFGStructurizer
emit IF_PREDICATE;BREAK;ENDIF; instead of predicated_break (which
is now removed).
There is no functionality change.
llvm-svn: 187510
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches. The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
These are really the same address space in hardware. The only
difference is that CONSTANT_ADDRESS uses a special cache for faster
access. When we are unable to use the constant kcache for some reason
(e.g. smaller types or lack of indirect addressing) then the instruction
selector must use GLOBAL_ADDRESS loads instead.
llvm-svn: 187006
This increases the number of opportunites we have for folding. With the
previous implementation we were unable to fold into any instructions
other than the first when multiple instructions were selected from a
single SDNode.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186919
A side-effect of this is that now the compiler expects kernel arguments
to be 4-byte aligned.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186916
Test is not included as it is several 1000 lines long.
To test this functionnality, a test case must generate at least 2 ALU clauses,
where an ALU clause is ~110 instructions long.
NOTE: This is a candidate for the stable branch.
llvm-svn: 185943
This is dead code since PIC16 was removed in 2010. The result was an odd mix,
where some parts would carefully pass it along and others would assert it was
zero (most of the object streamer for example).
llvm-svn: 185436
By default, we expand these operations for both EG and SI. Move the
duplicated code into a common space for now. If the targets ever actually
implement these operations as instructions, we can override that in the relevant
target.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184848
Also add lit test for both cases on SI, and v2i32 for evergreen.
Note: I followed the guidance of the v4i32 EG check... UREM produces really
complex code, so let's just check that the instruction was lowered
successfully.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184844
Also add lit test for both cases on SI, and v2i32 for evergreen.
Note: I followed the guidance of the v4i32 EG check... UDIV produces really
complex code, so let's just check that the instruction was lowered
successfully.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184843