This is to avoid this transformation in some cases:
fold (conv (load x)) -> (load (conv*)x)
On architectures that don't natively support some vector
loads efficiently casting the load to a smaller vector of
larger types and loading is more efficient.
Patch by Micah Villmow.
llvm-svn: 194783
The LDS output queue is accessed via the OQAP register. The OQAP
register cannot be live across clauses, so if value is written to the
output queue, it must be retrieved before the end of the clause.
With the machine scheduler, we cannot statisfy this constraint, because
it lacks proper alias analysis and it will mark some LDS accesses as
having a chain dependency on vertex fetches. Since vertex fetches
require a new clauses, the dependency may end up spiltting OQAP uses and
defs so the end up in different clauses. See the lds-output-queue.ll
test for a more detailed explanation.
To work around this issue, we now combine the LDS read and the OQAP
copy into one instruction and expand it after register allocation.
This patch also adds some checks to the EmitClauseMarker pass, so that
it doesn't end a clause with a value still in the output queue and
removes AR.X and OQAP handling from the scheduler (AR.X uses and defs
were already being expanded post-RA, so the scheduler will never see
them).
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 194755
All shift operations will be selected as SALU instructions and then
if necessary lowered to VALU instructions in the SIFixSGPRCopies pass.
This allows us to do more operations on the SALU which will improve
performance and is also required for implementing private memory
using indirect addressing, since the private memory pointers must stay
in the scalar registers.
This patch includes some fixes from Matt Arsenault.
llvm-svn: 194625
Accepting quotes is a property of an assembler, not of an object file. For
example, ELF can support any names for sections and symbols, but the gnu
assembler only accepts quotes in some contexts and llvm-mc in a few more.
LLVM should not produce different symbols based on a guess about which assembler
will be reading the code it is printing.
llvm-svn: 194575
Print the range of registers used with a single letter prefix.
This better matches what the shader compiler produces and
is overall less obnoxious than concatenating all of the
subregister names together.
Instead of SGPR0, it will print s0. Instead of SGPR0_SGPR1,
it will print s[0:1] and so on.
There doesn't appear to be a straightforward way
to get the actual register info in the InstPrinter,
so this parses the generated name to print with the
new syntax.
The required test changes are pretty nasty, and register
matching regexes are now worse. Since there isn't a way to
add to a variable in FileCheck, some of the tests now don't
check the exact number of registers used, but I don't think that
will be a real problem.
llvm-svn: 194443
The SelectionDAGBuilder was promoting vector kernel arguments to legal
types, but this won't work for R600 and SI since kernel arguments are
stored in memory and can't be promoted. In order to handle vector
arguments correctly we need to look at the original types from the LLVM IR
function.
llvm-svn: 193215
The AMDGPUIndirectAddressing pass was previously responsible for
lowering private loads and stores to indirect addressing instructions.
However, this pass was buggy and way too complicated. The only
advantage it had over the new simplified code was that it saved one
instruction per direct write to private memory. This optimization
likely has a minimal impact on performance, and we may be able
to duplicate it using some other transformation.
For the private address space, we now:
1. Lower private loads/store to Register(Load|Store) instructions
2. Reserve part of the register file as 'private memory'
3. After regalloc lower the Register(Load|Store) instructions to
MOV instructions that use indirect addressing.
llvm-svn: 193179
We were calling llvm_unreachable() when failing to optimize the
branch into if case. However, it is still possible for us
to structurize the CFG by duplicating blocks even if this optimization
fails.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 192813
During instruction selection, we rewrite the destination register
class for MIMG instructions based on their writemasks. This creates
machine verifier errors since the new register class does not match
the register class in the MIMG instruction definition.
We can avoid this by defining different MIMG instructions for each
possible destination type and then switching to the correct instruction
when we change the register class.
llvm-svn: 192365
This prevents the machine verifier from complaining about uses of
an undefined physical register.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 192364
StructurizeCFG pass allows to make complex cfg reducible ; it allows a lot of
shader from shadertoy (which exhibits complex control flow constructs) to works
correctly with respect to CFG handling (and allow us to detect potential bug in
other part of the backend).
We provide a cmd line argument to disable the pass for debug purpose.
Patch by: Vincent Lejeune
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 192363
This patch fixes an old FIXME by creating a MCTargetStreamer interface
and moving the target specific functions for ARM, Mips and PPC to it.
The ARM streamer is still declared in a common place because it is
used from lib/CodeGen/ARMException.cpp, but the Mips and PPC are
completely hidden in the corresponding Target directories.
I will send an email to llvmdev with instructions on how to use this.
llvm-svn: 192181
For targets that have instruction itineraries this means no change. Targets
that move over to the new schedule model will use be able the new schedule
module for instruction latencies in the if-converter (the logic is such that if
there is no itineary we will use the new sched model for the latencies).
Before, we queried "TTI->getInstructionLatency()" for the instruction latency
and the extra prediction cost. Now, we query the TargetSchedule abstraction for
the instruction latency and TargetInstrInfo for the extra predictation cost. The
TargetSchedule abstraction will internally call "TTI->getInstructionLatency" if
an itinerary exists, otherwise it will use the new schedule model.
ATTENTION: Out of tree targets!
(I will also send out an email later to LLVMDev)
This means, if your target implements
unsigned getInstrLatency(const InstrItineraryData *ItinData,
const MachineInstr *MI,
unsigned *PredCost);
and returns a value for "PredCost", you now also need to implement
unsigned getPredictationCost(const MachineInstr *MI);
(if your target uses the IfConversion.cpp pass)
radar://15077010
llvm-svn: 191671
We were completely ignoring the unorder/ordered attributes of condition
codes and also incorrectly lowering seto and setuo.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 191603
SelectionDAG will now attempt to inverse an illegal conditon in order to
find a legal one and if that doesn't work, it will attempt to swap the
operands using the inverted condition.
There are no new test cases for this, but a nubmer of the existing R600
tests hit this path.
llvm-svn: 191602
This is useful for targets like R600, which only support GT, GE, NE, and EQ
condition codes as it removes the need to handle unsupported condition
codes in target specific code.
There are no tests with this commit, but R600 has been updated to take
advantage of this new feature, so its existing selectcc tests are now
testing the swapped operands path.
llvm-svn: 191601
Previously, the DAGISel function WalkChainUsers was spotting that it
had entered already-selected territory by whether a node was a
MachineNode (amongst other things). Since it's fairly common practice
to insert MachineNodes during ISelLowering, this was not the correct
check.
Looking around, it seems that other nodes get their NodeId set to -1
upon selection, so this makes sure the same thing happens to all
MachineNodes and uses that characteristic to determine whether we
should stop looking for a loop during selection.
This should fix PR15840.
llvm-svn: 191165
The global registry is used to allow command line override of the
scheduler selection, but does not work well as the normal selection
API. For example, the same LLVM process should be able to target
multiple targets or subtargets.
llvm-svn: 191071
For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.
The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.
The maximum number of input SGPRs is bumped to 17.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190575
This fixes some regressions in the piglit local memory store tests
introduced by recent commits which made the scheduler aware of the trans
slot.
It's not possible to test this using lit, because there is no way to
determine from the assembly dumps whether or not an instruction is in
the trans slot.
Even if this were possible, the test would be highly sensitive to
changes in the scheduler and might generate confusing false negatives.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 190574
We used to generate the compact unwind encoding from the machine
instructions. However, this had the problem that if the user used `-save-temps'
or compiled their hand-written `.s' file (with CFI directives), we wouldn't
generate the compact unwind encoding.
Move the algorithm that generates the compact unwind encoding into the
MCAsmBackend. This way we can generate the encoding whether the code is from a
`.ll' or `.s' file.
<rdar://problem/13623355>
llvm-svn: 190290
This pass was segfaulting when it ran into a non-intrinsic function
call. Function calls are not supported, so now instead of segfaulting,
we will get an assertion failure with a nice error message.
I'm not sure how to test this using lit.
llvm-svn: 190076
Iterator of std::vector may be implemented as a raw pointer. In
this case begin iterators are rvalues and cannot be incremented.
For example, this is the case with STDCXX implementation of vector.
Patch by Konstantin Tokarev <annulen@yandex.ru>.
llvm-svn: 189911
This adds minimal support to the SelectionDAG for handling address spaces
with different pointer sizes. The SelectionDAG should now correctly
lower pointer function arguments to the correct size as well as generate
the correct code when lowering getelementptr.
This patch also updates the R600 DataLayout to use 32-bit pointers for
the local address space.
v2:
- Add more helper functions to TargetLoweringBase
- Use CHECK-LABEL for tests
llvm-svn: 189221
The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
it to corrupt the encoding of that by clobbering the first operand with
the second one.
Undo that damage and only apply the SMRD logic to that.
Fixes some derivates related piglit regressions with radeonsi.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188558
The SIInsertWaits pass was overwriting the first operand (gds bit) of
DS_WRITE_B32 with the second operand (value to write). This meant that
any time the value to write was stored in an odd number VGPR, the gds
bit would be set causing the instruction to write to GDS instead of LDS.
llvm-svn: 188522
Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.
This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:
https://bugs.freedesktop.org/show_bug.cgi?id=66805
llvm-svn: 188429
Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG
instructions should make it easier for the register allocator to coalasce
unnecessary copies.
v2:
- Use an SGPR register class if all the operands of BUILD_VECTOR are
SGPRs.
llvm-svn: 188427
The instruction selector will now try to infer the destination register
so it can decided whether to use V_MOV_B32 or S_MOV_B32 when copying
immediates.
llvm-svn: 188426
The previous code declared the operand as unknown:$vaddr, which made
it possible for scalar registers to be used instead of vector registers.
llvm-svn: 188425
R600 doesn't need to do any scheduling on the SelectionDAG now that it
has a very good MachineScheduler. Also, using the VLIW SelectionDAG
scheduler was having a major impact on compile times. For example with
the phatk kernel here are the LLVM IR to machine code compile times:
With Sched::VLIW
Total Compile Time: 1.4890 Seconds (User + System)
SelectionDAG Instruction Scheduling: 1.1670 Seconds (User + System)
With Sched::Source
Total Compile Time: 0.3330 Seconds (User + System)
SelectionDAG Instruction Scheduling: 0.0070 Seconds (User + System)
The code ouput was identical with both schedulers. This may not be true
for all programs, but it gives me confidence that there won't be much
reduction, if any, in code quality by using Sched::Source.
llvm-svn: 188215
Since the VSrc_* register classes contain both VGPRs and SGPRs, copies
that used be emitted by isel like this:
SGPR = COPY VGPR
Will now be emitted like this:
VSrC = COPY VGPR
This patch also adds a pass that tries to identify and fix situations where
a VGPR to SGPR copy may occur. Hopefully, these changes will make it
impossible for the compiler to generate illegal VGPR to SGPR copies.
llvm-svn: 187831
Without explicit dependencies, both per-file action and in-CommonTableGen action could run in parallel.
It races to emit *.inc files simultaneously.
llvm-svn: 187780
* Added R600_Reg64 class
* Added T#Index#.XY registers definition
* Added v2i32 register reads from parameter and global space
* Added f32 and i32 elements extraction from v2f32 and v2i32
* Added v2i32 -> v2f32 conversions
Tom Stellard:
- Mark vec2 operations as expand. The addition of a vec2 register
class made them all legal.
Patch by: Dmitry Cherkassov
Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
llvm-svn: 187582
If we merge vector when a vector is used, it will generate an artificial
antidependency that can prevent 2 tex/vtx instructions to use the same
clause and thus generate extra clauses that reduce performance.
There is no test case as such situation is really hard to predict.
llvm-svn: 187516
There are a lot of restrictions on instruction groups that contain
LDS instructions, so for now we will be conservative and not packetize
anything else with them.
llvm-svn: 187513
We were using two instructions for similar purpose : break and
predicated break. Only predicated_break was emitted and it was
lowered at R600ControlFlowFinalizer to JUMP;CF_BREAK;POP.
This commit simplify the situation by making AMDILCFGStructurizer
emit IF_PREDICATE;BREAK;ENDIF; instead of predicated_break (which
is now removed).
There is no functionality change.
llvm-svn: 187510
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches. The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
These are really the same address space in hardware. The only
difference is that CONSTANT_ADDRESS uses a special cache for faster
access. When we are unable to use the constant kcache for some reason
(e.g. smaller types or lack of indirect addressing) then the instruction
selector must use GLOBAL_ADDRESS loads instead.
llvm-svn: 187006
This increases the number of opportunites we have for folding. With the
previous implementation we were unable to fold into any instructions
other than the first when multiple instructions were selected from a
single SDNode.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186919
A side-effect of this is that now the compiler expects kernel arguments
to be 4-byte aligned.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186916
Test is not included as it is several 1000 lines long.
To test this functionnality, a test case must generate at least 2 ALU clauses,
where an ALU clause is ~110 instructions long.
NOTE: This is a candidate for the stable branch.
llvm-svn: 185943