safe on PPC32 SVR4 ABI
[Patch and following text by Mark Minich; committing on his behalf.]
There are FIXME's in PowerPC/PPCFrameLowering.cpp, method
PPCFrameLowering::emitPrologue() related to "negative offsets of R1"
on PPC32 SVR4. They're true, but the real issue is that on PPC32 SVR4
(and any ABI without a Red Zone), no spills may be made until after
the stackframe is claimed, which also includes the LR spill which is
at a positive offset. The same problem exists in emitEpilogue(),
though there's no FIXME for it. I intend to fix this issue, making
LLVM-compiled code finally safe for use on SVR4/EABI/e500 32-bit
platforms (including in particular, OS-free embedded systems & kernel
code, where interrupts may share the same stack as user code).
In preparation for making these changes, to make the diffs for the
functional changes less cluttered, I am providing the non-functional
refactorings in two stages:
Stage 1 does some minor fluffy refactorings to pull multiple method
calls up into a single bool, creating named bools for repeated uses of
obscure logic, moving some code up earlier because either stage 2 or
my final version will require it earlier, and rewording/adding some
comments. My stage 1 changes can be characterized as primarily fluffy
cleanup, the purpose of which may be unclear until the stage 2 or
final changes are made.
My stage 2 refactorings combine the separate PPC32 & PPC64 logic,
which is currently performed by largely duplicate code, into a single
flow, with the differences handled by a group of constants initialized
early in the methods.
This submission is for my stage 1 changes. There should be no
functional changes whatsoever; this is a pure refactoring.
llvm-svn: 188573
The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD
instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused
it to corrupt the encoding of that by clobbering the first operand with
the second one.
Undo that damage and only apply the SMRD logic to that.
Fixes some derivates related piglit regressions with radeonsi.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188558
This unbreaks PIC with fast isel on ELF targets (PR16717). The output matches
what GCC and SDag do for PIC but may not cover all of the many flavors of PIC
that exist.
llvm-svn: 188551
Thumb2 literal loads use an offset encoding which allows for
negative zero. This fixes parsing and encoding so that #-0
is correctly processed. The parser represents #-0 as INT32_MIN.
llvm-svn: 188549
There are many Thumb instructions which take 12-bit immediates encoded in a special
8-byte value + 4-byte rotator form. Not all numbers are represented, and it's legal
to transform an assembly instruction to be able to encode the immediate.
For example: AND and BIC are complementary instructions; one can switch the AND
to a BIC as long as the immediate is complemented.
The intent is to switch one instruction into its complementary one when the immediate
cannot be encoded in the form requested in the original assembly and when the
complementary immediate is encodable.
The patch addresses two issues:
1. definition of t2SOImmNot immediate - it has to check that the orignal value is
not encoded naturally
2. t2AND and t2BIC instruction aliases which should use the Thumb2 SOImm operand
rather than the ARM one.
llvm-svn: 188548
Generalize r188163 to cope with return types other than MVT::i32, just
as the existing visitMemCmpCall code did. I've split this out into a
subroutine so that it can be used for other upcoming patches.
I also noticed that I'd used the wrong API to record the out chain.
It's a load that uses DAG.getRoot() rather than getRoot(), so the out
chain should go on PendingLoads. I don't have a testcase for that because
we don't do any interesting scheduling on z yet.
llvm-svn: 188540
r188163 used CLC to implement memcmp. Code that compares the result
directly against zero can test the CC value produced by CLC, but code
that needs an integer result must use IPM. The sequence I'd used was:
ipm <reg>
sll <reg>, 2
sra <reg>, 30
but I'd forgotten that this inverts the order, so that CC==1 ("less")
becomes an integer greater than zero, and CC==2 ("greater") becomes
an integer less than zero. This sequence should only be used if the
CLC arguments are reversed to compensate. The problem then is that
the branch condition must also be reversed when testing the CLC
result directly.
Rather than do that, I went for a different sequence that works with
the natural CLC order:
ipm <reg>
srl <reg>, 28
rll <reg>, <reg>, 31
One advantage of this is that it doesn't clobber CC. A disadvantage
is that any sign extension to 64 bits must be done separately,
rather than being folded into the shifts.
llvm-svn: 188538
The SIInsertWaits pass was overwriting the first operand (gds bit) of
DS_WRITE_B32 with the second operand (value to write). This meant that
any time the value to write was stored in an odd number VGPR, the gds
bit would be set causing the instruction to write to GDS instead of LDS.
llvm-svn: 188522
Before this patch this flag is IOS specific, but is also
useful for bare project like bootloaders / kernels etc,
since movw / movt prevents simple relocation. Therefore
make this flag more commonly available.
note: this patch depends on a similiar rename in clang
Patch by Jeroen Hofstee.
llvm-svn: 188487
r9 is defined as a platform-specific register in the ARM EABI.
It can be reserved for a special purpose or be used as a general
purpose register. Add support for reserving r9 for all ARM, while
leaving the IOS usage unchanged.
Patch by Jeroen Hofstee.
llvm-svn: 188485
1. The offset range for Thumb1 PC relative loads is [0..1020] and not [-1024..1020]
2. Thumb2 PC relative loads may define the PC, so the restriction placed on target register is removed
3. Removes unneeded alias between "ldr.n" and t1LDRpci. ".n" is actually stripped by both tablegen
and the ASM parser, so this alias rule really does nothing
llvm-svn: 188466
Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.
This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:
https://bugs.freedesktop.org/show_bug.cgi?id=66805
llvm-svn: 188429
Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG
instructions should make it easier for the register allocator to coalasce
unnecessary copies.
v2:
- Use an SGPR register class if all the operands of BUILD_VECTOR are
SGPRs.
llvm-svn: 188427
The instruction selector will now try to infer the destination register
so it can decided whether to use V_MOV_B32 or S_MOV_B32 when copying
immediates.
llvm-svn: 188426
The previous code declared the operand as unknown:$vaddr, which made
it possible for scalar registers to be used instead of vector registers.
llvm-svn: 188425