This increases the number of opportunites we have for folding. With the
previous implementation we were unable to fold into any instructions
other than the first when multiple instructions were selected from a
single SDNode.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186919
A side-effect of this is that now the compiler expects kernel arguments
to be 4-byte aligned.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186916
This makes them consistent with 'bt' which already had this handling. gas has the same behavior. There have been discussions on the mailing list about determining size based on the immediate, but my goal here was just to remove the inconsistency.
llvm-svn: 186904
It only didn't use it before because it seems InstAlias handling in the asm printer fails to count tied operands so it tried to find an xor with 2 operands instead of the 3 it wfails to count tied.
llvm-svn: 186900
Enable parsing all 32 floating point control registers $0-31 and stop trying to
parse floating point condition code register $fcc0. Also, return ParseFail if
the operand being parsed is not in the expected format.
llvm-svn: 186861
instructions. With this patch:
1. ldr.n is recognized as mnemonic for the short encoding
2. ldr.w is recognized as menmonic for the long encoding
3. ldr will map to either short or long encodings depending on the size of the offset
llvm-svn: 186831
After Ulrich's r180677 (thanks!) TableGen is intelligent enough to
handle tied constraints involving complex operands properly, so
virtually all of the ARM custom converters are now unnecessary.
llvm-svn: 186810
indirect branches correctly. Under some circumstances, this led to the deletion
of basic blocks that were the destination of indirect branches. In that case it
left indirect branches to nowhere in the code.
This patch replaces, and is more general than either of the previous fixes for
indirect-branch-analysis issues, r181161 and r186461.
For other branches (not indirect) this refactor should have *almost* identical
behavior to the previous version. There are some corner cases where this
refactor is able to analyze blocks that the previous version could not (e.g.
this necessitated the update to thumb2-ifcvt2.ll).
<rdar://problem/14464830>
llvm-svn: 186735
The atomic tests assume the two-operand forms, so I've restricted them to z10.
Running and-01.ll, or-01.ll and xor-01.ll for z196 as well as z10 shows why
using convertToThreeAddress() is better than exposing the three-operand forms
first and then converting back to two operands where possible (which is what
I'd originally tried). Using the three-operand form first stops us from
taking advantage of NG, OG and XG for spills.
llvm-svn: 186683
This first step just adds definitions for SLLK, SRLK and SRAK.
The next patch will actually make use of them during codegen.
insn-bad.s tests that some form of error is reported when using these
instructions on z10. More work is needed to get the "instruction requires:
distinct-ops" that we'd ideally like, so I've stubbed that part out for now.
I'll come back and make it mandatory once the necessary changes are in.
llvm-svn: 186680
The original code only folded SRA into ROTATE ... SELECTED BITS
if there was no outer shift. This patch splits out that check
and generalises it slightly. The extra cases aren't really that
interesting, but this is paving the way for RNSBG support.
llvm-svn: 186571
In hindsight, using "RISBG" for something that can be any type of
R.SBG instruction was a bit confusing, so this renames it to RxSBG.
That might not be the best choice either, since there is an instruction
called RXSBG, but hopefully the lower-case letter stands out enough.
While there I fixed a couple of GNUisms that had crept in --
sorry about that!
llvm-svn: 186569
Support for dynamic stack alignments in the PPC backend has been unfinished, in
part because it depends on dynamic stack realignment (which I only just
recently implemented fully). Now we can also support dynamic allocas with
higher than the default target stack alignment (16 bytes).
In order to round-up the requested size to the maximum requested alignment, we
need an additional register to hold the rounded-up size. We're already using one
scavenged register to hold the previous stack-pointer value (which needs to be
stored with the signal-safe stdux update), and so when we have dynamic allocas
and a large alignment, we allocate two emergency spill slots for the scavenger.
llvm-svn: 186562
First, this changes the base-pointer implementation to remove an unnecessary
complication (and one that is incompatible with how builtin SjLj is
implemented): instead of using r31 as the base pointer when it is not needed as
a frame pointer, now the base pointer will always be r30 when needed.
Second, we introduce another pseudo register, BP, which is used just like the FP
pseudo register to refer to the base register before we know for certain what
register it will be.
Third, we now save BP into the jmp_buf, and restore r30 from that slot in
longjmp. If the function that called setjmp did not use a base pointer, then
r30 will be overwritten by the setjmp-calling-function's restore code. FP
restoration (which is restored into r31) works the same way.
llvm-svn: 186545
My patch 'r183551 - ARM FastISel integer sext/zext improvements' was incorrect when emitting ARM register-immediate ASR, LSL, LSR instructions: they are pseudo-instructions in ARMInstrInfo.td and I should have used MOVsi instead.
This is not an issue when code is generated through a .s file, but is an issue when generated straight to a .o (-filetype=obj).
llvm-svn: 186489
Because the builtin longjmp implementation uses a CTR-based indirect jump, when
the control flow arrives at the builtin setjmp call, the CTR register has
necessarily been clobbered. Correspondingly, this adds CTR to the list of
implicit definitions of the builtin setjmp pseudo instruction.
We don't need to add CTR to the implicit definitions of builtin longjmp
because, even though it does clobber the CTR register, the control flow cannot
return to inside the loop unless there is also a builtin setjmp call.
llvm-svn: 186488
This builds on some frame-lowering code that has existed since 2005 (r24224)
but was disabled in 2008 (r48188) because it needed base pointer support to
function correctly. This implementation follows the strategy suggested by Dale
Johannesen in r48188 where the following comment was added:
This does not currently work, because the delta between old and new stack
pointers is added to offsets that reference incoming parameters after the
prolog is generated, and the code that does that doesn't handle a variable
delta. You don't want to do that anyway; a better approach is to reserve
another register that retains to the incoming stack pointer, and reference
parameters relative to that.
And now we do exactly that. If we don't need a frame pointer, then we use r31
as a base pointer. If we do need a frame pointer, then we use r30 as a base
pointer. The base pointer retains the value of the stack pointer before it was
decremented in the prologue. We then use the base pointer to resolve all
negative frame indicies. The basic scheme follows that for base pointers in the
X86 backend.
We use a base pointer when we need to dynamically realign the incoming stack
pointer. This currently applies only to static objects (dynamic allocas with
large alignments, and base-pointer support in SjLj lowering will come in future
commits).
llvm-svn: 186478
block. Blocks that have an indirect branch terminator, even if it's not the
last terminator, should still be treated as unanalyzable.
<rdar://problem/14437274>
Reducing a useful regression test case is proving difficult - I hope to have
one soon.
llvm-svn: 186461
This adds an instruction alias to make the assembler recognize the alternate literal form: pli [PC, #+/-<imm>]
See A8.8.129 in the ARM ARM (DDI 0406C.b).
Fixes <rdar://problem/14403733>.
llvm-svn: 186459
Use PMIN/PMAX for UGE/ULE vector comparions to reduce the number of required
instructions. This trick also works for UGT/ULT, but there is no advantage in
doing so. It wouldn't reduce the number of instructions and it would actually
reduce performance.
Reviewer: Ben
radar:5972691
llvm-svn: 186432
We'd forgotten to provide string representations for the special ARMISD atomic
nodes; this adds them in. No effect on CodeGen, just makes the output of
"-view-whatever-dags" slightly more readable.
llvm-svn: 186406
Another patch in the series to make more use of R.SBG. This one extends
r186072 and r186073 to handle cases where the AND is inside the shift.
llvm-svn: 186399
This patch enables calls to __aeabi_idivmod when in EABI mode,
by using the remainder value returned on registers (R1),
enabled by the ARM triple "none-eabi". Note that Darwin and
GNUEABI triples will continue lowering on GNU style, that is,
using the stack for the remainder.
Still need to add SREM/UREM support fix for 64-bit lowering.
llvm-svn: 186390
This change mirrors the changes that were made to the X86 and ARM targets to
support subtarget feature changing. As indicated in r182899, the mechanism is
still undergoing revision, and so as with the X86 and ARM targets, there is no
test case yet (there is no effective functionality change).
llvm-svn: 186357
PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the
common subclass of the true and false inputs, and then selecting either the
32-bit or the 64-bit isel variant based on the result of calling
PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC)
(where RC is the common subclass). Unfortunately, this is not quite right: if
we have something like this:
%vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76;
G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6
then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and
G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8
pseudo-register). As a result, we also need to check the common subclass
against GPRC_NOR0 and G8RC_NOX0 explicitly.
This had not been a problem for clients of insertSelect that called
canInsertSelect first (because it had a compensating mistake), but insertSelect
is also used by the PPC pseudo-instruction expander, and this error was causing
a problem in that context.
This problem was found by csmith.
llvm-svn: 186343