If the extension of a loaded value is compared against zero and used in
other arithmetic, InstCombine will change the comparison to use the
unextended load. It's also possible that the comparison could be against
the unextended load from the outset.
In DAG form this becomes a truncation of an extending load. We want to
strip the truncation if possible so that we can use load-and-test instructions.
llvm-svn: 197804
This originally came about after noticing that InstCombine turns
some of the TMHH (icmp (and...), ...) tests into plain comparisons.
Since there is no instruction to compare with a 64-bit immediate,
TMHH is generally better than an ordered comparison for the cases
that it can handle.
llvm-svn: 197238
This patch makes more use of LPGFR and LNGFR. It builds on top of
the LTGFR selection from r197234. Most of the tests are motivated
by what InstCombine would produce.
llvm-svn: 197236
...in an attempt to rein back the increasingly complex selection code.
A knock-on effect is that ICmpType is exposed from the outset, which
slightly simplifies adjustSubwordCmp.
The code is no piece of art even after this change, but at least it should
be slightly better. No behavioral change intended.
llvm-svn: 197235
InstCombine turns (sext (trunc)) into (ashr (shl)), then converts any
comparison of the ashr against zero into a comparison of the shl against zero.
This makes sense in itself, but we want to undo it for z, since the sign-
extension instruction has a CC-setting form.
I've included tests for both the original and InstCombined variants,
but the former already worked. The patch fixes the latter.
llvm-svn: 197234
One unusual feature of the z architecture is that the result of a
previous load can be reused indefinitely for subsequent loads, even if
a cache-coherent store to that location is performed by another CPU.
A special serializing instruction must be used if you want to force
a load to be reattempted.
Since volatile loads are not supposed to be omitted in this way,
we should insert a serializing instruction before each such load.
The same goes for atomic loads.
The patch implements this at the IR->DAG boundary, in a similar way
to atomic fences. It is a no-op for targets other than SystemZ.
llvm-svn: 196906
One unusual feature of the z architecture is that the result of a
previous load can be reused indefinitely for subsequent loads, even if
a cache-coherent store to that location is performed by another CPU.
A special serializing instruction must be used if you want to force
a load to be reattempted.
Since volatile loads are not supposed to be omitted in this way,
we should insert a serializing instruction before each such load.
The same goes for atomic loads.
The patch implements this at the IR->DAG boundary, in a similar way
to atomic fences. It is a no-op for targets other than SystemZ.
llvm-svn: 196905
Since z has no setcc instruction as such, the choice of setBooleanContents
is a bit arbitrary. Currently it's set to ZeroOrOneBooleanContent,
so we produced a branch-free form when selecting between 0 and 1,
but not when selecting between 0 and -1. This patch handles the latter
case too.
At some point I'd like to measure whether it's better to use conditional
moves for constant selects on z196, but that's future work.
llvm-svn: 196578
The backend converts 64-bit ORs into subreg moves if the upper 32 bits
of one operand and the low 32 bits of the other are known to be zero.
It then tries to peel away redundant ANDs from the upper 32 bits.
Since AND masks are canonicalized to exclude known-zero bits,
the test ORs the mask and the known-zero bits together before
checking for redundancy. The problem was that it was using the
wrong node when checking for known-zero bits, so could drop ANDs
that were still needed.
llvm-svn: 196267
We previously used the default expansion to SELECT_CC, which in turn would
expand to "LHI; BRC; LHI". In most cases it's better to use an IPM-based
sequence instead.
llvm-svn: 192784
Floats are stored in the high 32 bits of an FPR, and the only GPR<->FPR
transfers are full-register transfers. This patch optimizes GPR<->FPR
float transfers when the high word of a GPR is directly accessible.
llvm-svn: 191764
This just adds the basics necessary for allocating the upper words to
virtual registers (move, load and store). The move support is parameterised
in a way that makes it easy to handle zero extensions, but the associated
zero-extend patterns are added by a later patch.
The easiest way of testing this seemed to be add a new "h" register
constraint for high words. I don't expect the constraint to be useful
in real inline asms, but it should work, so I didn't try to hide it
behind an option.
llvm-svn: 191739
Use subreg_hNN and subreg_lNN for the high and low NN bits of a register.
List the low registers first, so that subreg_l32 also means the low 32
bits of a 128-bit register.
Floats are stored in the upper 32 bits of a 64-bit register, so they
should use subreg_h32 rather than subreg_l32.
No behavioral change intended.
llvm-svn: 191659
I'm about to add support for high-word operations, so it seemed better
for the low-word registers to have names like R0L rather than R0W.
No behavioral change intended.
llvm-svn: 191655
The backend previously folded offsets into PC-relative addresses
whereever possible. That's the right thing to do when the address
can be used directly in a PC-relative memory reference (using things
like LRL). But if we have a register-based memory reference and need
to load the PC-relative address separately, it's better to use an anchor
point that could be shared with other accesses to the same area of the
variable.
Fixes a FIXME.
llvm-svn: 191524
Another patch to avoid duplication of encoding information. Things like
NILF, NILL and NILH are used as both 32-bit and 64-bit instructions.
Here the 64-bit versions are defined as aliases of the 32-bit ones.
llvm-svn: 191369
Another patch to reduce the duplication of encoding information.
Rather than define separate patterns for truncating 64-bit stores,
use the 32-bit stores with a subreg. No behavioral changed intended.
llvm-svn: 191365
The main complication here is that TM and TMY (the memory forms) set
CC differently from the register forms. When the tested bits contain
some 0s and some 1s, the register forms set CC to 1 or 2 based on the
value the uppermost bit. The memory forms instead set CC to 1
regardless of the uppermost bit.
Until now, I've tried to make it so that a branch never tests for an
impossible CC value. E.g. NR only sets CC to 0 or 1, so branches on the
result will only test for 0 or 1. Originally I'd tried to do the same
thing for TM and TMY by using custom matching code in ISelDAGToDAG.
That ended up being very ugly though, and would have meant duplicating
some of the chain checks that the common isel code does.
I've therefore gone for the simpler alternative of adding an extra
operand to the TM DAG opcode to say whether a memory form would be OK.
This means that the inverse of a "TM;JE" is "TM;JNE" rather than the
more precise "TM;JNLE", just like the inverse of "TMLL;JE" is "TMLL;JNE".
I suppose that's arguably less confusing though...
llvm-svn: 190400
The architecture has many comparison instructions, including some that
extend one of the operands. The signed comparison instructions use sign
extensions and the unsigned comparison instructions use zero extensions.
In cases where we had a free choice between signed or unsigned comparisons,
we were trying to decide at lowering time which would best fit the available
instructions, taking things like extension type into account. The code
to do that was getting increasingly hairy and was also making some bad
decisions. E.g. when comparing the result of two LLCs, it is better to use
CR rather than CLR, since CR can be fused with a branch while CLR can't.
This patch removes the lowering code and instead adds an operand to
integer comparisons to say whether signed comparison is required,
whether unsigned comparison is required, or whether either is OK.
We can then leave the choice of instruction up to the normal isel code.
llvm-svn: 190138
For now this just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.
llvm-svn: 189819
For now just handles simple comparisons of an ANDed value with zero.
The CC value provides enough information to do any comparison for a
2-bit mask, and some nonzero comparisons with more populated masks,
but that's all future work.
llvm-svn: 189469
Lengths up to a certain threshold (currently 6 * 256) use a series of MVCs.
Lengths above that threshold use a loop to handle X*256 bytes followed
by a single MVC to handle the excess (if any). This loop will also be
needed in future when support for variable lengths is added.
Because the same tablegen classes are used to define MVC and CLC,
the patch also has the side-effect of defining a pseudo loop instruction
for CLC. That instruction isn't used yet (and wouldn't be handled correctly
if it were). I'm planning to use it soon though.
llvm-svn: 189331
The initial port used MLG(R) for i64 UMUL_LOHI but left the other three
combinations as not-legal-or-custom. Although 32x32->{32,32}
multiplications exist, they're not as quick as doing a normal 64-bit
multiplication, so it didn't seem like i32 SMUL_LOHI and UMUL_LOHI
would be useful. There's also no direct instruction for i64 SMUL_LOHI,
so it needs to be implemented in terms of UMUL_LOHI.
However, not defining these patterns means that we don't convert
division by a constant into multiplication, so this patch fills
in the other cases. The new i64 SMUL_LOHI sequence is simpler
than the one that we used previously for 64x64->128 multiplication,
so int-mul-08.ll now tests the full sequence.
llvm-svn: 188898
SystemZTargetLowering::emitStringWrapper() previously loaded the character
into R0 before the loop and made R0 live on entry. I'd forgotten that
allocatable registers weren't allowed to be live across blocks at this stage,
and it confused LiveVariables enough to cause a miscompilation of f3 in
memchr-02.ll.
This patch instead loads R0 in the loop and leaves LICM to hoist it
after RA. This is actually what I'd tried originally, but I went for
the manual optimisation after noticing that R0 often wasn't being hoisted.
This bug forced me to go back and look at why, now fixed as r188774.
We should also try to optimize null checks so that they test the CC result
of the SRST directly. The select between null and the SRST GPR result could
then usually be deleted as dead.
llvm-svn: 188779
This first cut is pretty conservative. The final argument register (R6)
is call-saved, so we would need to make sure that the R6 argument to a
sibling call is the same as the R6 argument to the calling function,
which seems worth keeping as a separate patch.
Saying that integer truncations are free means that we no longer
use the extending instructions LGF and LLGF for spills in int-conv-09.ll
and int-conv-10.ll. Instead we treat the registers as 64 bits wide and
truncate them to 32-bits where necessary. I think it's unlikely we'd
use LGF and LLGF for spills in other situations for the same reason,
so I'm removing the tests rather than replacing them. The associated
code is generic and applies to many more instructions than just
LGF and LLGF, so there is no corresponding code removal.
llvm-svn: 188669
This also fixes a bug in the predication of LR to LOCR: I'd forgotten
that with these in-place instruction builds, the implicit operands need
to be added manually. I think this was latent until now, but is tested
by int-cmp-45.c. It also adds a CC valid mask to STOC, again tested by
int-cmp-45.c.
llvm-svn: 187573
Convert >= 1 to > 0, etc. Using comparison with zero isn't a win on its own,
but it exposes more opportunities for CC reuse (the next patch).
llvm-svn: 187571
The loop optimizers were assuming that scales > 1 were OK. I think this
is actually a bug in TargetLoweringBase::isLegalAddressingMode(),
since it seems to be trying to reject anything that isn't r+i or r+r,
but it has no default case for scales other than 0, 1 or 2. Implementing
the hook for z means that z can no longer test any change there though.
llvm-svn: 187497
System z branches have a mask to select which of the 4 CC values should
cause the branch to be taken. We can invert a branch by inverting the mask.
However, not all instructions can produce all 4 CC values, so inverting
the branch like this can lead to some oddities. For example, integer
comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater).
If an integer EQ is reversed to NE before instruction selection,
the branch will test for 1 or 2. If instead the branch is reversed
after instruction selection (by inverting the mask), it will test for
1, 2 or 3. Both are correct, but the second isn't really canonical.
This patch therefore keeps track of which CC values are possible
and uses this when inverting a mask.
Although this is mostly cosmestic, it fixes undefined behavior
for the CIJNLH in branch-08.ll. Another fix would have been
to mask out bit 0 when generating the fused compare and branch,
but the point of this patch is that we shouldn't need to do that
in the first place.
The patch also makes it easier to reuse CC results from other instructions.
llvm-svn: 187495
r187116 moved compare-and-branch generation from the instruction-selection
pass to the peephole optimizer (via optimizeCompare). It turns out that even
this is a bit too early. Fused compare-and-branch instructions don't
interact well with predication, where a CC result is needed. They also
make it harder to reuse the CC side-effects of earlier instructions
(not yet implemented, but the subject of a later patch).
Another problem was that the AnalyzeBranch family of routines weren't
handling compares and branches, so we weren't able to reverse the fused
form in cases where we would reverse a separate branch. This could have
been fixed by extending AnalyzeBranch, but given the other problems,
I've instead moved the fusing to the long-branch pass, which is also
responsible for the opposite transformation: splitting out-of-range
compares and branches into separate compares and long branches.
I've added a test for the AnalyzeBranch problem. A test for the
predication problem is included in the next patch, which fixes a bug
in the choice of CC mask.
llvm-svn: 187494
Before the patch we took advantage of the fact that the compare and
branch are glued together in the selection DAG and fused them together
(where possible) while emitting them. This seemed to work well in practice.
However, fusing the compare so early makes it harder to remove redundant
compares in cases where CC already has a suitable value. This patch
therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of
functions instead.
No behavioral change intended, but it paves the way for a later patch.
llvm-svn: 187116
These instructions are allowed to trap even if the condition is false,
so for now they are only used for "*ptr = (cond ? x : *ptr)"-style
constructs.
llvm-svn: 187111
GPR and FPR constraints like "{r2}" and "{f2}" weren't handled correctly
because the name-to-regno mapping depends on the value type and
(because of that) the internal names in RegStrings are not the
same as the AsmName.
CC constraints like "{cc}" didn't work either because there was no
associated register class.
llvm-svn: 186148
in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in
order to resolve the following issues with fmuladd (i.e. optional FMA)
intrinsics:
1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd
intrinsics even if the subtarget does not support FMA instructions, leading
to laughably bad code generation in some situations.
2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128,
resulting in a call to a software fp128 FMA implementation.
3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types
like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize,
etc. to types that support hardware FMAs.
The function has also been slightly renamed for consistency and to force a
merge/build conflict for any out-of-tree target implementing it. To resolve,
see comments and fixed in-tree examples.
llvm-svn: 185956
Use "STC;MVC" for memsets that are too big for two STCs or MV...Is yet
small enough for a single MVC. As with memcpy, I'm leaving longer cases
till later.
The number of tests might seem excessive, but f33 & f34 from memset-04.ll
failed the first cut because I'd not added the "?:" on the calculation
of Size1.
llvm-svn: 185918
I was originally going to use MVC for memmove too, but that's less of
a clear win. Remove some accidental left-overs in the previous commit.
llvm-svn: 185804
Fixes some cases where we were using full 64-bit division for (sdiv i32, i32)
and (sdiv i64, i32).
The "32" in "SDIVREM32" just refers to the second operand. The first operand
of all *DIVREM*s is a GR128.
llvm-svn: 185435
Add pseudo conditional store instructions, so that we use:
branch foo:
store
foo:
instead of:
load
branch foo:
move
foo:
store
z196 has real 32-bit and 64-bit conditional stores, but we don't use
any z196 instructions yet.
llvm-svn: 185065
The code to distinguish between unaligned and aligned addresses was
already there, so this is mostly just a switch-on-and-test process.
llvm-svn: 182920
Fixes PR16146: gdb.base__call-ar-st.exp fails after
pre-RA-sched=source fixes.
Patch by Xiaoyi Guo!
This also fixes an unsupported dbg.value test case. Codegen was
previously incorrect but the test was passing by luck.
llvm-svn: 182885
This patch adds support for the CRJ and CGRJ instructions. Support for
the immediate forms will be a separate patch.
The architecture has a large number of comparison instructions. I think
it's generally better to concentrate on using the "best" comparison
instruction first and foremost, then only use something like CRJ if
CR really was the natual choice of comparison instruction. The patch
therefore opportunistically converts separate CR and BRC instructions
into a single CRJ while emitting instructions in ISelLowering.
llvm-svn: 182764
Addresses a review comment from Ulrich Weigand. No functional change intended.
I'm not sure whether the old TODO that this patch touches still holds,
but that's something we'd get to when adding a targetted scheduling
description.
llvm-svn: 182474
Before this change, the SystemZ backend would use BRCL for all branches
and only consider shortening them to BRC when generating an object file.
E.g. a branch on equal would use the JGE alias of BRCL in assembly output,
but might be shortened to the JE alias of BRC in ELF output. This was
a useful first step, but it had two problems:
(1) The z assembler isn't traditionally supposed to perform branch shortening
or branch relaxation. We followed this rule by not relaxing branches
in assembler input, but that meant that generating assembly code and
then assembling it would not produce the same result as going directly
to object code; the former would give long branches everywhere, whereas
the latter would use short branches where possible.
(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms.
We would need to do something else before supporting them.
(Although COMPARE AND BRANCH does not change the condition codes,
the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction
during codegen, so that we can safely lower it to a separate compare
and long branch where necessary. This is not a valid transformation
for the assembler proper to make.)
This patch therefore moves branch relaxation to a pre-emit pass.
For now, calls are still shortened from BRASL to BRAS by the assembler,
although this too is not really the traditional behaviour.
The first test takes about 1.5s to run, and there are likely to be
more tests in this vein once further branch types are added. The feeling
on IRC was that 1.5s is a bit much for a single test, so I've restricted
it to SystemZ hosts for now.
The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests.
A later patch will remove the {{g}}s from that directory.
llvm-svn: 182274
This adds the actual lib/Target/SystemZ target files necessary to
implement the SystemZ target. Note that at this point, the target
cannot yet be built since the configure bits are missing. Those
will be provided shortly by a follow-on patch.
This version of the patch incorporates feedback from reviews by
Chris Lattner and Anton Korobeynikov. Thanks to all reviewers!
Patch by Richard Sandiford.
llvm-svn: 181203
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
llvm-svn: 139159
const_casts, and it reinforces the design of the Target classes being
immutable.
SelectionDAGISel::IsLegalToFold is now a static member function, because
PIC16 uses it in an unconventional way. There is more room for API
cleanup here.
And PIC16's AsmPrinter no longer uses TargetLowering.
llvm-svn: 101635
Target independent isel should always pass along the "tail call" property. Change
target hook LowerCall's parameter "isTailCall" into a refernce. If the target
decides it's impossible to honor the tail call request, it should set isTailCall
to false to make target independent isel happy.
llvm-svn: 94626
slots. The AsmPrinter will use this information to determine whether to
print a spill/reload comment.
Remove default argument values. It's too easy to pass a wrong argument
value when multiple arguments have default values. Make everything
explicit to trap bugs early.
Update all targets to adhere to the new interfaces..
llvm-svn: 87022
eliminating a use of MVT::Flag, this is needed for an upcoming CodeGen
change.
This unfortunately requires SystemZ to switch to the list-burr
scheduler, in order to handle the physreg defs properly, however
that's what LLVM has available at this time.
llvm-svn: 85357
Instead of awkwardly encoding calling-convention information with ISD::CALL,
ISD::FORMAL_ARGUMENTS, ISD::RET, and ISD::ARG_FLAGS nodes, TargetLowering
provides three virtual functions for targets to override:
LowerFormalArguments, LowerCall, and LowerRet, which replace the custom
lowering done on the special nodes. They provide the same information, but
in a more immediately usable format.
This also reworks much of the target-independent tail call logic. The
decision of whether or not to perform a tail call is now cleanly split
between target-independent portions, and the target dependent portion
in IsEligibleForTailCallOptimization.
This also synchronizes all in-tree targets, to help enable future
refactoring and feature work.
llvm-svn: 78142
it is highly specific to the object file that will be generated in the end,
this introduces a new TargetLoweringObjectFile interface that is implemented
for each of ELF/MachO/COFF/Alpha/PIC16 and XCore.
Though still is still a brutal and ugly refactoring, this is a major step
towards goodness.
This patch also:
1. fixes a bunch of dangling pointer problems in the PIC16 backend.
2. disables the TargetLowering copy ctor which PIC16 was accidentally using.
3. gets us closer to xcore having its own crazy target section flags and
pic16 not having to shadow sections with its own objects.
4. fixes wierdness where ELF targets would set CStringSection but not
CStringSection_. Factor the code better.
5. fixes some bugs in string lowering on ELF targets.
llvm-svn: 77294