Commit Graph

94 Commits

Author SHA1 Message Date
Richard Sandiford 2cac763544 [SystemZ] Extend test-under-mask support to high GR32s
llvm-svn: 191773
2013-10-01 14:41:52 +00:00
Richard Sandiford 7028428c2c [SystemZ] Allow integer AND involving high words
llvm-svn: 191762
2013-10-01 14:20:41 +00:00
Richard Sandiford 5718dacbdd [SystemZ] Allow integer XOR involving high words
llvm-svn: 191759
2013-10-01 14:08:44 +00:00
Richard Sandiford 6e96ac600f [SystemZ] Allow integer OR involving high words
llvm-svn: 191755
2013-10-01 13:22:41 +00:00
Richard Sandiford 1a56931b22 [SystemZ] Allow integer insertions with a high-word destination
llvm-svn: 191753
2013-10-01 13:18:56 +00:00
Richard Sandiford 012402346f [SystemZ] Add patterns to load a constant into a high word (IIHF)
Similar to low words, we can use the shorter LLIHL and LLIHH if it turns
out that the other half of the GR64 isn't live.

llvm-svn: 191750
2013-10-01 13:02:28 +00:00
Richard Sandiford 21235a256f [SystemZ] Add register zero extensions involving at least one high word
llvm-svn: 191746
2013-10-01 12:49:07 +00:00
Richard Sandiford 5469c39a26 [SystemZ] Add truncating high-word stores (STCH and STHH)
llvm-svn: 191743
2013-10-01 12:22:49 +00:00
Richard Sandiford 0d46b1a30f [SystemZ] Add zero-extending high-word loads (LLCH and LLHH)
llvm-svn: 191742
2013-10-01 12:19:08 +00:00
Richard Sandiford 89e160d975 [SystemZ] Add sign-extending high-word loads (LBH and LHH)
llvm-svn: 191740
2013-10-01 12:11:47 +00:00
Richard Sandiford 0755c93b0c [SystemZ] Use upper words of GR64s for codegen
This just adds the basics necessary for allocating the upper words to
virtual registers (move, load and store).  The move support is parameterised
in a way that makes it easy to handle zero extensions, but the associated
zero-extend patterns are added by a later patch.

The easiest way of testing this seemed to be add a new "h" register
constraint for high words.  I don't expect the constraint to be useful
in real inline asms, but it should work, so I didn't try to hide it
behind an option.

llvm-svn: 191739
2013-10-01 11:26:28 +00:00
Richard Sandiford 87a4436456 [SystemZ] Rename subregs and add subreg_h32
Use subreg_hNN and subreg_lNN for the high and low NN bits of a register.
List the low registers first, so that subreg_l32 also means the low 32
bits of a 128-bit register.

Floats are stored in the upper 32 bits of a 64-bit register, so they
should use subreg_h32 rather than subreg_l32.

No behavioral change intended.

llvm-svn: 191659
2013-09-30 10:28:35 +00:00
Richard Sandiford 067817ee05 [SystemZ] Rein back the use of block operations
The backend tries to use block operations like MVC, NC, OC and XC for
simple scalar operations.  For correctness reasons, it rejects any case
in which the regions might partially overlap.  However, for performance
reasons, it should also reject cases where the regions might be equal,
since the instruction might then not use the fast path.

This fixes a performance regression seen in bzip2.  We may want to limit
the optimisation even more in future, or even remove it entirely, but I'll
try with this for now.

llvm-svn: 191525
2013-09-27 15:29:20 +00:00
Richard Sandiford 652784e29a [SystemZ] Define the GR64 low-word logic instructions as pseudo aliases.
Another patch to avoid duplication of encoding information.  Things like
NILF, NILL and NILH are used as both 32-bit and 64-bit instructions.
Here the 64-bit versions are defined as aliases of the 32-bit ones.

llvm-svn: 191369
2013-09-25 11:11:53 +00:00
Richard Sandiford 6cbd7f0c5d [SystemZ] Use subregs for 64-bit truncating stores
Another patch to reduce the duplication of encoding information.
Rather than define separate patterns for truncating 64-bit stores,
use the 32-bit stores with a subreg.  No behavioral changed intended.

llvm-svn: 191365
2013-09-25 10:29:47 +00:00
Richard Sandiford 93183ee78c [SystemZ] Add unsigned compare-and-branch instructions
For some reason I never got around to adding these at the same time as
the signed versions.  No idea why.

I'm not sure whether this SystemZII::BranchC* stuff is useful, or whether
it should just be replaced with an "is normal" flag.  I'll leave that
for later though.

There are some boundary conditions that can be tweaked, such as preferring
unsigned comparisons for equality with [128, 256), and "<= 255" over "< 256",
but again I'll leave those for a separate patch.

llvm-svn: 190930
2013-09-18 09:56:40 +00:00
Richard Sandiford e3827751e2 [SystemZ] Fix handling of 64-bit memcmp results
Generalize r188163 to cope with return types other than MVT::i32, just
as the existing visitMemCmpCall code did.  I've split this out into a
subroutine so that it can be used for other upcoming patches.

I also noticed that I'd used the wrong API to record the out chain.
It's a load that uses DAG.getRoot() rather than getRoot(), so the out
chain should go on PendingLoads.  I don't have a testcase for that because
we don't do any interesting scheduling on z yet.

llvm-svn: 188540
2013-08-16 10:55:47 +00:00
Richard Sandiford a59012577c [SystemZ] Fix sign of integer memcmp result
r188163 used CLC to implement memcmp.  Code that compares the result
directly against zero can test the CC value produced by CLC, but code
that needs an integer result must use IPM.  The sequence I'd used was:

   ipm <reg>
   sll <reg>, 2
   sra <reg>, 30

but I'd forgotten that this inverts the order, so that CC==1 ("less")
becomes an integer greater than zero, and CC==2 ("greater") becomes
an integer less than zero.  This sequence should only be used if the
CLC arguments are reversed to compensate.  The problem then is that
the branch condition must also be reversed when testing the CLC
result directly.

Rather than do that, I went for a different sequence that works with
the natural CLC order:

   ipm <reg>
   srl <reg>, 28
   rll <reg>, <reg>, 31

One advantage of this is that it doesn't clobber CC.  A disadvantage
is that any sign extension to 64 bits must be done separately,
rather than being folded into the shifts.

llvm-svn: 188538
2013-08-16 10:22:54 +00:00
Richard Sandiford 564681c88d [SystemZ] Use CLC and IPM to implement memcmp
For now this is restricted to fixed-length comparisons with a length
in the range [1, 256], as for memcpy() and MVC.

llvm-svn: 188163
2013-08-12 10:28:10 +00:00
Richard Sandiford 0897fce2f4 [SystemZ] Optimize floating-point comparisons with zero
This follows the same lines as the integer code.  In the end it seemed
easier to have a second 4-bit mask in TSFlags to specify the compare-like
CC values.  That eats one more TSFlags bit than adding a CCHasUnordered
would have done, but it feels more concise.

llvm-svn: 187883
2013-08-07 11:10:06 +00:00
Richard Sandiford c212125d27 [SystemZ] Use BRCT and BRCTG to eliminate add-&-compare sequences
This patch just uses a peephole test for "add; compare; branch" sequences
within a single block.  The IR optimizers already convert loops to
decrement-and-branch-on-nonzero form in some cases, so even this
simplistic test triggers many times during a clang bootstrap and
projects/test-suite run.  It looks like there are still cases where we
need to more strongly prefer branches on nonzero though.  E.g. I saw a
case where a loop that started out with a check for 0 ended up with a
check for -1.  I'll try to look at that sometime.

I ended up adding the Reference class because MachineInstr::readsRegister()
doesn't check for subregisters (by design, as far as I could tell).

llvm-svn: 187723
2013-08-05 11:23:46 +00:00
Richard Sandiford b49a3ab262 [SystemZ] Use LOAD AND TEST to eliminate comparisons against zero
llvm-svn: 187720
2013-08-05 11:03:20 +00:00
Richard Sandiford fd7f4ae6d4 [SystemZ] Reuse CC results for integer comparisons with zero
This also fixes a bug in the predication of LR to LOCR: I'd forgotten
that with these in-place instruction builds, the implicit operands need
to be added manually.  I think this was latent until now, but is tested
by int-cmp-45.c.  It also adds a CC valid mask to STOC, again tested by
int-cmp-45.c.

llvm-svn: 187573
2013-08-01 10:39:40 +00:00
Richard Sandiford 3d768e334b [SystemZ] Be more careful about inverting CC masks
System z branches have a mask to select which of the 4 CC values should
cause the branch to be taken.  We can invert a branch by inverting the mask.
However, not all instructions can produce all 4 CC values, so inverting
the branch like this can lead to some oddities.  For example, integer
comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater).
If an integer EQ is reversed to NE before instruction selection,
the branch will test for 1 or 2.  If instead the branch is reversed
after instruction selection (by inverting the mask), it will test for
1, 2 or 3.  Both are correct, but the second isn't really canonical.
This patch therefore keeps track of which CC values are possible
and uses this when inverting a mask.

Although this is mostly cosmestic, it fixes undefined behavior
for the CIJNLH in branch-08.ll.  Another fix would have been
to mask out bit 0 when generating the fused compare and branch,
but the point of this patch is that we shouldn't need to do that
in the first place.

The patch also makes it easier to reuse CC results from other instructions.

llvm-svn: 187495
2013-07-31 12:30:20 +00:00
Richard Sandiford 8a757bba10 [SystemZ] Move compare-and-branch generation even later
r187116 moved compare-and-branch generation from the instruction-selection
pass to the peephole optimizer (via optimizeCompare).  It turns out that even
this is a bit too early.  Fused compare-and-branch instructions don't
interact well with predication, where a CC result is needed.  They also
make it harder to reuse the CC side-effects of earlier instructions
(not yet implemented, but the subject of a later patch).

Another problem was that the AnalyzeBranch family of routines weren't
handling compares and branches, so we weren't able to reverse the fused
form in cases where we would reverse a separate branch.  This could have
been fixed by extending AnalyzeBranch, but given the other problems,
I've instead moved the fusing to the long-branch pass, which is also
responsible for the opposite transformation: splitting out-of-range
compares and branches into separate compares and long branches.

I've added a test for the AnalyzeBranch problem.  A test for the
predication problem is included in the next patch, which fixes a bug
in the choice of CC mask.

llvm-svn: 187494
2013-07-31 12:11:07 +00:00
Richard Sandiford 6a06ba36ba [SystemZ] Postpone NI->RISBG conversion to convertToThreeAddress()
r186399 aggressively used the RISBG instruction for immediate ANDs,
both because it can handle some values that AND IMMEDIATE can't,
and because it allows the destination register to be different from
the source.  I realized later while implementing the distinct-ops
support that it would be better to leave the choice up to
convertToThreeAddress() instead.  The AND IMMEDIATE form is shorter
and is less likely to be cracked.

This is a problem for 32-bit ANDs because we assume that all 32-bit
operations will leave the high word untouched, whereas RISBG used in
this way will either clear the high word or copy it from the source
register.  The patch uses the z196 instruction RISBLG for this instead.

This means that z10 will be restricted to NILL, NILH and NILF for
32-bit ANDs, but I think that should be OK for now.  Although we're
using z10 as the base architecture, the optimization work is going
to be focused more on z196 and zEC12.

llvm-svn: 187492
2013-07-31 11:36:35 +00:00
Richard Sandiford c3f85d73ab [SystemZ] Rework compare and branch support
Before the patch we took advantage of the fact that the compare and
branch are glued together in the selection DAG and fused them together
(where possible) while emitting them.  This seemed to work well in practice.
However, fusing the compare so early makes it harder to remove redundant
compares in cases where CC already has a suitable value.  This patch
therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of
functions instead.

No behavioral change intended, but it paves the way for a later patch.

llvm-svn: 187116
2013-07-25 09:34:38 +00:00
Richard Sandiford f2404164ba [SystemZ] Add LOCR and LOCGR
llvm-svn: 187113
2013-07-25 09:11:15 +00:00
Richard Sandiford ff6c5a5609 [SystemZ] Use SLLK, SRLK and SRAK for codegen
This patch uses the instructions added in r186680 for codegen.

llvm-svn: 186681
2013-07-19 16:12:08 +00:00
Richard Sandiford 3f0edc2903 [SystemZ] Improve spilling of LGDR and LDGR
If the source of these instructions is spilled we should load the destination.
If the destination is spilled we should store the source.

llvm-svn: 186147
2013-07-12 08:37:17 +00:00
Richard Sandiford c40f27b52d [SystemZ] Remove no-op MVCs
The stack coloring pass has code to delete stores and loads that become
trivially dead after coloring.  Extend it to cope with single instructions
that copy from one frame index to another.

The testcase happens to show an example of this kicking in at the moment.
It did occur in Real Code too though.

llvm-svn: 185705
2013-07-05 14:38:48 +00:00
Richard Sandiford 1ca6deaeb7 [SystemZ] Remove redundant frame MMOs
This fixes foldMemoryOperandImpl() so that it doesn't create duplicated
frame MMOs.  I hadn't realized when writing r185434 that it was the caller's
responsibility to add these.

No behavioural change intended.

llvm-svn: 185704
2013-07-05 14:31:24 +00:00
Richard Sandiford 8976ea72ab [SystemZ] Enable the use of MVC for frame-to-frame spills
...now that the problem that prompted the restriction has been fixed.

The original spill-02.py was a compromise because at the time I couldn't
find an example that actually failed without the two scavenging slots.
The version included here did.

llvm-svn: 185701
2013-07-05 14:02:01 +00:00
Richard Sandiford ed1fab6b5b [SystemZ] Fold more spills
Add a mapping from register-based <INSN>R instructions to the corresponding
memory-based <INSN>.  Use it to cut down on the number of spill loads.

Some instructions extend their operands from smaller fields, so this
required a new TSFlags field to say how big the unextended operand is.

This optimisation doesn't trigger for C(G)R and CL(G)R because in practice
we always combine those instructions with a branch.  Adding a test for every
other case probably seems excessive, but it did catch a missed optimisation
for DSGF (fixed in r185435).

llvm-svn: 185529
2013-07-03 10:10:02 +00:00
NAKAMURA Takumi ddcba56281 SystemZInstrInfo.cpp: Tweak an assertion. [-Wunused-variable]
llvm-svn: 185499
2013-07-03 02:20:49 +00:00
Benjamin Kramer 421c8fb2ce SystemZ: Fold variable into assertion.
llvm-svn: 185475
2013-07-02 21:17:31 +00:00
Richard Sandiford f6bae1e434 [SystemZ] Use MVC to spill loads and stores
Try to use MVC when spilling the destination of a simple load or the source
of a simple store.  As explained in the comment, this doesn't yet handle
the case where the load or store location is also a frame index, since
that could lead to two simultaneous scavenger spills, something the
backend can't handle yet.  spill-02.py tests that this restriction kicks in,
but unfortunately I've not yet found a case that would fail without it.
The volatile trick I used for other scavenger tests doesn't work here
because we can't use MVC for volatile accesses anyway.

I'm planning on relaxing the restriction later, hopefully with a test
that does trigger the problem...

Tests @f8 and @f9 also showed that L(G)RL and ST(G)RL were wrongly
classified as SimpleBDX{Load,Store}.  It wouldn't be easy to test for
that bug separately, which is why I didn't split out the fix as a
separate patch.

llvm-svn: 185434
2013-07-02 15:28:56 +00:00
Bill Wendling 637d97dd51 Don't cache the instruction and register info from the TargetMachine, because
the internals of TargetMachine could change.

No functionality change intended.

llvm-svn: 183567
2013-06-07 20:42:15 +00:00
Richard Sandiford e1d9f00f09 [SystemZ] Immediate compare-and-branch support
This patch adds support for the CIJ and CGIJ instructions.

llvm-svn: 182846
2013-05-29 11:58:52 +00:00
Richard Sandiford 0fb90ab0cb [SystemZ] Register compare-and-branch support
This patch adds support for the CRJ and CGRJ instructions.  Support for
the immediate forms will be a separate patch.

The architecture has a large number of comparison instructions.  I think
it's generally better to concentrate on using the "best" comparison
instruction first and foremost, then only use something like CRJ if
CR really was the natual choice of comparison instruction.  The patch
therefore opportunistically converts separate CR and BRC instructions
into a single CRJ while emitting instructions in ISelLowering.

llvm-svn: 182764
2013-05-28 10:41:11 +00:00
Richard Sandiford 53c9efd9c1 [SystemZ] Tweak SystemZInstrInfo::isBranch() interface
This is needed for the upcoming compare-and-branch patch.  No functional
change intended.

llvm-svn: 182762
2013-05-28 10:13:54 +00:00
Richard Sandiford 312425f32d [SystemZ] Add long branch pass
Before this change, the SystemZ backend would use BRCL for all branches
and only consider shortening them to BRC when generating an object file.
E.g. a branch on equal would use the JGE alias of BRCL in assembly output,
but might be shortened to the JE alias of BRC in ELF output.  This was
a useful first step, but it had two problems:

(1) The z assembler isn't traditionally supposed to perform branch shortening
    or branch relaxation.  We followed this rule by not relaxing branches
    in assembler input, but that meant that generating assembly code and
    then assembling it would not produce the same result as going directly
    to object code; the former would give long branches everywhere, whereas
    the latter would use short branches where possible.

(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms.
    We would need to do something else before supporting them.

    (Although COMPARE AND BRANCH does not change the condition codes,
    the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction
    during codegen, so that we can safely lower it to a separate compare
    and long branch where necessary.  This is not a valid transformation
    for the assembler proper to make.)

This patch therefore moves branch relaxation to a pre-emit pass.
For now, calls are still shortened from BRASL to BRAS by the assembler,
although this too is not really the traditional behaviour.

The first test takes about 1.5s to run, and there are likely to be
more tests in this vein once further branch types are added.  The feeling
on IRC was that 1.5s is a bit much for a single test, so I've restricted
it to SystemZ hosts for now.

The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests.
A later patch will remove the {{g}}s from that directory.

llvm-svn: 182274
2013-05-20 14:23:08 +00:00
Ulrich Weigand 5f613dfd1f [SystemZ] Add back end
This adds the actual lib/Target/SystemZ target files necessary to
implement the SystemZ target.  Note that at this point, the target
cannot yet be built since the configure bits are missing.  Those
will be provided shortly by a follow-on patch.

This version of the patch incorporates feedback from reviews by
Chris Lattner and Anton Korobeynikov.  Thanks to all reviewers!

Patch by Richard Sandiford.

llvm-svn: 181203
2013-05-06 16:15:19 +00:00
Dan Gohman dfc96aea90 Remove the SystemZ backend.
llvm-svn: 142878
2011-10-24 23:48:32 +00:00
Evan Cheng 2bb4035707 Move TargetRegistry and TargetSelect from Target to Support where they belong.
These are strictly utilities for registering targets and components.

llvm-svn: 138450
2011-08-24 18:08:43 +00:00
Evan Cheng bc153d49b7 Next round of MC refactoring. This patch factor MC table instantiations, MC
registeration and creation code into XXXMCDesc libraries.

llvm-svn: 135184
2011-07-14 20:59:42 +00:00
Evan Cheng c5e6d2f519 - Eliminate MCCodeEmitter's dependency on TargetMachine. It now uses MCInstrInfo
and MCSubtargetInfo.
- Added methods to update subtarget features (used when targets automatically
  detect subtarget features or switch modes).
- Teach X86Subtarget to update MCSubtargetInfo features bits since the
  MCSubtargetInfo layer can be shared with other modules.
- These fixes .code 16 / .code 32 support since mode switch is updated in
  MCSubtargetInfo so MC code emitter can do the right thing.

llvm-svn: 134884
2011-07-11 03:57:24 +00:00
Evan Cheng 703a0fbf39 Hide the call to InitMCInstrInfo into tblgen generated ctor.
llvm-svn: 134244
2011-07-01 17:57:27 +00:00
Evan Cheng 194c3dc01f Move CallFrameSetupOpcode and CallFrameDestroyOpcode to TargetInstrInfo.
llvm-svn: 134030
2011-06-28 21:14:33 +00:00
Evan Cheng 1e210d08d8 Merge XXXGenRegisterNames.inc into XXXGenRegisterInfo.inc
llvm-svn: 134024
2011-06-28 20:07:07 +00:00