llvm-project

Commit Graph

Author	SHA1	Message	Date
Duncan P. N. Exon Smith	be7ea19b58	IR: Make metadata typeless in assembly Now that `Metadata` is typeless, reflect that in the assembly. These are the matching assembly changes for the metadata/value split in r223802. - Only use the `metadata` type when referencing metadata from a call intrinsic -- i.e., only when it's used as a `Value`. - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode` when referencing it from call intrinsics. So, assembly like this: define @foo(i32 %v) { call void @llvm.foo(metadata !{i32 %v}, metadata !0) call void @llvm.foo(metadata !{i32 7}, metadata !0) call void @llvm.foo(metadata !1, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{metadata !3}, metadata !0) ret void, !bar !2 } !0 = metadata !{metadata !2} !1 = metadata !{i32* @global} !2 = metadata !{metadata !3} !3 = metadata !{} turns into this: define @foo(i32 %v) { call void @llvm.foo(metadata i32 %v, metadata !0) call void @llvm.foo(metadata i32 7, metadata !0) call void @llvm.foo(metadata i32* @global, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{!3}, metadata !0) ret void, !bar !2 } !0 = !{!2} !1 = !{i32* @global} !2 = !{!3} !3 = !{} I wrote an upgrade script that handled almost all of the tests in llvm and many of the tests in cfe (even handling many `CHECK` lines). I've attached it (or will attach it in a moment if you're speedy) to PR21532 to help everyone update their out-of-tree testcases. This is part of PR21532. llvm-svn: 224257	2014-12-15 19:07:53 +00:00
Tim Northover	420a216817	IR: add "cmpxchg weak" variant to support permitted failure. This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity all cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded. At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen. Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme. Summary for out of tree users: ------------------------------ + Legacy Bitcode files are upgraded during read. + Legacy assembly IR files will be invalid. + Front-ends must adapt to different type for "cmpxchg". + Backends should be unaffected by default. llvm-svn: 210903	2014-06-13 14:24:07 +00:00
Alp Toker	d3d017cf00	Reduce verbiage of lit.local.cfg files We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496	2014-06-09 22:42:55 +00:00
Hal Finkel	3b48d08f54	Reenable use of TBAA during CodeGen We had disabled use of TBAA during CodeGen (even when otherwise using AA) because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to miss basic type punning that it should catch (and, thus, we'd fail to override TBAA when we should). However, when AA is in use during CodeGen, CGP now uses normal GEPs and bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a result, BasicAA should be able to make us do the right thing in the face of type-punning, and it seems safe to enable use of TBAA again. self-hosting seems fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle. Note: We still don't update TBAA when merging stack slots, although because BasicAA should now catch all such cases, this is no longer a blocking issue. Nevertheless, I plan to commit code to deal with this properly in the near future. llvm-svn: 206093	2014-04-12 01:26:00 +00:00
Richard Sandiford	dc6c2c953d	[SystemZ] Add support for z196 float<->unsigned conversions These complement the older float<->signed instructions. llvm-svn: 204451	2014-03-21 10:56:30 +00:00
Tim Northover	e94a518a22	IR: add a second ordering operand to cmpxhg for failure The syntax for "cmpxchg" should now look something like: cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic where the second ordering argument gives the required semantics in the case that no exchange takes place. It should be no stronger than the first ordering constraint and cannot be either "release" or "acq_rel" (since no store will have taken place). rdar://problem/15996804 llvm-svn: 203559	2014-03-11 10:48:52 +00:00
Daniel Sanders	753e17629d	Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call Summary: AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output. The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as. All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler. Changes since review (and last commit attempt): - Fixed test failures that were missed due to configuration of local build. (fixes crash.ll and a couple others). - Fixed tests that happened to pass because the local build was on X86 (should fix 2007-12-17-InvokeAsm.ll) - mature-mc-support.ll's should no longer require all targets to be compiled. (should fix ARM and PPC buildbots) - Object output (-filetype=obj and similar) now forces the integrated assembler to be enabled regardless of default setting or -no-integrated-as. (should fix SystemZ buildbots) Reviewers: rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2686 llvm-svn: 201333	2014-02-13 14:44:26 +00:00
Hal Finkel	93d8f59877	XFAIL test/CodeGen/SystemZ/alias-01.ll which requires CodeGen TBAA llvm-svn: 200094	2014-01-25 19:31:44 +00:00
Alp Toker	cb40291100	Fix known typos Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt. llvm-svn: 200018	2014-01-24 17:20:08 +00:00
Richard Sandiford	36b376914d	[SystemZ] Flesh out stackrestore test (frame-11.ll) ...so that it does something vaguely sensible. llvm-svn: 199117	2014-01-13 15:44:44 +00:00
Richard Sandiford	64c0c4c015	[SystemZ] Improve risbg-01.ll test The old mask in f24 wasn't well chosen because the lshr would always be zero. CodeGen didn't detect this but InstCombine would. The new mask ensures that both shifts are needed. f26 is specifically testing for a wrap-around mask. The AND can be applied to just the shift left, either before or after the shift. Again, CodeGen kept it in the original form but InstCombine would mask after the shift instead. The exact choice of NILF isn't important for the test so I just dropped it and kept the rotate. llvm-svn: 199115	2014-01-13 15:40:25 +00:00
Richard Sandiford	32379b8141	[SystemZ] Optimize (sext (ashr (shl ...), ...)) ...into (ashr (shl (anyext X), ...), ...), which requires one fewer instruction. The (anyext X) can sometimes be simplified too. I didn't do this in DAGCombiner because widening shifts isn't a win on all targets. llvm-svn: 199114	2014-01-13 15:17:53 +00:00
Richard Sandiford	3875cb60f3	[SystemZ] Fix RNSBG bug introduced by r197802 The zext handling added in r197802 wasn't right for RNSBG. This patch restricts it to ROSBG, RXSBG and RISBG. (The tests for RISBG were added in r197802 since RISBG was the motivating example.) llvm-svn: 198862	2014-01-09 11:28:53 +00:00
Richard Sandiford	15cfc1c33c	Handle masked rotate amounts At the moment we expect rotates to have the form: (or (shl X, Y), (shr X, Z)) where Y == bitsize(X) - Z or Z == bitsize(X) - Y. This form means that the (or ...) is undefined for Y == 0 or Z == 0. This undefinedness can be avoided by using Y == (C * bitsize(X) - Z) & (bitsize(X) - 1) or Z == (C * bitsize(X) - Y) & (bitsize(X) - 1) for any integer C (including 0, the most natural choice). llvm-svn: 198861	2014-01-09 10:56:42 +00:00
Richard Sandiford	0f264db3c6	Match the InstCombine form of rotates by X+C InstCombine converts (sub 32, (add X, C)) into (sub 32-C, X), so a rotate left of a 32-bit Y by X+C could appear as either: (or (shl Y, (add X, C)), (shr Y, (sub 32, (add X, C)))) without InstCombine or: (or (shl Y, (add X, C)), (shr Y, (sub 32-C, X))) with it. We already matched the first form. This patch handles the second too. llvm-svn: 198860	2014-01-09 10:49:40 +00:00
Richard Sandiford	41350a52ca	[SystemZ] Use interlocked-access 1 instructions for CodeGen ...namely LOAD AND ADD, LOAD AND AND, LOAD AND OR and LOAD AND EXCLUSIVE OR. LOAD AND ADD LOGICAL isn't really separately useful for LLVM. I'll look at adding reusing the CC results in new year. llvm-svn: 197985	2013-12-24 15:18:04 +00:00
Richard Sandiford	83a0b6abd0	[SystemZ] Optimize comparisons with truncated extended loads If the extension of a loaded value is compared against zero and used in other arithmetic, InstCombine will change the comparison to use the unextended load. It's also possible that the comparison could be against the unextended load from the outset. In DAG form this becomes a truncation of an extending load. We want to strip the truncation if possible so that we can use load-and-test instructions. llvm-svn: 197804	2013-12-20 11:56:02 +00:00
Richard Sandiford	220ee49bce	[SystemZ] Extend RISBG optimization The handling of ANY_EXTEND and ZERO_EXTEND was too strict. In this context we can treat ZERO_EXTEND in much the same way as an AND and then also handle outermost ZERO_EXTENDs. I couldn't find a test that benefited from the ANY_EXTEND change, but it's more obvious to write it this way once SIGN_EXTEND and ZERO_EXTEND are handled differently. llvm-svn: 197802	2013-12-20 11:49:48 +00:00
Andrew Trick	83a71c076c	Revert "Add -mcpu=z10 to SystemZ tests." This reverts commit r197466. The MachineCSE fix that required the -mcpu flag has been disabled until more work can be done to fix downstream issues. Adding -mcpu wasn't the right workaround anyway. llvm-svn: 197624	2013-12-18 23:04:37 +00:00
Andrew Trick	375ee3e16a	Add -mcpu=z10 to SystemZ tests. llvm-svn: 197466	2013-12-17 05:27:16 +00:00
Richard Sandiford	0847c450b6	[SystemZ] Optimize X [!=]= Y in cases where X - Y or Y - X is also computed In those cases it's better to compare the result of the subtraction against zero. llvm-svn: 197239	2013-12-13 15:50:30 +00:00
Richard Sandiford	c3dc44781b	[SystemZ] Make more use of TMHH This originally came about after noticing that InstCombine turns some of the TMHH (icmp (and...), ...) tests into plain comparisons. Since there is no instruction to compare with a 64-bit immediate, TMHH is generally better than an ordered comparison for the cases that it can handle. llvm-svn: 197238	2013-12-13 15:46:55 +00:00
Richard Sandiford	57485472e2	[SystemZ] Extend integer absolute selection This patch makes more use of LPGFR and LNGFR. It builds on top of the LTGFR selection from r197234. Most of the tests are motivated by what InstCombine would produce. llvm-svn: 197236	2013-12-13 15:35:00 +00:00
Richard Sandiford	bd2f0e9cd0	[SystemZ] Make more use of LTGFR InstCombine turns (sext (trunc)) into (ashr (shl)), then converts any comparison of the ashr against zero into a comparison of the shl against zero. This makes sense in itself, but we want to undo it for z, since the sign- extension instruction has a CC-setting form. I've included tests for both the original and InstCombined variants, but the former already worked. The patch fixes the latter. llvm-svn: 197234	2013-12-13 15:07:39 +00:00
Richard Sandiford	73170f8488	[SystemZ] Optimize fcmp X, 0 in cases where X is also negated In such cases it's often better to test the result of the negation instead, since the negation also sets CC. llvm-svn: 197032	2013-12-11 11:45:08 +00:00
Richard Sandiford	d1093636cc	Extend (truncate (load)) folding DAGCombiner could fold (truncate (load)) -> smaller load if the original load was the width of the truncation result or wider. This patch extends it to handle cases where the original load was narrower (and so the extension type stays the same). llvm-svn: 197030	2013-12-11 11:37:27 +00:00
Richard Sandiford	bef3d7af2b	Add TargetLowering::prepareVolatileOrAtomicLoad One unusual feature of the z architecture is that the result of a previous load can be reused indefinitely for subsequent loads, even if a cache-coherent store to that location is performed by another CPU. A special serializing instruction must be used if you want to force a load to be reattempted. Since volatile loads are not supposed to be omitted in this way, we should insert a serializing instruction before each such load. The same goes for atomic loads. The patch implements this at the IR->DAG boundary, in a similar way to atomic fences. It is a no-op for targets other than SystemZ. llvm-svn: 196906	2013-12-10 10:49:34 +00:00
Richard Sandiford	9afe613d12	Add TargetLowering::prepareVolatileOrAtomicLoad One unusual feature of the z architecture is that the result of a previous load can be reused indefinitely for subsequent loads, even if a cache-coherent store to that location is performed by another CPU. A special serializing instruction must be used if you want to force a load to be reattempted. Since volatile loads are not supposed to be omitted in this way, we should insert a serializing instruction before each such load. The same goes for atomic loads. The patch implements this at the IR->DAG boundary, in a similar way to atomic fences. It is a no-op for targets other than SystemZ. llvm-svn: 196905	2013-12-10 10:36:34 +00:00
Richard Sandiford	198ddf83c1	[SystemZ] Use LOAD AND TEST for comparisons with -0 ...since it os equivalent to comparison with +0. llvm-svn: 196580	2013-12-06 09:59:12 +00:00
Richard Sandiford	7b4118a0fc	[SystemZ] Extend the use of C(L)GFR instcombine prefers to put extended operands first, so this patch handles that case for C(L)GFR. llvm-svn: 196579	2013-12-06 09:56:50 +00:00
Richard Sandiford	48ef6abddc	[SystemZ] Optimize selects between 0 and -1 Since z has no setcc instruction as such, the choice of setBooleanContents is a bit arbitrary. Currently it's set to ZeroOrOneBooleanContent, so we produced a branch-free form when selecting between 0 and 1, but not when selecting between 0 and -1. This patch handles the latter case too. At some point I'd like to measure whether it's better to use conditional moves for constant selects on z196, but that's future work. llvm-svn: 196578	2013-12-06 09:53:09 +00:00
Richard Sandiford	ccc2a7c1a0	[SystemZ] Fix choice of known-zero mask in insertion optimization The backend converts 64-bit ORs into subreg moves if the upper 32 bits of one operand and the low 32 bits of the other are known to be zero. It then tries to peel away redundant ANDs from the upper 32 bits. Since AND masks are canonicalized to exclude known-zero bits, the test ORs the mask and the known-zero bits together before checking for redundancy. The problem was that it was using the wrong node when checking for known-zero bits, so could drop ANDs that were still needed. llvm-svn: 196267	2013-12-03 11:01:54 +00:00
Richard Sandiford	dd7dd930d1	[SystemZ] Fix incorrect use of RISBG for a zero-extended right shift We would wrongly transform the testcase into the equivalent of an AND with 1. The problem was that, when testing whether the shifted-in bits of the right shift were significant, we used the width of the final zero-extended result rather than the width of the shifted value. llvm-svn: 195731	2013-11-26 10:53:16 +00:00
Richard Sandiford	f03789ca3f	[SystemZ] Fix TMHH and TMHL usage for z10 with -O0 I've no idea why I decided to handle TMxx differently from all the other high/low logic operations, but it was a stupid thing to do. The high registers aren't available as separate 32-bit registers on z10, so subreg_h32 can't be used on a GR64 there. I've normally been testing with z196 and with -O3 and so hadn't noticed this until now. llvm-svn: 195473	2013-11-22 17:28:28 +00:00
Richard Sandiford	f834ea19db	[SystemZ] Automatically detect zEC12 and z196 hosts As on other hosts, the CPU identification instruction is priveleged, so we need to look through /proc/cpuinfo. I copied the PowerPC way of handling "generic". Several tests were implicitly assuming z10 and so failed on z196. llvm-svn: 193742	2013-10-31 12:14:17 +00:00
Richard Sandiford	094e609716	[SystemZ] Set usaAA to true useAA significantly improves the handling of vector code that has TBAA information attached. It also helps other cases, as shown by the testsuite changes here. The only real downside I've seen is that it interferes with MergeConsecutiveStores. The problem is that that optimization works top down, starting at the first store in the chain, and looks for cases where the chain result is only used by a single related store. These related stores don't alias, so useAA will have rewritten all the later stores to use a different chain input (typically the same one as the first store). I think the advantages outweigh the disadvantages though, so for now I've just disabled alias analysis for the unaligned-01.ll test. llvm-svn: 193521	2013-10-28 13:53:37 +00:00
Richard Sandiford	981fdeb477	[DAGCombiner] Respect volatility when checking for aliases Making useAA() default to true for SystemZ showed that the combiner alias analysis wasn't handling volatile accesses. This hit many of the SystemZ tests, but I arbitrarily picked one for the purpose of this patch. llvm-svn: 193518	2013-10-28 12:00:00 +00:00
Richard Sandiford	39c1ce4dc1	Keep TBAA info when rewriting SelectionDAG loads and stores Most SelectionDAG code drops the TBAA info when creating a new form of a load and store (e.g. during legalization, or when converting a plain load to an extending one). This patch tries to catch all cases where the TBAA information can legitimately be carried over. The patch adds alternative forms of getLoad() and getExtLoad() that take a MachineMemOperand instead of individual fields. (The corresponding getTruncStore() already exists.) The idea is to use the MachineMemOperand forms when all fields are carried over (size, pointer info, isVolatile, isNonTemporal, alignment and TBAA info). If some adjustment is being made, e.g. to narrow the load, then we still pass the individual fields but also pass the TBAA info. llvm-svn: 193517	2013-10-28 11:17:59 +00:00
Richard Sandiford	95f7ba988b	Replace sra with srl if a single sign bit is required E.g. (and (sra (i32 x) 31) 2) -> (and (srl (i32 x) 30) 2). llvm-svn: 192884	2013-10-17 11:16:57 +00:00
Richard Sandiford	3e382972d9	[SystemZ] Handle extensions in RxSBG optimizations The input to an RxSBG operation can be narrower as long as the upper bits are don't care. This fixes a FIXME added in r192783. llvm-svn: 192790	2013-10-16 13:35:13 +00:00
Richard Sandiford	f722a8e30e	[SystemZ] Improve handling of SETCC We previously used the default expansion to SELECT_CC, which in turn would expand to "LHI; BRC; LHI". In most cases it's better to use an IPM-based sequence instead. llvm-svn: 192784	2013-10-16 11:10:55 +00:00
Richard Sandiford	374a0e50c4	Handle (shl (anyext (shr ...))) in SimpilfyDemandedBits This is really an extension of the current (shl (shr ...)) -> shl optimization. The main difference is that certain upper bits must also not be demanded. The motivating examples are the first two in the testcase, which occur in llvmpipe output. llvm-svn: 192783	2013-10-16 10:26:19 +00:00
Richard Sandiford	6af6ff1e15	[SystemZ] Use A(G)SI when spilling the target of a constant addition llvm-svn: 192681	2013-10-15 08:42:59 +00:00
Richard Sandiford	b63e300b67	[SystemZ] Add comparisons of high words and memory llvm-svn: 191777	2013-10-01 15:00:44 +00:00
Richard Sandiford	a9ac0e0f75	[SystemZ] Add comparisons of large immediates using high words There are no corresponding patterns for small immediates because they would prevent the use of fused compare-and-branch instructions. llvm-svn: 191775	2013-10-01 14:56:23 +00:00
Richard Sandiford	42a694f44e	[SystemZ] Add immediate addition involving high words llvm-svn: 191774	2013-10-01 14:53:46 +00:00
Richard Sandiford	2cac763544	[SystemZ] Extend test-under-mask support to high GR32s llvm-svn: 191773	2013-10-01 14:41:52 +00:00
Richard Sandiford	3ad5a15b72	[SystemZ] Extend 32-bit RISBG optimizations to high words This involves using RISB[LH]G, whereas the equivalent z10 optimization uses RISBG. llvm-svn: 191770	2013-10-01 14:36:20 +00:00
Richard Sandiford	2896d044bd	[SystemZ] Extend pseudo conditional 8- and 16-bit stores to high words As the comment says, we always want to use STOC for 32-bit stores. llvm-svn: 191767	2013-10-01 14:33:55 +00:00
Richard Sandiford	96f013b827	[SystemZ] Add test missing from r191764. llvm-svn: 191765	2013-10-01 14:31:50 +00:00
Richard Sandiford	7028428c2c	[SystemZ] Allow integer AND involving high words llvm-svn: 191762	2013-10-01 14:20:41 +00:00
Richard Sandiford	5718dacbdd	[SystemZ] Allow integer XOR involving high words llvm-svn: 191759	2013-10-01 14:08:44 +00:00
Richard Sandiford	6e96ac600f	[SystemZ] Allow integer OR involving high words llvm-svn: 191755	2013-10-01 13:22:41 +00:00
Richard Sandiford	1a56931b22	[SystemZ] Allow integer insertions with a high-word destination llvm-svn: 191753	2013-10-01 13:18:56 +00:00
Richard Sandiford	7c5c0eabc9	[SystemZ] Allow selects with a high-word destination llvm-svn: 191751	2013-10-01 13:10:16 +00:00
Richard Sandiford	012402346f	[SystemZ] Add patterns to load a constant into a high word (IIHF) Similar to low words, we can use the shorter LLIHL and LLIHH if it turns out that the other half of the GR64 isn't live. llvm-svn: 191750	2013-10-01 13:02:28 +00:00
Richard Sandiford	21235a256f	[SystemZ] Add register zero extensions involving at least one high word llvm-svn: 191746	2013-10-01 12:49:07 +00:00
Richard Sandiford	5469c39a26	[SystemZ] Add truncating high-word stores (STCH and STHH) llvm-svn: 191743	2013-10-01 12:22:49 +00:00
Richard Sandiford	0d46b1a30f	[SystemZ] Add zero-extending high-word loads (LLCH and LLHH) llvm-svn: 191742	2013-10-01 12:19:08 +00:00
Richard Sandiford	89e160d975	[SystemZ] Add sign-extending high-word loads (LBH and LHH) llvm-svn: 191740	2013-10-01 12:11:47 +00:00
Richard Sandiford	0755c93b0c	[SystemZ] Use upper words of GR64s for codegen This just adds the basics necessary for allocating the upper words to virtual registers (move, load and store). The move support is parameterised in a way that makes it easy to handle zero extensions, but the associated zero-extend patterns are added by a later patch. The easiest way of testing this seemed to be add a new "h" register constraint for high words. I don't expect the constraint to be useful in real inline asms, but it should work, so I didn't try to hide it behind an option. llvm-svn: 191739	2013-10-01 11:26:28 +00:00
Manman Ren	adf4cc171e	TBAA: update tbaa format from scalar format to struct-path aware format. llvm-svn: 191690	2013-09-30 18:17:55 +00:00
Manman Ren	0ed04fc9ab	TBAA: handle scalar TBAA format and struct-path aware TBAA format. Remove the command line argument "struct-path-tbaa" since we should not depend on command line argument to decide which format the IR file is using. Instead, we check the first operand of the tbaa tag node, if it is a MDNode, we treat it as struct-path aware TBAA format, otherwise, we treat it as scalar TBAA format. When clang starts to use struct-path aware TBAA format no matter whether struct-path-tbaa is no, and we can auto-upgrade existing bc files, the support for scalar TBAA format can be dropped. Existing testing cases are updated to use the struct-path aware TBAA format. llvm-svn: 191538	2013-09-27 18:34:27 +00:00
Richard Sandiford	067817ee05	[SystemZ] Rein back the use of block operations The backend tries to use block operations like MVC, NC, OC and XC for simple scalar operations. For correctness reasons, it rejects any case in which the regions might partially overlap. However, for performance reasons, it should also reject cases where the regions might be equal, since the instruction might then not use the fast path. This fixes a performance regression seen in bzip2. We may want to limit the optimisation even more in future, or even remove it entirely, but I'll try with this for now. llvm-svn: 191525	2013-09-27 15:29:20 +00:00
Richard Sandiford	54b369166f	[SystemZ] Improve handling of PC-relative addresses The backend previously folded offsets into PC-relative addresses whereever possible. That's the right thing to do when the address can be used directly in a PC-relative memory reference (using things like LRL). But if we have a register-based memory reference and need to load the PC-relative address separately, it's better to use an anchor point that could be shared with other accesses to the same area of the variable. Fixes a FIXME. llvm-svn: 191524	2013-09-27 15:14:04 +00:00
Richard Sandiford	93183ee78c	[SystemZ] Add unsigned compare-and-branch instructions For some reason I never got around to adding these at the same time as the signed versions. No idea why. I'm not sure whether this SystemZII::BranchC* stuff is useful, or whether it should just be replaced with an "is normal" flag. I'll leave that for later though. There are some boundary conditions that can be tweaked, such as preferring unsigned comparisons for equality with [128, 256), and "<= 255" over "< 256", but again I'll leave those for a separate patch. llvm-svn: 190930	2013-09-18 09:56:40 +00:00
Richard Sandiford	109a7c6ff1	[SystemZ] Improve extload handling The port originally had special patterns for extload, mapping them to the same instructions as sextload. It seemed neater to have patterns that match "an extension that is allowed to be signed" and "an extension that is allowed to be unsigned". This was originally meant to be a clean-up, but it does improve the handling of promoted integers a little, as shown by args-06.ll. llvm-svn: 190777	2013-09-16 09:03:10 +00:00
Richard Sandiford	030c165710	[SystemZ] Try to fold shifts into TMxx E.g. "SRL %r2, 2; TMLL %r2, 1" => "TMLL %r2, 4". llvm-svn: 190672	2013-09-13 09:09:50 +00:00
Richard Sandiford	a9eb9972e4	[SystemZ] Add TM and TMY The main complication here is that TM and TMY (the memory forms) set CC differently from the register forms. When the tested bits contain some 0s and some 1s, the register forms set CC to 1 or 2 based on the value the uppermost bit. The memory forms instead set CC to 1 regardless of the uppermost bit. Until now, I've tried to make it so that a branch never tests for an impossible CC value. E.g. NR only sets CC to 0 or 1, so branches on the result will only test for 0 or 1. Originally I'd tried to do the same thing for TM and TMY by using custom matching code in ISelDAGToDAG. That ended up being very ugly though, and would have meant duplicating some of the chain checks that the common isel code does. I've therefore gone for the simpler alternative of adding an extra operand to the TM DAG opcode to say whether a memory form would be OK. This means that the inverse of a "TM;JE" is "TM;JNE" rather than the more precise "TM;JNLE", just like the inverse of "TMLL;JE" is "TMLL;JNE". I suppose that's arguably less confusing though... llvm-svn: 190400	2013-09-10 10:20:32 +00:00
Richard Sandiford	5bc670bb55	[SystemZ] Tweak integer comparison code The architecture has many comparison instructions, including some that extend one of the operands. The signed comparison instructions use sign extensions and the unsigned comparison instructions use zero extensions. In cases where we had a free choice between signed or unsigned comparisons, we were trying to decide at lowering time which would best fit the available instructions, taking things like extension type into account. The code to do that was getting increasingly hairy and was also making some bad decisions. E.g. when comparing the result of two LLCs, it is better to use CR rather than CLR, since CR can be fused with a branch while CLR can't. This patch removes the lowering code and instead adds an operand to integer comparisons to say whether signed comparison is required, whether unsigned comparison is required, or whether either is OK. We can then leave the choice of instruction up to the normal isel code. llvm-svn: 190138	2013-09-06 11:51:39 +00:00
Richard Sandiford	4943bc393a	[SystemZ] Use XC for a memset of 0 llvm-svn: 190130	2013-09-06 10:25:07 +00:00
Richard Sandiford	178273a174	[SystemZ] Add NC, OC and XC For now these are just used to handle scalar ANDs, ORs and XORs in which all operands are memory. llvm-svn: 190041	2013-09-05 10:36:45 +00:00
Richard Sandiford	113c870397	[SystemZ] Add support for TMHH, TMHL, TMLH and TMLL For now this just handles simple comparisons of an ANDed value with zero. The CC value provides enough information to do any comparison for a 2-bit mask, and some nonzero comparisons with more populated masks, but that's all future work. llvm-svn: 189819	2013-09-03 15:38:35 +00:00
Richard Sandiford	35b9be298a	[SystemZ] Add support for TMHH, TMHL, TMLH and TMLL For now just handles simple comparisons of an ANDed value with zero. The CC value provides enough information to do any comparison for a 2-bit mask, and some nonzero comparisons with more populated masks, but that's all future work. llvm-svn: 189469	2013-08-28 10:31:43 +00:00
Richard Sandiford	be133a8757	[SystemZ] Extend memcmp support to all constant lengths This uses the infrastructure added for memcpy and memmove in r189331. llvm-svn: 189458	2013-08-28 09:01:51 +00:00
Richard Sandiford	5e318f0bfe	[SystemZ] Extend memcpy and memset support to all constant lengths Lengths up to a certain threshold (currently 6 * 256) use a series of MVCs. Lengths above that threshold use a loop to handle X*256 bytes followed by a single MVC to handle the excess (if any). This loop will also be needed in future when support for variable lengths is added. Because the same tablegen classes are used to define MVC and CLC, the patch also has the side-effect of defining a pseudo loop instruction for CLC. That instruction isn't used yet (and wouldn't be handled correctly if it were). I'm planning to use it soon though. llvm-svn: 189331	2013-08-27 09:54:29 +00:00
Richard Sandiford	03481334b5	[SystemZ] Add basic prefetch support Just the instructions and intrinsics for now. llvm-svn: 189100	2013-08-23 11:36:42 +00:00
Richard Sandiford	24e597b8c5	[SystemZ] Try reversing comparisons whose first operand is in memory This allows us to make more use of the many compare reg,mem instructions. llvm-svn: 189099	2013-08-23 11:27:19 +00:00
Richard Sandiford	a481f58542	[SystemZ] Prefer LHI;ST... over LAY;MV... If we had a store of an integer to memory, and the integer and store size were suitable for a form of MV..., we used MV... no matter what. We could then have sequences like: lay %r2, 0(%r3,%r4) mvi 0(%r2), 4 In these cases it seems better to force the constant into a register and use a normal store: lhi %r2, 4 stc %r2, 0(%r3, %r4) since %r2 is more likely to be hoisted and is easier to rematerialize. llvm-svn: 189098	2013-08-23 11:18:53 +00:00
Richard Sandiford	37cd6cfba2	Turn MipsOptimizeMathLibCalls into a target-independent scalar transform ...so that it can be used for z too. Most of the code is the same. The only real change is to use TargetTransformInfo to test when a sqrt instruction is available. The pass is opt-in because at the moment it only handles sqrt. llvm-svn: 189097	2013-08-23 10:27:02 +00:00
Richard Sandiford	7d86e47d04	[SystemZ] Define remainig *MUL_LOHI patterns The initial port used MLG(R) for i64 UMUL_LOHI but left the other three combinations as not-legal-or-custom. Although 32x32->{32,32} multiplications exist, they're not as quick as doing a normal 64-bit multiplication, so it didn't seem like i32 SMUL_LOHI and UMUL_LOHI would be useful. There's also no direct instruction for i64 SMUL_LOHI, so it needs to be implemented in terms of UMUL_LOHI. However, not defining these patterns means that we don't convert division by a constant into multiplication, so this patch fills in the other cases. The new i64 SMUL_LOHI sequence is simpler than the one that we used previously for 64x64->128 multiplication, so int-mul-08.ll now tests the full sequence. llvm-svn: 188898	2013-08-21 09:34:56 +00:00
Richard Sandiford	af5f66ac9e	[SystemZ] Use FI[EDX]BRA for codegen llvm-svn: 188895	2013-08-21 09:04:20 +00:00
Richard Sandiford	6f6d55161b	[SystemZ] Use SRST to optimize memchr SystemZTargetLowering::emitStringWrapper() previously loaded the character into R0 before the loop and made R0 live on entry. I'd forgotten that allocatable registers weren't allowed to be live across blocks at this stage, and it confused LiveVariables enough to cause a miscompilation of f3 in memchr-02.ll. This patch instead loads R0 in the loop and leaves LICM to hoist it after RA. This is actually what I'd tried originally, but I went for the manual optimisation after noticing that R0 often wasn't being hoisted. This bug forced me to go back and look at why, now fixed as r188774. We should also try to optimize null checks so that they test the CC result of the SRST directly. The select between null and the SRST GPR result could then usually be deleted as dead. llvm-svn: 188779	2013-08-20 09:38:48 +00:00
Richard Sandiford	bdd81d76f8	Fix test typo and add usual "br %r14" test llvm-svn: 188775	2013-08-20 09:14:46 +00:00
Richard Sandiford	96aa93d5f1	Fix overly pessimistic shortcut in post-RA MachineLICM Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers and TermRegs. When it sees a definition of R it adds all aliases of R to the corresponding set, so that when it needs to test for membership it only needs to test a single register, rather than worrying about aliases there too. E.g. the final candidate loop just has: unsigned Def = Candidates[i].Def; if (!PhysRegClobbers.test(Def) && ...) { to test whether register Def is multiply defined. However, there was also a shortcut in ProcessMI to make sure we didn't add candidates if we already knew that they would fail the final test. This shortcut was more pessimistic than the final one because it checked whether _any alias_ of the defined register was multiply defined. This is too conservative for targets that define register pairs. E.g. on z, R0 and R1 are sometimes used as a pair, so there is a 128-bit register that aliases both R0 and R1. If a loop used R0 and R1 independently, and the definition of R0 came first, we would be able to hoist the R0 assignment (because that used the final test quoted above) but not the R1 assignment (because that meant we had two definitions of the paired R0/R1 register and would fail the shortcut in ProcessMI). This patch just uses the same check for the ProcessMI shortcut as we use in the final candidate loop. llvm-svn: 188774	2013-08-20 09:11:13 +00:00
Richard Sandiford	784a580312	[SystemZ] Add negative integer absolute (load negative) For now this matches the equivalent of (neg (abs ...)), which did hit a few times in projects/test-suite. We should probably also match cases where absolute-like selects are used with reversed arguments. llvm-svn: 188671	2013-08-19 12:56:58 +00:00
Richard Sandiford	4b89705490	[SystemZ] Add integer absolute (load positive) llvm-svn: 188670	2013-08-19 12:48:54 +00:00
Richard Sandiford	709bda66b9	[SystemZ] Add support for sibling calls This first cut is pretty conservative. The final argument register (R6) is call-saved, so we would need to make sure that the R6 argument to a sibling call is the same as the R6 argument to the calling function, which seems worth keeping as a separate patch. Saying that integer truncations are free means that we no longer use the extending instructions LGF and LLGF for spills in int-conv-09.ll and int-conv-10.ll. Instead we treat the registers as 64 bits wide and truncate them to 32-bits where necessary. I think it's unlikely we'd use LGF and LLGF for spills in other situations for the same reason, so I'm removing the tests rather than replacing them. The associated code is generic and applies to many more instructions than just LGF and LLGF, so there is no corresponding code removal. llvm-svn: 188669	2013-08-19 12:42:31 +00:00
Richard Sandiford	0dec06a28c	[SystemZ] Use SRST to implement strlen and strnlen It would also make sense to use it for memchr; I'm working on that now. llvm-svn: 188547	2013-08-16 11:41:43 +00:00
Richard Sandiford	bb83a50f57	[SystemZ] Use MVST to implement strcpy and stpcpy llvm-svn: 188546	2013-08-16 11:29:37 +00:00
Richard Sandiford	ca23271010	[SystemZ] Use CLST to implement strcmp llvm-svn: 188544	2013-08-16 11:21:54 +00:00
Richard Sandiford	e3827751e2	[SystemZ] Fix handling of 64-bit memcmp results Generalize r188163 to cope with return types other than MVT::i32, just as the existing visitMemCmpCall code did. I've split this out into a subroutine so that it can be used for other upcoming patches. I also noticed that I'd used the wrong API to record the out chain. It's a load that uses DAG.getRoot() rather than getRoot(), so the out chain should go on PendingLoads. I don't have a testcase for that because we don't do any interesting scheduling on z yet. llvm-svn: 188540	2013-08-16 10:55:47 +00:00
Richard Sandiford	a59012577c	[SystemZ] Fix sign of integer memcmp result r188163 used CLC to implement memcmp. Code that compares the result directly against zero can test the CC value produced by CLC, but code that needs an integer result must use IPM. The sequence I'd used was: ipm <reg> sll <reg>, 2 sra <reg>, 30 but I'd forgotten that this inverts the order, so that CC==1 ("less") becomes an integer greater than zero, and CC==2 ("greater") becomes an integer less than zero. This sequence should only be used if the CLC arguments are reversed to compensate. The problem then is that the branch condition must also be reversed when testing the CLC result directly. Rather than do that, I went for a different sequence that works with the natural CLC order: ipm <reg> srl <reg>, 28 rll <reg>, <reg>, 31 One advantage of this is that it doesn't clobber CC. A disadvantage is that any sign extension to 64 bits must be done separately, rather than being folded into the shifts. llvm-svn: 188538	2013-08-16 10:22:54 +00:00
Daniel Dunbar	9efbedfd35	[tests] Cleanup initialization of test suffixes. - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py). - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED). - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code. llvm-svn: 188513	2013-08-16 00:37:11 +00:00
Richard Sandiford	564681c88d	[SystemZ] Use CLC and IPM to implement memcmp For now this is restricted to fixed-length comparisons with a length in the range [1, 256], as for memcpy() and MVC. llvm-svn: 188163	2013-08-12 10:28:10 +00:00
Richard Sandiford	0897fce2f4	[SystemZ] Optimize floating-point comparisons with zero This follows the same lines as the integer code. In the end it seemed easier to have a second 4-bit mask in TSFlags to specify the compare-like CC values. That eats one more TSFlags bit than adding a CCHasUnordered would have done, but it feels more concise. llvm-svn: 187883	2013-08-07 11:10:06 +00:00
Richard Sandiford	9f11bc1956	[SystemZ] Add floating-point load-and-test instructions These instructions can also be used as comparisons with zero. llvm-svn: 187882	2013-08-07 11:03:34 +00:00
Richard Sandiford	c212125d27	[SystemZ] Use BRCT and BRCTG to eliminate add-&-compare sequences This patch just uses a peephole test for "add; compare; branch" sequences within a single block. The IR optimizers already convert loops to decrement-and-branch-on-nonzero form in some cases, so even this simplistic test triggers many times during a clang bootstrap and projects/test-suite run. It looks like there are still cases where we need to more strongly prefer branches on nonzero though. E.g. I saw a case where a loop that started out with a check for 0 ended up with a check for -1. I'll try to look at that sometime. I ended up adding the Reference class because MachineInstr::readsRegister() doesn't check for subregisters (by design, as far as I could tell). llvm-svn: 187723	2013-08-05 11:23:46 +00:00
Richard Sandiford	b49a3ab262	[SystemZ] Use LOAD AND TEST to eliminate comparisons against zero llvm-svn: 187720	2013-08-05 11:03:20 +00:00
Richard Sandiford	fd7f4ae6d4	[SystemZ] Reuse CC results for integer comparisons with zero This also fixes a bug in the predication of LR to LOCR: I'd forgotten that with these in-place instruction builds, the implicit operands need to be added manually. I think this was latent until now, but is tested by int-cmp-45.c. It also adds a CC valid mask to STOC, again tested by int-cmp-45.c. llvm-svn: 187573	2013-08-01 10:39:40 +00:00
Richard Sandiford	a075708abe	[SystemZ] Prefer comparisons with zero Convert >= 1 to > 0, etc. Using comparison with zero isn't a win on its own, but it exposes more opportunities for CC reuse (the next patch). llvm-svn: 187571	2013-08-01 10:29:45 +00:00
Richard Sandiford	791bea4182	[SystemZ] Implement isLegalAddressingMode() The loop optimizers were assuming that scales > 1 were OK. I think this is actually a bug in TargetLoweringBase::isLegalAddressingMode(), since it seems to be trying to reject anything that isn't r+i or r+r, but it has no default case for scales other than 0, 1 or 2. Implementing the hook for z means that z can no longer test any change there though. llvm-svn: 187497	2013-07-31 12:58:26 +00:00
Richard Sandiford	ee8343822e	[SystemZ] Be more careful about inverting CC masks (conditional loads) Extend r187495 to conditional loads. I split this out because the easiest way seemed to be to force a particular operand order in SystemZISelDAGToDAG.cpp. llvm-svn: 187496	2013-07-31 12:38:08 +00:00
Richard Sandiford	3d768e334b	[SystemZ] Be more careful about inverting CC masks System z branches have a mask to select which of the 4 CC values should cause the branch to be taken. We can invert a branch by inverting the mask. However, not all instructions can produce all 4 CC values, so inverting the branch like this can lead to some oddities. For example, integer comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater). If an integer EQ is reversed to NE before instruction selection, the branch will test for 1 or 2. If instead the branch is reversed after instruction selection (by inverting the mask), it will test for 1, 2 or 3. Both are correct, but the second isn't really canonical. This patch therefore keeps track of which CC values are possible and uses this when inverting a mask. Although this is mostly cosmestic, it fixes undefined behavior for the CIJNLH in branch-08.ll. Another fix would have been to mask out bit 0 when generating the fused compare and branch, but the point of this patch is that we shouldn't need to do that in the first place. The patch also makes it easier to reuse CC results from other instructions. llvm-svn: 187495	2013-07-31 12:30:20 +00:00
Richard Sandiford	8a757bba10	[SystemZ] Move compare-and-branch generation even later r187116 moved compare-and-branch generation from the instruction-selection pass to the peephole optimizer (via optimizeCompare). It turns out that even this is a bit too early. Fused compare-and-branch instructions don't interact well with predication, where a CC result is needed. They also make it harder to reuse the CC side-effects of earlier instructions (not yet implemented, but the subject of a later patch). Another problem was that the AnalyzeBranch family of routines weren't handling compares and branches, so we weren't able to reverse the fused form in cases where we would reverse a separate branch. This could have been fixed by extending AnalyzeBranch, but given the other problems, I've instead moved the fusing to the long-branch pass, which is also responsible for the opposite transformation: splitting out-of-range compares and branches into separate compares and long branches. I've added a test for the AnalyzeBranch problem. A test for the predication problem is included in the next patch, which fixes a bug in the choice of CC mask. llvm-svn: 187494	2013-07-31 12:11:07 +00:00
Richard Sandiford	6a06ba36ba	[SystemZ] Postpone NI->RISBG conversion to convertToThreeAddress() r186399 aggressively used the RISBG instruction for immediate ANDs, both because it can handle some values that AND IMMEDIATE can't, and because it allows the destination register to be different from the source. I realized later while implementing the distinct-ops support that it would be better to leave the choice up to convertToThreeAddress() instead. The AND IMMEDIATE form is shorter and is less likely to be cracked. This is a problem for 32-bit ANDs because we assume that all 32-bit operations will leave the high word untouched, whereas RISBG used in this way will either clear the high word or copy it from the source register. The patch uses the z196 instruction RISBLG for this instead. This means that z10 will be restricted to NILL, NILH and NILF for 32-bit ANDs, but I think that should be OK for now. Although we're using z10 as the base architecture, the optimization work is going to be focused more on z196 and zEC12. llvm-svn: 187492	2013-07-31 11:36:35 +00:00
Richard Sandiford	c3f85d73ab	[SystemZ] Rework compare and branch support Before the patch we took advantage of the fact that the compare and branch are glued together in the selection DAG and fused them together (where possible) while emitting them. This seemed to work well in practice. However, fusing the compare so early makes it harder to remove redundant compares in cases where CC already has a suitable value. This patch therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of functions instead. No behavioral change intended, but it paves the way for a later patch. llvm-svn: 187116	2013-07-25 09:34:38 +00:00
Richard Sandiford	f2404164ba	[SystemZ] Add LOCR and LOCGR llvm-svn: 187113	2013-07-25 09:11:15 +00:00
Richard Sandiford	09a8cf3604	[SystemZ] Add LOC and LOCG As with the stores, these instructions can trap when the condition is false, so they are only used for things like (cond ? x : *ptr). llvm-svn: 187112	2013-07-25 09:04:52 +00:00
Richard Sandiford	a68e6f5660	[SystemZ] Add STOC and STOCG These instructions are allowed to trap even if the condition is false, so for now they are only used for "ptr = (cond ? x : ptr)"-style constructs. llvm-svn: 187111	2013-07-25 08:57:02 +00:00
Richard Sandiford	dd170bd977	[SystemZ] Add tests for ALHSIK and ALGHSIK The insn definitions themselves crept into r186689, sorry. This should be the last of the distinct-ops instructions. llvm-svn: 186690	2013-07-19 16:44:32 +00:00
Richard Sandiford	fac8b10a84	[SystemZ] Add ALRK, AGLRK, SLRK and SGLRK Follows the same lines as r186686, but much more limited, since we only use ADD LOGICAL for multi-i64 additions. llvm-svn: 186689	2013-07-19 16:37:00 +00:00
Richard Sandiford	7d6a453623	[SystemZ] Add AHIK and AGHIK I did these as a separate patch because it uses a slightly different form of RIE layout. llvm-svn: 186687	2013-07-19 16:32:12 +00:00
Richard Sandiford	c575df6dcc	[SystemZ] Add ARK, AGRK, SRK and SGRK The testsuite changes follow the same lines as for r186683. llvm-svn: 186686	2013-07-19 16:26:39 +00:00
Richard Sandiford	c57e586792	[SystemZ] Add NGRK, OGRK and XGRK Like r186683, but for 64 bits. llvm-svn: 186685	2013-07-19 16:24:22 +00:00
Richard Sandiford	0175b4a353	[SystemZ] Add NRK, ORK and XRK The atomic tests assume the two-operand forms, so I've restricted them to z10. Running and-01.ll, or-01.ll and xor-01.ll for z196 as well as z10 shows why using convertToThreeAddress() is better than exposing the three-operand forms first and then converting back to two operands where possible (which is what I'd originally tried). Using the three-operand form first stops us from taking advantage of NG, OG and XG for spills. llvm-svn: 186683	2013-07-19 16:21:55 +00:00
Richard Sandiford	ff6c5a5609	[SystemZ] Use SLLK, SRLK and SRAK for codegen This patch uses the instructions added in r186680 for codegen. llvm-svn: 186681	2013-07-19 16:12:08 +00:00
Richard Sandiford	5109321042	[SystemZ] Use RNSBG This should be the last of the R.SBG patches for now. llvm-svn: 186573	2013-07-18 10:40:35 +00:00
Richard Sandiford	297f7d2724	[SystemZ] Generalize RxSBG SRA case The original code only folded SRA into ROTATE ... SELECTED BITS if there was no outer shift. This patch splits out that check and generalises it slightly. The extra cases aren't really that interesting, but this is paving the way for RNSBG support. llvm-svn: 186571	2013-07-18 10:14:55 +00:00
Richard Sandiford	7878b852e6	[SystemZ] Use RXSBG Extend the previous R.SBG patches to handle XORs. llvm-svn: 186570	2013-07-18 10:06:15 +00:00
Richard Sandiford	885140c951	[SystemZ] Use ROSBG and non-zero form of RISBG for OR nodes llvm-svn: 186405	2013-07-16 11:55:57 +00:00
Richard Sandiford	82ec87dbdb	[SystemZ] Use RISBG for (shift (and ...)) Another patch in the series to make more use of R.SBG. This one extends r186072 and r186073 to handle cases where the AND is inside the shift. llvm-svn: 186399	2013-07-16 11:02:24 +00:00
Stephen Lin	d24ab20e9b	Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally. This update was done with the following bash script: find test/CodeGen -name ".ll" \| \ while read NAME; do echo "$NAME" if ! grep -q "^; RUN: llc.debug" $NAME; then TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]@$[A-Za-z0-9_]$(.$/\1/p" < $NAME \| \ while read FUNC; do sed -i '' "s/;$.$$[A-Za-z0-9_-]$:$ $$FUNC: \$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;$.$-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;$.$-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;$.$-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;$.*$-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME fi done llvm-svn: 186280	2013-07-14 06:24:09 +00:00
Richard Sandiford	17276d3567	[SystemZ] Add test missing from r186148 Sigh, twice in two days sorry. One day I'll remember... llvm-svn: 186150	2013-07-12 09:20:14 +00:00
Richard Sandiford	6d4bd28322	[SystemZ] Optimize sign-extends of vector setccs Normal (sext (setcc ...)) sequences are optimised into (select_cc ..., -1, 0) by DAGCombiner::visitSIGN_EXTEND. However, this is deliberately not done for vectors, and after vector type legalization we have (sext_inreg (setcc ...)) instead. I wondered about trying to extend DAGCombiner to handle this case too, but it seemed to be a loss on some other targets I tried, even those for which SETCC isn't "legal" and SELECT_CC is. llvm-svn: 186149	2013-07-12 09:17:10 +00:00
Richard Sandiford	3f0edc2903	[SystemZ] Improve spilling of LGDR and LDGR If the source of these instructions is spilled we should load the destination. If the destination is spilled we should store the source. llvm-svn: 186147	2013-07-12 08:37:17 +00:00
Richard Sandiford	4209e7f6c6	[SystemZ] Add testcase missing from r186073 llvm-svn: 186074	2013-07-11 09:10:38 +00:00
Richard Sandiford	ea9b6aa20b	[SystemZ] Use zeroing form of RISBG for shift-and-AND sequences Extend r186072 to handle shifts and ANDs. llvm-svn: 186073	2013-07-11 09:10:09 +00:00
Richard Sandiford	84f54a3bc9	[SystemZ] Use zeroing form of RISBG for some AND sequences RISBG can handle some ANDs for which no AND IMMEDIATE exists. It also acts as a three-operand AND for some cases where an AND IMMEDIATE could be used instead. It might be worth adding a pass to replace RISBG with AND IMMEDIATE in cases where the register operands end up being the same and where AND IMMEDIATE is smaller. llvm-svn: 186072	2013-07-11 08:59:12 +00:00
Richard Sandiford	9784649157	[SystemZ] Use MVC for simple load/store pairs Look for patterns of the form (store (load ...), ...) in which the two locations are known not to partially overlap. (Identical locations are OK.) These sequences are better implemented by MVC unless either the load or the store could use RELATIVE LONG instructions. The testcase showed that we weren't using LHRL and LGHRL for extload16, only sextloadi16. The patch fixes that too. llvm-svn: 185919	2013-07-09 09:46:39 +00:00
Richard Sandiford	47660c148c	[SystemZ] Use "STC;MVC" for memset Use "STC;MVC" for memsets that are too big for two STCs or MV...Is yet small enough for a single MVC. As with memcpy, I'm leaving longer cases till later. The number of tests might seem excessive, but f33 & f34 from memset-04.ll failed the first cut because I'd not added the "?:" on the calculation of Size1. llvm-svn: 185918	2013-07-09 09:32:42 +00:00
Richard Sandiford	d131ff8cf8	[SystemZ] Use MVC for memcpy Use MVC for memcpy in cases where a single MVC is enough. Using MVC is a win for longer copies too, but I'll leave that for later. llvm-svn: 185802	2013-07-08 09:35:23 +00:00
Richard Sandiford	c40f27b52d	[SystemZ] Remove no-op MVCs The stack coloring pass has code to delete stores and loads that become trivially dead after coloring. Extend it to cope with single instructions that copy from one frame index to another. The testcase happens to show an example of this kicking in at the moment. It did occur in Real Code too though. llvm-svn: 185705	2013-07-05 14:38:48 +00:00
Richard Sandiford	b5d9bd6f59	Fix double renaming bug in stack coloring pass The stack coloring pass renumbered frame indexes with a loop of the form: for each frame index FI for each instruction I that uses FI for each use of FI in I rename FI to FI' This caused problems if an instruction used two frame indexes F0 and F1 and if F0 was renamed to F1 and F1 to F2. The first time we visited the instruction we changed F0 to F1, then we changed both F1s to F2. In other words, the problem was that SSRefs recorded which instructions used an FI, but not which MachineOperands and MachineMemOperands within that instruction used it. This is easily fixed for MachineOperands by walking the instructions once and processing each operand in turn. There's already a loop to do that for dead store elimination, so it seemed more efficient to fuse the two at the block level. MachineMemOperands are more tricky because they can be shared between instructions. The patch handles them by making SSRefs an array of MachineMemOperands rather than an array of MachineInstrs. We might end up processing the same MachineMemOperand twice, but that's OK because we always know from the SSRefs index what the original frame index was. llvm-svn: 185703	2013-07-05 14:24:47 +00:00
Richard Sandiford	8976ea72ab	[SystemZ] Enable the use of MVC for frame-to-frame spills ...now that the problem that prompted the restriction has been fixed. The original spill-02.py was a compromise because at the time I couldn't find an example that actually failed without the two scavenging slots. The version included here did. llvm-svn: 185701	2013-07-05 14:02:01 +00:00
Richard Sandiford	23943229f6	[SystemZ] Allocate a second register scavenging slot This is another prerequisite for frame-to-frame MVC copies. I'll commit the patch that makes use of the slot separately. The downside of trying to test many corner cases with each of the available addressing modes is that a fair few tests need to account for the new frame layout. I do still think it's useful to have all these tests though, since it's something that wouldn't get much coverage otherwise. llvm-svn: 185698	2013-07-05 13:11:52 +00:00
Richard Sandiford	ed1fab6b5b	[SystemZ] Fold more spills Add a mapping from register-based <INSN>R instructions to the corresponding memory-based <INSN>. Use it to cut down on the number of spill loads. Some instructions extend their operands from smaller fields, so this required a new TSFlags field to say how big the unextended operand is. This optimisation doesn't trigger for C(G)R and CL(G)R because in practice we always combine those instructions with a branch. Adding a test for every other case probably seems excessive, but it did catch a missed optimisation for DSGF (fixed in r185435). llvm-svn: 185529	2013-07-03 10:10:02 +00:00
Richard Sandiford	e6e7885591	[SystemZ] Use DSGFR over DSGR in more cases Fixes some cases where we were using full 64-bit division for (sdiv i32, i32) and (sdiv i64, i32). The "32" in "SDIVREM32" just refers to the second operand. The first operand of all DIVREMs is a GR128. llvm-svn: 185435	2013-07-02 15:40:22 +00:00
Richard Sandiford	f6bae1e434	[SystemZ] Use MVC to spill loads and stores Try to use MVC when spilling the destination of a simple load or the source of a simple store. As explained in the comment, this doesn't yet handle the case where the load or store location is also a frame index, since that could lead to two simultaneous scavenger spills, something the backend can't handle yet. spill-02.py tests that this restriction kicks in, but unfortunately I've not yet found a case that would fail without it. The volatile trick I used for other scavenger tests doesn't work here because we can't use MVC for volatile accesses anyway. I'm planning on relaxing the restriction later, hopefully with a test that does trigger the problem... Tests @f8 and @f9 also showed that L(G)RL and ST(G)RL were wrongly classified as SimpleBDX{Load,Store}. It wouldn't be easy to test for that bug separately, which is why I didn't split out the fix as a separate patch. llvm-svn: 185434	2013-07-02 15:28:56 +00:00
Richard Sandiford	ec8693d5f3	[SystemZ] Fix some embarrassing test typos llvm-svn: 185070	2013-06-27 09:49:34 +00:00
Richard Sandiford	891a7e7454	[SystemZ] Allow LA and LARL to be rematerialized llvm-svn: 185069	2013-06-27 09:42:10 +00:00
Richard Sandiford	a57e13b670	[SystemZ] Allow immediate moves to be rematerialized llvm-svn: 185068	2013-06-27 09:38:48 +00:00
Richard Sandiford	b86a83488e	[SystemZ] Add conditional store patterns Add pseudo conditional store instructions, so that we use: branch foo: store foo: instead of: load branch foo: move foo: store z196 has real 32-bit and 64-bit conditional stores, but we don't use any z196 instructions yet. llvm-svn: 185065	2013-06-27 09:27:40 +00:00
Richard Sandiford	30efd87f6e	[SystemZ] Don't use LOAD and STORE REVERSED for volatile accesses Unlike most -- hopefully "all other", but I'm still checking -- memory instructions we support, LOAD REVERSED and STORE REVERSED may access the memory location several times. This means that they are not suitable for volatile loads and stores. This patch is a prerequisite for better atomic load and store support. The same principle applies there: almost all memory instructions we support are inherently atomic ("block concurrent"), but LOAD REVERSED and STORE REVERSED are exceptions. Other instructions continue to allow volatile operands. I will add positive "allows volatile" tests at the same time as the "allows atomic load or store" tests. llvm-svn: 183002	2013-05-31 13:25:22 +00:00
Richard Sandiford	46af5a2cdc	[SystemZ] Enable unaligned accesses The code to distinguish between unaligned and aligned addresses was already there, so this is mostly just a switch-on-and-test process. llvm-svn: 182920	2013-05-30 09:45:42 +00:00
Richard Sandiford	ba97c34bb6	[SystemZ] Two tests missing from previous commit llvm-svn: 182847	2013-05-29 11:59:26 +00:00
Richard Sandiford	e1d9f00f09	[SystemZ] Immediate compare-and-branch support This patch adds support for the CIJ and CGIJ instructions. llvm-svn: 182846	2013-05-29 11:58:52 +00:00
Richard Sandiford	0fb90ab0cb	[SystemZ] Register compare-and-branch support This patch adds support for the CRJ and CGRJ instructions. Support for the immediate forms will be a separate patch. The architecture has a large number of comparison instructions. I think it's generally better to concentrate on using the "best" comparison instruction first and foremost, then only use something like CRJ if CR really was the natual choice of comparison instruction. The patch therefore opportunistically converts separate CR and BRC instructions into a single CRJ while emitting instructions in ISelLowering. llvm-svn: 182764	2013-05-28 10:41:11 +00:00
Richard Sandiford	586f41777e	[SystemZ] Tighten branch tests After r182274, the branches in these tests must always be short. llvm-svn: 182358	2013-05-21 08:53:17 +00:00
Richard Sandiford	312425f32d	[SystemZ] Add long branch pass Before this change, the SystemZ backend would use BRCL for all branches and only consider shortening them to BRC when generating an object file. E.g. a branch on equal would use the JGE alias of BRCL in assembly output, but might be shortened to the JE alias of BRC in ELF output. This was a useful first step, but it had two problems: (1) The z assembler isn't traditionally supposed to perform branch shortening or branch relaxation. We followed this rule by not relaxing branches in assembler input, but that meant that generating assembly code and then assembling it would not produce the same result as going directly to object code; the former would give long branches everywhere, whereas the latter would use short branches where possible. (2) Other useful branches, like COMPARE AND BRANCH, do not have long forms. We would need to do something else before supporting them. (Although COMPARE AND BRANCH does not change the condition codes, the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction during codegen, so that we can safely lower it to a separate compare and long branch where necessary. This is not a valid transformation for the assembler proper to make.) This patch therefore moves branch relaxation to a pre-emit pass. For now, calls are still shortened from BRASL to BRAS by the assembler, although this too is not really the traditional behaviour. The first test takes about 1.5s to run, and there are likely to be more tests in this vein once further branch types are added. The feeling on IRC was that 1.5s is a bit much for a single test, so I've restricted it to SystemZ hosts for now. The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests. A later patch will remove the {{g}}s from that directory. llvm-svn: 182274	2013-05-20 14:23:08 +00:00

1 2 3 4 5 ...

325 Commits