llvm-project

Commit Graph

Author	SHA1	Message	Date
George Rimar	dcf59c5480	Recommit r335333 "[MC] - Add .stack_size sections into groups and link them with .text" With compilation fix. Original commit message: D39788 added a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. This change does following two things on top: 1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to eliminate them fast during resolving the COMDATs. 2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text. With that linker will be able to do -gc-sections on dead stack sizes sections. Differential revision: https://reviews.llvm.org/D46874 llvm-svn: 335336	2018-06-22 10:53:47 +00:00
George Rimar	6d448da1be	Revert r335332 "[MC] - Add .stack_size sections into groups and link them with .text" It broke bots. http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/12891 http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/9443 http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/25551 llvm-svn: 335333	2018-06-22 10:27:33 +00:00
George Rimar	e14485a0c6	[MC] - Add .stack_size sections into groups and link them with .text D39788 added a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. This change does following two things on top: 1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to eliminate them fast during resolving the COMDATs. 2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text. With that linker will be able to do -gc-sections on dead stack sizes sections. Differential revision: https://reviews.llvm.org/D46874 llvm-svn: 335332	2018-06-22 10:10:53 +00:00
Karl-Johan Karlsson	abb11f805f	[BranchFolding] Fix live-in's when hoisting code Summary: When the branch folder hoist code into a predecessor it adjust live-in's in the blocks it hoist code from. However it fail to handle hoisted code that contain a defed register that originally is live-in in the block through a super register. This is fixed by replacing the live-in handling code with calls to utility functions in LivePhysRegs. Reviewers: kparzysz, gberry, MatzeB, uweigand, aprantl Reviewed By: kparzysz Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47529 llvm-svn: 334163	2018-06-07 07:20:33 +00:00
Jonas Paulsson	307e782cbc	[SystemZ] Bugfix in combineSTORE(). Remember to check if store is truncating before calling combineTruncateExtract(). Review: Ulrich Weigand llvm-svn: 333262	2018-05-25 09:01:23 +00:00
Jonas Paulsson	7d484fae2b	[RegUsageInfoCollector] Bugfix for callee saved registers. Previously, this pass would look at the (static) set returned by getCallPreservedMask() and add those back as preserved in the case when isSafeForNoCSROpt() returns false. A problem is that a target may have to save some registers even when NoCSROpt takes place. For instance, on SystemZ, the return register is needed upon return from a function. Furthermore, getCallPreservedMask() only includes the registers that the target actually wishes to emit save/restore instructions for. This means that subregs and (fully saved) superregs are missing. This patch instead takes the (dynamic) set returned by target for the function from determineCalleeSaves() and then adds sub/super regs to build the set to be used when building the RegMask for the function. Review: Quentin Colombet, Ulrich Weigand https://reviews.llvm.org/D46315 llvm-svn: 333261	2018-05-25 08:42:02 +00:00
Jonas Paulsson	de54c058a6	[SystemZ] Fold AHIMux in foldMemoryOperandImpl. AHIMux can be folded the same way as AHI. Review: Ulrich Weigand llvm-svn: 332703	2018-05-18 11:54:04 +00:00
Jonas Paulsson	ebb1605bf3	[SystemZ] Bugfix for MVCLoop CC clobbering. MVCLoop clobbers CC (since it emits a compare/branch), but this was not modelled. Review: Ulrich Weigand llvm-svn: 331627	2018-05-07 10:48:43 +00:00
Jonas Paulsson	72fe760592	[RegUsageInfoCollector] Bugfix for handling of register aliases. Don't assume the alias of a defined reg is always already in the set. As the test case in https://bugs.llvm.org/show_bug.cgi?id=36587 discovered, it is wrong to assume that all the aliases of the defined register in the current function is already present in the UsedPhysRegsMask. This patch changes this so that any definition in the current function of a phys-reg always results in all its aliases inserted into the set of defined registers. Review: Quentin Colombet https://reviews.llvm.org/D45157 llvm-svn: 331509	2018-05-04 07:50:05 +00:00
Ulrich Weigand	c3ec80fea1	[SystemZ] Handle SADDO et.al. and ADD/SUBCARRY This provides an optimized implementation of SADDO/SSUBO/UADDO/USUBO as well as ADDCARRY/SUBCARRY on top of the new CC implementation. In particular, multi-word arithmetic now uses UADDO/ADDCARRY instead of the old ADDC/ADDE logic, which means we no longer need to use "glue" links for those instructions. This also allows making full use of the memory-based instructions like ALSI, which couldn't be recognized due to limitations in the DAG matcher previously. Also, the llvm.sadd.with.overflow et.al. intrinsincs now expand to directly using the ADD instructions and checking for a CC 3 result. llvm-svn: 331203	2018-04-30 17:54:28 +00:00
Ulrich Weigand	b32f3656d2	[SystemZ] Do not use glue to represent condition code dependencies Currently, an instruction setting the condition code is linked to the instruction using the condition code via a "glue" link in the SelectionDAG. This has a number of drawbacks; in particular, it means the same CC cannot be used by multiple users. It also makes it more difficult to efficiently implement SADDO et. al. This patch changes the back-end to represent CC dependencies as normal values during SelectionDAG matching, along the lines of how this is handled in the X86 back-end already. In addition to the core mechanics of updating all relevant patterns, this requires a number of additional changes: - We now need to be able to spill/restore a CC value into a GPR if necessary. This means providing a copyPhysReg implementation for moves involving CC, and defining getCrossCopyRegClass. - Since we still prefer to avoid such spills, we provide an override for IsProfitableToFold to avoid creating a merged LOAD / ICMP if this would result in multiple users of the CC. - combineCCMask no longer requires a single CC user, and no longer need to be careful about preventing invalid glue/chain cycles. - emitSelect needs to be more careful in marking CC live-in to the basic block it generates. Also, we can now optimize the case of multiple subsequent selects with the same condition just like X86 does. llvm-svn: 331202	2018-04-30 17:52:32 +00:00
Ulrich Weigand	fb56686cd3	[SystemZ] Improve handling of Select pseudo-instructions If we have LOCR instructions, select them directly from SelectionDAG instead of first going through a pseudo instruction and then using the custom inserter to emit the LOCR. Provide Select pseudo-instructions for VR32/VR64 if we have vector instructions, to avoid having to go through the first 16 FPRs unnecessarily. If we do not have LOCFHR, prefer using LOCR followed by a move over a conditional branch. llvm-svn: 331191	2018-04-30 15:49:27 +00:00
Jun Bum Lim	06073bfff7	[PostRASink]Add register dependency check for implicit operands Summary: This change extend the register dependency check for implicit operands in Copy instructions. Fixes PR36902. Reviewers: thegameg, sebpop, uweigand, jnspaulsson, gberry, mcrosier, qcolombet, MatzeB Reviewed By: thegameg Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44958 llvm-svn: 330018	2018-04-13 14:23:09 +00:00
Zaara Syeda	935474fef5	[MachineLICM] Re-enable hoisting of constant stores This patch fixes an issue exposed on the SystemZ build bots when committing https://reviews.llvm.org/rL327856. The hoisting was temporarily disabled with an option. This patch now re-enables hoisting and checks that we only hoist a store instruction when all its operands are either constant caller preserved registers or immediates. Differential Revision: https://reviews.llvm.org/D45286 llvm-svn: 329577	2018-04-09 14:50:02 +00:00
Rafael Espindola	78fdca3cd5	Use local symbols for creating .stack-size. llvm-svn: 328581	2018-03-26 20:40:22 +00:00
Jonas Paulsson	8ad035d8e5	[SystemZ] Add "REQUIRES: asserts" to test case to fix build bots. llvm-svn: 327958	2018-03-20 08:29:19 +00:00
Jonas Paulsson	a6216ec4cc	[SystemZ] Bugfix of CC liveness in emitMemMemWrapper (CLC). If DoneMBB becomes empty it must have CC added to its live-in list, since it will fall-through into EndMBB. This happens when the CLC loop does the complete range. Review: Ulrich Weigand llvm-svn: 327834	2018-03-19 13:05:22 +00:00
Jonas Paulsson	dbcf1bf503	[SystemZ] Add 'REQUIRES: asserts' to test case using debug output. llvm-svn: 327766	2018-03-17 09:15:13 +00:00
Jonas Paulsson	138960770c	[SystemZ] computeKnownBitsForTargetNode() / ComputeNumSignBitsForTargetNode() Improve/implement these methods to improve DAG combining. This mainly concerns intrinsics. Some constant operands to SystemZISD nodes have been marked Opaque to avoid transforming back and forth between generic and target nodes infinitely. Review: Ulrich Weigand llvm-svn: 327765	2018-03-17 08:32:12 +00:00
Jonas Paulsson	e9f7fa83d5	[SelectionDAG] Handle big endian target BITCAST in computeKnownBits() The BITCAST handling in computeKnownBits() previously only worked for little endian. This patch reverses the iteration over elements for a big endian target which allows this to work in this case also. SystemZ test case. Review: Eli Friedman https://reviews.llvm.org/D44249 llvm-svn: 327764	2018-03-17 08:04:00 +00:00
Jonas Paulsson	5612bb292c	[CodeGenPrepare] Respect endianness in splitMergedValStore. splitMergedValStore will split a store into two if target prefers this, or if -force-split-store is passed. This patch adds the missing handling for endianness in this function along with a test case. Review: Eli Friedman https://reviews.llvm.org/D44396 llvm-svn: 327375	2018-03-13 08:36:20 +00:00
Ulrich Weigand	1785e244eb	[SystemZ] Fix test cases after r326613 I forgot to check in the updated test cases after the r326613 commit. llvm-svn: 326616	2018-03-02 21:22:42 +00:00
Ulrich Weigand	8b19be46c7	[SystemZ] Add support for anyregcc calling convention This adds back-end support for the anyregcc calling convention for use with patchpoints. Since all registers are considered call-saved with anyregcc (except for 0 and 1 which may still be clobbered by PLT stubs and the like), this required adding support for saving and restoring vector registers in prologue/epilogue code for the first time. This is not used by any other calling convention. llvm-svn: 326612	2018-03-02 20:40:11 +00:00
Ulrich Weigand	5eb64110d2	[SystemZ] Support stackmaps and patchpoints This adds back-end support for the @llvm.experimental.stackmap and @llvm.experimental.patchpoint intrinsics. llvm-svn: 326611	2018-03-02 20:39:30 +00:00
Ulrich Weigand	3206388870	[SystemZ] Fix common-code users of stack size On SystemZ we need to provide a register save area of 160 bytes to any called function. This size needs to be added when allocating stack in the function prologue. However, it was not accounted for as part of MachineFrameInfo::getStackSize(); instead the back-end used a private routine getAllocatedStackSize(). This is OK for code-gen purposes, but it breaks other users of the getStackSize() routine, in particular it breaks the recently- added -stack-size-section feature. Fix this by updating the main stack size tracked by common code (in emitPrologue) instead of using the private routine. No change in code generation intended. llvm-svn: 326610	2018-03-02 20:38:41 +00:00
Ulrich Weigand	18f6930fef	[SystemZ] Support vector registers in inline asm This adds support for specifying vector registers for use with inline asm statements, either via the 'v' constraint or by explicit register names (v0 ... v31). llvm-svn: 326609	2018-03-02 20:36:34 +00:00
Craig Topper	e7ca6f5456	[DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors. Masking first, prevents the extend from being combine with loads. Its also interfering with some vXi1 extraction code. Differential Revision: https://reviews.llvm.org/D42679 llvm-svn: 326500	2018-03-01 22:32:25 +00:00
Geoff Berry	a2b9011290	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Re-enable commit r323991 now that r325931 has been committed to make MachineOperand::isRenamable() check more conservative w.r.t. code changes and opt-in on a per-target basis. llvm-svn: 326208	2018-02-27 16:59:10 +00:00
Jonas Paulsson	5b5e3d8f80	[SystemZ] Also update the CHECK line for VPDI llvm-svn: 325898	2018-02-23 13:22:46 +00:00
Jonas Paulsson	abc29dfa79	[SystemZ] Fix VPDI argument in test. To select element 1 from each half with VPDI, a constant of 5 should be used. llvm-svn: 325897	2018-02-23 13:20:57 +00:00
Quentin Colombet	48abac82b8	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" This reverts commit r323991. This commit breaks target that don't model all the register constraints in TableGen. So far the workaround was to set the hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the cases. For instance, when mutating an instruction (like in the lowering of COPYs) the isRenamable flag is not properly updated. The same problem will happen when attaching machine operand from one instruction to another. Geoff Berry is working on a fix in https://reviews.llvm.org/D43042. llvm-svn: 325421	2018-02-17 03:05:33 +00:00
Jonas Paulsson	422dfbf7cc	[SelectionDAG] Consider endianness in scalarizeVectorStore(). When handling vectors with non byte-sized elements, reverse the order of the elements in the built integer if the target is Big-Endian. SystemZ tests updated. Review: Eli Friedman, Ulrich Weigand. https://reviews.llvm.org/D42786 llvm-svn: 324063	2018-02-02 08:48:02 +00:00
Jonas Paulsson	0e50b6ed80	[SystemZ] Update test case (NFC) test/CodeGen/SystemZ/vec-trunc-to-i1.ll was marked as a temporary FAIL when it was previously updated when it needed one more COPY. This was however wrong, since the loop body had been reduced significantly, and it was actually an improvement. Review: Ulrich Weigand. llvm-svn: 324060	2018-02-02 07:52:02 +00:00
Geoff Berry	94503c7bc3	[MachineCopyPropagation] Extend pass to do COPY source forwarding Summary: This change extends MachineCopyPropagation to do COPY source forwarding and adds an additional run of the pass to the default pass pipeline just after register allocation. This version of this patch uses the newly added MachineOperand::isRenamable bit to avoid forwarding registers is such a way as to violate constraints that aren't captured in the Machine IR (e.g. ABI or ISA constraints). This change is a continuation of the work started in D30751. Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits Differential Revision: https://reviews.llvm.org/D41835 llvm-svn: 323991	2018-02-01 18:54:01 +00:00
Nirav Dave	18f7f60e17	[SelectionDAG] Fix UpdateChains handling of TokenFactors Summary: In Instruction Selection UpdateChains replaces all matched Nodes' chain references including interior token factors and deletes them. This may allow nodes which depend on these interior nodes but are not part of the set of matched nodes to be left with a dangling dependence. Avoid this by doing the replacement for matched non-TokenFactor nodes. Fixes PR36164. Reviewers: jonpa, RKSimon, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D42754 llvm-svn: 323977	2018-02-01 16:11:59 +00:00
Puyan Lotfi	43e94b15ea	Followup on Proposal to move MIR physical register namespace to '$' sigil. Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922	2018-01-31 22:04:26 +00:00
Jonas Paulsson	cc5fe73669	[SystemZ] Check the bitwidth before calling isInt/isUInt. Since these methods will assert if the integer does not fit into 64 bits, it is necessary to do this check before calling them in supportedAddressingMode(). Review: Ulrich Weigand. llvm-svn: 323866	2018-01-31 12:41:25 +00:00
Jonas Paulsson	9cee52732f	Move new test from Generic to SystemZ. A few build bots failed with r323042 because they are not configured to build the SystemZ target. llvm-svn: 323044	2018-01-20 16:57:06 +00:00
Jonas Paulsson	7ad28863fb	[SelectionDAG] Fix codegen of vector stores with non byte-sized elements. This was completely broken, but hopefully fixed by this patch. In cases where it is needed, a vector with non byte-sized elements is stored by extracting, zero-extending, shift:ing and or:ing the elements into an integer of the same width as the vector, which is then stored. Review: Eli Friedman, Ulrich Weigand https://reviews.llvm.org/D42100#inline-369520 https://bugs.llvm.org/show_bug.cgi?id=35520 llvm-svn: 323042	2018-01-20 16:05:10 +00:00
Ulrich Weigand	426f6bef44	[SystemZ] Prefer LOCHI over generating IPM sequences On current machines we have load-on-condition instructions that can be used to directly implement the SETCC semantics. If we have those, it is always preferable to use them instead of generating the IPM sequence. llvm-svn: 322989	2018-01-19 20:56:04 +00:00
Ulrich Weigand	31112895d9	[SystemZ] Directly use CC result of compare-and-swap In order to implement a test whether a compare-and-swap succeeded, the SystemZ back-end currently emits a rather inefficient sequence of first converting the CC result into an integer, and then testing that integer against zero. This commit changes the back-end to simply directly test the CC value set by the compare-and-swap instruction. llvm-svn: 322988	2018-01-19 20:54:18 +00:00
Ulrich Weigand	849a59fd4b	[SystemZ] Rework IPM sequence generation The SystemZ back-end uses a sequence of IPM followed by arithmetic operations to implement the SETCC primitive. This is currently done early during SelectionDAG. This patch moves generating those sequences to much later in SelectionDAG (during PreprocessISelDAG). This doesn't change much in generated code by itself, but it allows further enhancements that will be checked-in as follow-on commits. llvm-svn: 322987	2018-01-19 20:52:04 +00:00
Ulrich Weigand	ac04d9b8e5	[SystemZ] Run branch-12.ll test only if long tests enabled This avoids excessive test run times e.g. with expensive checks enabled. llvm-svn: 322983	2018-01-19 19:51:38 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Jonas Paulsson	ef785694f2	[SystemZ] Handle BRCTH branches correctly in SystemZLongBranch.cpp. BRCTH is capable of a long branch which needs to be recognized during branch relaxation. This is done by checking for ExtraRelaxSize == 0. Review: Ulrich Weigand llvm-svn: 322688	2018-01-17 17:16:07 +00:00
Jonas Paulsson	776a81a483	[SystemZ] Check for legality before doing LOAD AND TEST transformations. Since a load and test instruction treat its operands as signed, it can only replace a logical compare for EQ/NE uses. Review: Ulrich Weigand https://bugs.llvm.org/show_bug.cgi?id=35662 llvm-svn: 322488	2018-01-15 15:41:26 +00:00
Jonas Paulsson	1a76f3a2c2	Temporarily revert "[SystemZ] Check for legality before doing LOAD AND TEST transformations." , due to test failures. llvm-svn: 322165	2018-01-10 10:05:55 +00:00
Jonas Paulsson	9222b91e24	[SelectionDAGBuilder] Chain prefetches less aggressively. Prefetches used to always be chained between any previous and following memory accesses. The problem with this was that later optimizations, such as folding of a load into the user instruction, got disrupted. This patch relaxes the chaining of prefetches in order to remedy this. Reveiw: Hal Finkel https://reviews.llvm.org/D38886 llvm-svn: 322163	2018-01-10 09:33:00 +00:00
Jonas Paulsson	d9dde1ac56	[SystemZ] Check for legality before doing LOAD AND TEST transformations. Since a load and test instruction treat its operands as signed, it can only replace a logical compare for EQ/NE uses. Review: Ulrich Weigand https://bugs.llvm.org/show_bug.cgi?id=35662 llvm-svn: 322161	2018-01-10 09:18:17 +00:00
Puyan Lotfi	fe6c9cbb24	[MIR] Repurposing '$' sigil used by external symbols. Replacing with '&'. Planning to add support for named vregs. This puts is in a conundrum since physregs are named as well. To rectify this we need to use a sigil other than '%' for physregs in MIR. We've settled on using '$' for physregs but first we must repurpose it from external symbols using it, which is what this commit is all about. We think '&' will have familiar semantics for C/C++ users. llvm-svn: 322146	2018-01-10 00:56:48 +00:00
Geoff Berry	60c431022e	[MachineOperand][MIR] Add isRenamable to MachineOperand. Summary: Add isRenamable() predicate to MachineOperand. This predicate can be used by machine passes after register allocation to determine whether it is safe to rename a given register operand. Register operands that aren't marked as renamable may be required to be assigned their current register to satisfy constraints that are not captured by the machine IR (e.g. ABI or ISA constraints). Reviewers: qcolombet, MatzeB, hfinkel Subscribers: nemanjai, mcrosier, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D39400 llvm-svn: 320503	2017-12-12 17:53:59 +00:00
Francis Visoiu Mistrih	a8a83d150f	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/<def>//g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<def[ ],[ ]dead>/dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ],[ ]dead>/implicit-def dead \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name "*.s" $ -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022	2017-12-07 10:40:31 +00:00
Jonas Paulsson	19380bae05	[SystemZ] Bugfix in expandRxSBG() Csmith discovered a program that caused wrong code generation with -O0: When handling a SIGN_EXTEND in expandRxSBG(), RxSBG.BitSize may be less than the Input width (if a truncate was previously traversed), so maskMatters() should be called with a masked based on the width of the sign extend result instead. Review: Ulrich Weigand llvm-svn: 319892	2017-12-06 13:53:24 +00:00
Ulrich Weigand	5bfed6cb7c	[SystemZ] Validate shifted compare value in adjustForTestUnderMask When folding a shift into a test-under-mask comparison, make sure that there is no loss of precision when creating the shifted comparison value. This usually never happens, except for certain always-true comparisons in unoptimized code. Fixes PR35529. llvm-svn: 319818	2017-12-05 19:42:07 +00:00
Jonas Paulsson	b5b91cd402	[SystemZ] set 'guessInstructionProperties = 0' and set flags as needed. This has proven a healthy exercise, as many cases of incorrect instruction flags were corrected in the process. As part of this, IntrWriteMem was added to several SystemZ instrinsics. Furthermore, a bug was exposed in TwoAddress with this change (as incorrect hasSideEffects flags were removed and instructions could now be sunk), and the test case for that bugfix (r319646) is included here as test/CodeGen/SystemZ/twoaddr-sink.ll. One temporary test regression (one extra copy) which will hopefully go away in upcoming patches for similar cases: test/CodeGen/SystemZ/vec-trunc-to-i1.ll Review: Ulrich Weigand. https://reviews.llvm.org/D40437 llvm-svn: 319756	2017-12-05 11:24:39 +00:00
Jonas Paulsson	86c40db49d	[Regalloc] Generate and store multiple regalloc hints. MachineRegisterInfo used to allow just one regalloc hint per virtual register. This patch extends this to a vector of regalloc hints, which is filled in by common code with sorted copy hints. Such hints will make for more ID copies that can be removed. NB! This improvement is currently (and hopefully temporarily) disabled by default, except for SystemZ. The only reason for this is the big impact this has on tests, which has unfortunately proven unmanageable. It was a long while since all the tests were updated and just waiting for review (which didn't happen), but now targets have to enable this themselves instead. Several targets could get a head-start by downloading the tests updates from the Phabricator review. Thanks to those who helped, and sorry you now have to do this step yourselves. This should be an improvement generally for any target! The target may still create its own hint, in which case this has highest priority and is stored first in the vector. If it has target-type, it will not be recomputed, as per the previous behaviour. The temporary hook enableMultipleCopyHints() will be removed as soon as all targets return true. Review: Quentin Colombet, Ulrich Weigand. https://reviews.llvm.org/D38128 llvm-svn: 319754	2017-12-05 10:52:24 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . $ -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . $ -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" $ -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Francis Visoiu Mistrih	c71cced0aa	[CodeGen] Always use `printReg` to print registers in both MIR and debug output As part of the unification of the debug format and the MIR format, always use `printReg` to print all kinds of registers. Updated the tests using '_' instead of '%noreg' until we decide which one we want to be the default one. Differential Revision: https://reviews.llvm.org/D40421 llvm-svn: 319445	2017-11-30 16:12:24 +00:00
Jonas Paulsson	b9a2467501	[SystemZ] Bugfix in adjustSubwordCmp. Csmith generated a program where a store after load to the same address did not get chained after the new load created during DAG legalizing, and so performed an illegal overwrite of the expected value. When the new zero-extending load is created, the chain users of the original load must be updated, which was not done previously. A similar case was also found and handled in lowerBITCAST. Review: Ulrich Weigand https://reviews.llvm.org/D40542 llvm-svn: 319409	2017-11-30 08:18:50 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Nirav Dave	db77e57ea8	[DAG] Do MergeConsecutiveStores again before Instruction Selection Summary: Now that store-merge is only generates type-safe stores, do a second pass just before instruction selection to allow lowered intrinsics to be merged as well. Reviewers: jyknight, hfinkel, RKSimon, efriedma, rnk, jmolloy Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33675 llvm-svn: 319036	2017-11-27 15:28:15 +00:00
Jonas Paulsson	181e260e32	[DAGCombiner] Bugfix in isAlias(). Since i1 is a legal type, this: NumBytes = Op1->getMemoryVT().getSizeInBits() >> 3; is wrong and should be instead NumBytes = Op0->getMemoryVT().getStoreSize(); There seems to be more places where this should be fixed outside DAGCombiner. Review: Hal Finkel https://bugs.llvm.org/show_bug.cgi?id=35366 llvm-svn: 318824	2017-11-22 08:58:30 +00:00
Jonas Paulsson	12e3a58842	[SystemZ] Bugfix for handling of subregisters in getRegAllocationHints(). The 32 bit subreg indices of GR128 registers must also be checked for in getRC32(). Review: Ulrich Weigand. llvm-svn: 318652	2017-11-20 14:54:03 +00:00
Rong Xu	3573d8da36	[CodeGen] Peel off the dominant case in switch statement in lowering This patch peels off the top case in switch statement into a branch if the probability exceeds a threshold. This will help the branch prediction and avoids the extra compares when lowering into chain of branches. Differential Revision: http://reviews.llvm.org/D39262 llvm-svn: 318202	2017-11-14 21:44:09 +00:00
Ulrich Weigand	5f4373a2fc	[SystemZ] Do not crash when selecting an OR of two constants In rare cases, common code will attempt to select an OR of two constants. This confuses the logic in splitLargeImmediate, causing an internal error during isel. Fixed by simply leaving this case to common code to handle. This fixes PR34859. llvm-svn: 318187	2017-11-14 20:00:34 +00:00
Ulrich Weigand	55b8590e03	[SystemZ] Fix invalid codegen using RISBMux on out-of-range bits Before using the 32-bit RISBMux set of instructions we need to verify that the input bits are actually within range of the 32-bit instruction. This fixer PR35289. llvm-svn: 318177	2017-11-14 19:20:46 +00:00
Jonas Paulsson	4b017e682d	[RegAlloc, SystemZ] Increase number of LOCRs by passing "hard" regalloc hints. * The method getRegAllocationHints() is now of bool type instead of void. If true is returned, regalloc (AllocationOrder) will only try to allocate the hints, as opposed to merely trying them before non-hinted registers. * TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with an increase in number of LOCRs. In this case, it is desired to force the hints even though there is a slight increase in spilling, because if a non-hinted register would be allocated, the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR (Load On Condition) SystemZ instruction must have both operands in either the low or high part of the 64 bit register. Reviewers: Quentin Colombet and Ulrich Weigand https://reviews.llvm.org/D36795 llvm-svn: 317879	2017-11-10 08:46:26 +00:00
Ulrich Weigand	d39e9dca1b	[SystemZ] Add support for the "o" inline asm constraint We don't really need any special handling of "offsettable" memory addresses, but since some existing code uses inline asm statements with the "o" constraint, add support for this constraint for compatibility purposes. llvm-svn: 317807	2017-11-09 16:31:57 +00:00
Jonas Paulsson	c63ed222b8	[SystemZ] Enable machine scheduler. The machine scheduler (before register allocation) is enabled by default for SystemZ. The SelectionDAG scheduling preference now becomes source order scheduling (was regpressure). Review: Ulrich Weigand https://reviews.llvm.org/D37977 llvm-svn: 315063	2017-10-06 13:59:28 +00:00
Jonas Paulsson	c9e363ac69	[SystemZ] implement shouldCoalesce() Implement shouldCoalesce() to help regalloc avoid running out of GR128 registers. If a COPY involving a subreg of a GR128 is coalesced, the live range of the GR128 virtual register will be extended. If this happens where there are enough phys-reg clobbers present, regalloc will run out of registers (if there is not a single GR128 allocatable register available). This patch tries to allow coalescing only when it can prove that this will be safe by checking the (local) interval in question. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D37899 https://bugs.llvm.org/show_bug.cgi?id=34610 llvm-svn: 314516	2017-09-29 14:31:39 +00:00
Ulrich Weigand	df86855f61	[SystemZ] Fix fall-out from r314428 The expensive-checks build bot found a problem with the r314428 commit: if CC is live after a ATOMIC_CMP_SWAPW instruction, it needs to be marked as live-in to the block after the loop the pseudo gets expanded to. This actually fixes a code-gen bug as well, since if the CC isn't live, the CR and JLH are merged to a CRJLH which doesn't actually set the condition code any more. llvm-svn: 314465	2017-09-28 22:08:25 +00:00
Ulrich Weigand	0f1de04979	[SystemZ] Custom-expand ATOMIC_CMP_AND_SWAP_WITH_SUCCESS The SystemZ compare-and-swap instructions already provide the "success" indication via a condition-code value, so the default expansion of those operations generates an unnecessary extra comparsion. llvm-svn: 314428	2017-09-28 16:22:54 +00:00
Jonas Paulsson	b0e8a2e623	[SystemZ] Improve optimizeCompareZero() More conversions to load-and-test can be made with this patch by adding a forward search in optimizeCompareZero(). Review: Ulrich Weigand https://reviews.llvm.org/D38076 llvm-svn: 313877	2017-09-21 13:52:24 +00:00
Ulrich Weigand	59a01a958a	[SystemZ] Fix truncstore + bswap codegen bug SystemZTargetLowering::combineSTORE contains code to transform a combination of STORE + BSWAP into a STRV type instruction. This transformation is correct for regular stores, but not for truncating stores. The routine neglected to check for that case. Fixes a miscompilation of llvm-objcopy with clang, which caused test suite failures in the SystemZ multistage build bot. llvm-svn: 313669	2017-09-19 20:50:05 +00:00
NAKAMURA Takumi	38fac5905e	Move llvm/test/CodeGen/X86/clear-liverange-spillreg.mir to SystemZ. It was in wrong place. llvm-svn: 313218	2017-09-14 00:03:23 +00:00
Jonas Paulsson	fc4f323ac1	[SystemZ] Add the CoveredBySubRegs bit to GPR64, GPR128 and FPR128 registers. This bit is needed in order for the CalleeSavedRegs list to automatically include the super registers if all of their subregs are present. Thanks to Wei Mi for initially indicating this deficiency in the SystemZ backend. Review: Ulrich Weigand. https://bugs.llvm.org/show_bug.cgi?id=34550 llvm-svn: 313023	2017-09-12 12:11:29 +00:00
Jonas Paulsson	57a705d9d0	[SystemZ, MachineScheduler] Improve post-RA scheduling. The idea of this patch is to continue the scheduler state over an MBB boundary in the case where the successor block has only one predecessor. This means that the scheduler will continue in the successor block (after emitting any branch instructions) with e.g. maintained processor resource counters. Benchmarks have been confirmed to benefit from this. The algorithm in MachineScheduler.cpp that extracts scheduling regions of an MBB has been extended so that the strategy may optionally reverse the order of processing the regions themselves. This is controlled by a new method doMBBSchedRegionsTopDown(), which defaults to false. Handling the top-most region of an MBB first also means that a top-down scheduler can continue the scheduler state across any scheduling boundary between to regions inside MBB. Review: Ulrich Weigand, Matthias Braun, Andy Trick. https://reviews.llvm.org/D35053 llvm-svn: 311072	2017-08-17 08:33:44 +00:00
Mikael Holmen	8b10680922	[IfConversion] Maintain the CFG when predicating/merging blocks in IfConvert* Summary: This fixes PR32721 in IfConvertTriangle and possible similar problems in IfConvertSimple, IfConvertDiamond and IfConvertForkedDiamond. In PR32721 we had a triangle EBB \| \ \| \| \| TBB \| / FBB where FBB didn't have any successors at all since it ended with an unconditional return. Then TBB and FBB were be merged into EBB, but EBB would still keep its successors, and the use of analyzeBranch and CorrectExtraCFGEdges wouldn't help to remove them since the return instruction is not analyzable (at least not on ARM). The edge updating code and branch probability updating code is now pushed into MergeBlocks() which allows us to share the same update logic between more callsites. This lets us remove several dependencies on analyzeBranch and completely eliminate RemoveExtraEdges. One thing that showed up with this patch was that IfConversion sometimes left a successor with 0% probability even if there was no branch or fallthrough to the successor. One such example from the test case ifcvt_bad_zero_prob_succ.mir. The indirect branch tBRIND can only jump to bb.1, but without the patch we got: bb.0: successors: %bb.1(0x80000000) bb.1: successors: %bb.1(0x80000000), %bb.2(0x00000000) tBRIND %r1, 1, %cpsr B %bb.1 bb.2: There is no way to jump from bb.1 to bb2, but still there is a 0% edge from bb.1 to bb.2. With the patch applied we instead get the expected: bb.0: successors: %bb.1(0x80000000) bb.1: successors: %bb.1(0x80000000) tBRIND %r1, 1, %cpsr B %bb.1 Since bb.2 had no predecessor at all, it was removed. Several testcases had to be updated due to this since the removed successor made the "Branch Probability Basic Block Placement" pass sometimes place blocks in a different order. Finally added a couple of new test cases: * PR32721_ifcvt_triangle_unanalyzable.mir: Regression test for the original problem dexcribed in PR 32721. * ifcvt_triangleWoCvtToNextEdge.mir: Regression test for problem that caused a revert of my first attempt to solve PR 32721. * ifcvt_simple_bad_zero_prob_succ.mir: Test case showing the problem where a wrong successor with 0% probability was previously left. * ifcvt_[diamond\|forked_diamond\|simple]_unanalyzable.mir Very simple test cases for the simple and (forked) diamond cases involving unanalyzable branches that can be nice to have as a base if wanting to write more complicated tests. Reviewers: iteratee, MatzeB, grosser, kparzysz Reviewed By: kparzysz Subscribers: kbarton, davide, aemerson, nemanjai, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34099 llvm-svn: 310697	2017-08-11 06:57:08 +00:00
Evgeny Stupachenko	c675290680	Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720). The root cause of reverting was fixed - PR33514. Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310289	2017-08-07 19:56:34 +00:00
Ulrich Weigand	a11f63a952	[SystemZ] Add support for 128-bit atomic load/store/cmpxchg This adds support for the main 128-bit atomic operations, using the SystemZ instructions LPQ, STPQ, and CDSG. Generating these instructions is a bit more complex than usual since the i128 type is not legal for the back-end. Therefore, we have to hook the LowerOperationWrapper and ReplaceNodeResults TargetLowering callbacks. llvm-svn: 310094	2017-08-04 18:57:58 +00:00
Ulrich Weigand	02f1c02c27	[SystemZ] Eliminate unnecessary serialization operations We currently emit a serialization operation (bcr 14, 0) before every atomic load and after every atomic store. This is overly conservative. The SystemZ architecture actually does not require any serialization for atomic loads, and a serialization after an atomic store only if we need to enforce sequential consistency. This is what other compilers for the platform implement as well. llvm-svn: 310093	2017-08-04 18:53:35 +00:00
Jonas Paulsson	be7a7e4979	[SystemZ] test update test/CodeGen/SystemZ/loop-01.ll was incorrectly updated by r308729. llvm-svn: 308736	2017-07-21 13:14:17 +00:00
Jonas Paulsson	024e319489	[SystemZ, LoopStrengthReduce] This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729	2017-07-21 11:59:37 +00:00
Ulrich Weigand	f2968d58cb	[SystemZ] Add support for IBM z14 processor (3/3) This adds support for the new 128-bit vector float instructions of z14. Note that these instructions actually only operate on the f128 type, since only each 128-bit vector register can hold only one 128-bit float value. However, this is still preferable to the legacy 128-bit float instructions, since those operate on pairs of floating-point registers (so we can hold at most 8 values in registers), while the new instructions use single vector registers (so we hold up to 32 value in registers). Adding support includes: - Enabling the instructions for the assembler/disassembler. - CodeGen for the instructions. This includes allocating the f128 type now to the VR128BitRegClass instead of FP128BitRegClass. - Scheduler description support for the instructions. Note that for a small number of operations, we have no new vector instructions (like integer <-> 128-bit float conversions), and so we use the legacy instruction and then reformat the operand (i.e. copy between a pair of floating-point registers and a vector register). llvm-svn: 308196	2017-07-17 17:44:20 +00:00
Ulrich Weigand	33435c4c9c	[SystemZ] Add support for IBM z14 processor (2/3) This adds support for the new 32-bit vector float instructions of z14. This includes: - Enabling the instructions for the assembler/disassembler. - CodeGen for the instructions, including new LLVM intrinsics. - Scheduler description support for the instructions. - Update to the vector cost function calculations. In general, CodeGen support for the new v4f32 instructions closely matches support for the existing v2f64 instructions. llvm-svn: 308195	2017-07-17 17:42:48 +00:00
Ulrich Weigand	2b3482fe85	[SystemZ] Add support for IBM z14 processor (1/3) This patch series adds support for the IBM z14 processor. This part includes: - Basic support for the new processor and its features. - Support for new instructions (except vector 32-bit float and 128-bit float). - CodeGen for new instructions, including new LLVM intrinsics. - Scheduler description for the new processor. - Detection of z14 as host processor. Support for the new 32-bit vector float and 128-bit vector float instructions is provided by separate patches. llvm-svn: 308194	2017-07-17 17:41:11 +00:00
Quentin Colombet	868ef847a6	[RegAllocFast] Don't insert kill flags of super-register for partial kill When reusing a register for a new definition, the fast register allocator used to insert a kill flag at the previous last use of that register to inform later passes that this register is free between the redef and the last use. However, this may be wrong when subregisters are involved. Indeed, a partially redef would have trigger a kill of the full super register, potentially wrongly marking all the other subregisters as free. Given we don't track which lanes are still live, we cannot set the kill flag in such case. Note: This bug has been latent for about 7 years (r104056). llvmg.org/PR33677 llvm-svn: 307428	2017-07-07 19:25:45 +00:00
Ulrich Weigand	af98b748f6	[SystemZ] Fix missing emergency spill slot corner case We sometimes need emergency spill slots for the register scavenger. This may be the case when code needs to access a stack slot that has an offset of 4096 or more relative to the stack pointer. To make that determination, processFunctionBeforeFrameFinalized currently simply checks the total stack frame size of the current function. But this is not enough, since code may need to access stack slots in the caller's stack frame as well, in particular incoming arguments stored on the stack. This commit fixes the problem by taking argument slots into account. llvm-svn: 306305	2017-06-26 16:50:32 +00:00
Jonas Paulsson	8c33647ba1	[SystemZ] Add a check against zero before calling getTestUnderMaskCond() Csmith discovered that this function can be called with a zero argument, in which case an assert for this triggered. This patch also adds a guard before the other call to this function since it was missing, although the test only covers the case where it was discovered. Reduced test case attached as CodeGen/SystemZ/int-cmp-54.ll. Review: Ulrich Weigand llvm-svn: 306287	2017-06-26 13:38:27 +00:00
Ulrich Weigand	eaf0051ba3	[SystemZ] Remove unnecessary serialization before volatile loads This reverts the use of TargetLowering::prepareVolatileOrAtomicLoad introduced by r196905. Nothing in the semantics of the "volatile" keyword or the definition of the z/Architecture actually requires that volatile loads are preceded by a serialization operation, and no other compiler on the platform actually implements this. Since we've now seen a use case where this additional serialization causes noticable performance degradation, this patch removes it. The patch still leaves in the serialization before atomic loads, which is now implemented directly in lowerATOMIC_LOAD. (This also seems overkill, but that can be addressed separately.) llvm-svn: 306117	2017-06-23 15:56:14 +00:00
Jonas Paulsson	82f15a7168	[SystemZ] Fix trap issue and enable expensive checks. The isBarrier/isTerminator flags have been removed from the SystemZ trap instructions, so that tests do not fail with EXPENSIVE_CHECKS. This was just an issue at -O0 and did not affect code output on benchmarks. (Like Eli pointed out: "targets are split over whether they consider their "trap" a terminator; x86, AArch64, and NVPTX don't, but ARM, MIPS, PPC, and SystemZ do. We should probably try to be consistent here.". This is still the case, although SystemZ has switched sides). SystemZ now returns true in isMachineVerifierClean() :-) These Generic tests have been modified so that they can be run with or without EXPENSIVE_CHECKS: CodeGen/Generic/llc-start-stop.ll and CodeGen/Generic/print-machineinstrs.ll Review: Ulrich Weigand, Simon Pilgrim, Eli Friedman https://bugs.llvm.org/show_bug.cgi?id=33047 https://reviews.llvm.org/D34143 llvm-svn: 306106	2017-06-23 14:30:46 +00:00
Geoff Berry	06c9dc3d9c	[SelectionDAG] Allow sin/cos -> sincos optimization on GNU triples w/ just -fno-math-errno Summary: This change enables the sin(x) cos(x) -> sincos(x) optimization on GNU target triples. This optimization was being inhibited when -ffast-math wasn't set because sincos in GLibC does not set errno, while sin and cos do. However, this optimization will only run if the attributes on the sin/cos calls include readnone, which is how clang represents the fact that it doesn't care about the errno values set by these functions (via the -fno-math-errno flag). Reviewers: hfinkel, bogner Subscribers: mcrosier, javed.absar, llvm-commits, paul.redmond Differential Revision: https://reviews.llvm.org/D32921 llvm-svn: 305204	2017-06-12 17:15:41 +00:00
Quentin Colombet	1ee8616ca0	[SystemZ] Simplify test case. NFC Remove useless successors information. llvm-svn: 304615	2017-06-02 23:40:58 +00:00
Quentin Colombet	2145cf3f07	[RABasic] Properly update the LiveRegMatrix when LR splitting occur Prior to this patch we used to not touch the LiveRegMatrix while doing live-range splitting. In other words, when live-range splitting was occurring, the LiveRegMatrix was not reflecting the changes. This is generally fine because it means the query to the LiveRegMatrix will be conservately correct. However, when decisions are taken based on what is going to happen on the interferences (e.g., when we spill a register and know that it is going to be available for another one), we might hit an assertion that the color used for the assignment is still in use. This patch makes sure the changes on the live-ranges are properly reflected in the LiveRegMatrix, so the assertions don't break. An alternative could have been to remove the assertion, but it would make the invariants of the code and the general reasoning more complicated in my opnion. http://llvm.org/PR33057 llvm-svn: 304603	2017-06-02 22:46:31 +00:00
Nirav Dave	da8f221273	Elide stores which are overwritten without being observed. Summary: In SelectionDAG, when a store is immediately chained to another store to the same address, elide the first store as it has no observable effects. This is causes small improvements dealing with intrinsics lowered to stores. Test notes: * Many testcases overwrite store addresses multiple times and needed minor changes, mainly making stores volatile to prevent the optimization from optimizing the test away. * Many X86 test cases optimized out instructions associated with associated with va_start. * Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has dependencies to check and can probably be removed and potentially replaced with another test. Reviewers: rnk, john.brawn Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33206 llvm-svn: 303198	2017-05-16 19:43:56 +00:00
Jonas Paulsson	d1ec738502	Handle a COPY with undef source operand in LowerCopy() Llvm-stress discovered that a COPY may end up in ExpandPostRA::LowerCopy() with an undef source operand. It is not possible for the target to handle this, as this flag is not passed to TII->copyPhysReg(). This patch solves this by treating such a COPY as an identity COPY. Review: Matthias Braun https://reviews.llvm.org/D32892 llvm-svn: 302877	2017-05-12 06:32:03 +00:00
Jonas Paulsson	11d251c05c	[SystemZ] Implement getRepRegClassFor() This method must return a valid register class, or the list-ilp isel scheduler will crash. For MVT::Untyped nullptr was previously returned, but now ADDR128BitRegClass is returned instead. This is needed just as long as list-ilp (and probably also list-hybrid) is still there. Review: Ulrich Weigand, A Trick https://reviews.llvm.org/D32802 llvm-svn: 302649	2017-05-10 13:03:25 +00:00
Jonas Paulsson	4fd156261e	[SystemZ] Make copyPhysReg() add impl-use operands of super reg. When a 128 bit COPY is lowered into two instructions, an impl-use operand of the super-reg should be added to each new instruction in case one of the sub-regs is undefined. Review: Ulrich Weigand llvm-svn: 302146	2017-05-04 13:33:30 +00:00
Jonas Paulsson	1e8648577c	[SystemZ] Update kill-flag in splitMove(). EarlierMI needs to clear the kill flag on the first operand in case of a store. Review: Ulrich Weigand llvm-svn: 301177	2017-04-24 12:40:28 +00:00
Matt Arsenault	f10061ec70	Add address space mangling to lifetime intrinsics In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876	2017-04-10 20:18:21 +00:00
Jonas Paulsson	cad72efee6	[SystemZ] Check for presence of vector support in SystemZISelLowering A test case was found with llvm-stress that caused DAGCombiner to crash when compiling for an older subtarget without vector support. SystemZTargetLowering::combineTruncateExtract() should do nothing for older subtargets. This check was placed in canTreatAsByteVector(), which also helps in a few other places. Review: Ulrich Weigand llvm-svn: 299763	2017-04-07 12:35:11 +00:00
Nirav Dave	aa65a2beb8	[SystemZ] Prevent Merging Bitcast with non-normal loads Fixes PR32505. Reviewers: uweigand, jonpa Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31609 llvm-svn: 299552	2017-04-05 15:42:48 +00:00
Jonas Paulsson	38a2da92bc	[DAGCombiner] Don't make a BUILD_VECTOR with operands of illegal type. When DAGCombiner visits a SIGN_EXTEND_INREG of a BUILD_VECTOR with constant operands, a new BUILD_VECTOR node will be created transformed constants. Llvm-stress found a case where the new BUILD_VECTOR had constant operands of an illegal type, because the (legal) element type is in fact not a legal scalar type. This patch changes this so that the new BUILD_VECTOR has the same operand type as the old one. Review: Eli Friedman, Nirav Dave https://bugs.llvm.org//show_bug.cgi?id=32422 llvm-svn: 299540	2017-04-05 13:45:37 +00:00
Jonas Paulsson	c7bb22e75f	[SystemZ] Make sure of correct regclasses in insertSelect() Since LOCR only accepts GR32 virtual registers, its operands must be copied into this regclass in insertSelect(), when an LOCR is built. Otherwise, the case where the source operand was GRX32 will produce invalid IR. Review: Ulrich Weigand llvm-svn: 299220	2017-03-31 14:06:59 +00:00
Jonas Paulsson	56bb0857e9	[SystemZ] Skip DAGCombining of vector node for older subtargets. Even on older subtargets that lack vector support, there may be vector values with just one element in the input program. These are converted during DAG legalization to scalar values. The pre-legalize SystemZ DAGCombiner methods should in this circumstance not touch these nodes. This patch adds a check for this in SystemZTargetLowering::combineEXTRACT_VECTOR_ELT(). Review: Ulrich Weigand llvm-svn: 299213	2017-03-31 13:22:59 +00:00
Nirav Dave	9b5563c52c	[SDAG] Fix Stale SDNode usage in visitAND Reorder CombineTo Calls to prevent potential use of deleted node. Fixes PR32372. Reviewers: jnspaulsson, RKSimon, uweigand, jonpa Reviewed By: jonpa Subscribers: jonpa, llvm-commits Differential Revision: https://reviews.llvm.org/D31346 llvm-svn: 298920	2017-03-28 14:11:20 +00:00
Jonas Paulsson	808c89f467	[SystemZ] Don't drop any operands in expandZExtPseudo() Make sure that any operands, e.g. of an implicit def of a super reg is transferred to the new instruction. Review: Ulrich Weigand llvm-svn: 298484	2017-03-22 06:03:32 +00:00
Jonas Paulsson	54c7680e1f	[DAGTypeLegalizer] Handle widening truncate to vector of i1. Previously, PromoteIntRes_TRUNCATE() did not handle the case where the operand needs widening, which resulted in llvm_unreachable(). This patch adds the needed handling, along with a test case. Review: Eli Friedman, Simon Pilgrim. https://reviews.llvm.org/D31077 llvm-svn: 298357	2017-03-21 10:24:14 +00:00
Jonas Paulsson	bd65421f08	[SystemZ] Don't drop MO flags in foldMemoryOperandImpl() The def operand of the new LG/LD should have the old def operands flags and subreg index. New test: test/CodeGen/SystemZ/fold-memory-op-impl.ll Review: Ulrich Weigand llvm-svn: 298341	2017-03-21 05:49:40 +00:00
Jonas Paulsson	f496bd9a59	[SystemZ] New CodeGen tests for vector compare / select. New SystemZ tests for the improved codegen of vector compare and select, including cases with a logical combination of two compares. Review: Ulrich Weigand. https://reviews.llvm.org/D29489 llvm-svn: 298049	2017-03-17 07:11:46 +00:00
Jonas Paulsson	8a7bd24c82	[SystemZ] Add use of super-reg in splitMove() If one of the subregs of the 128 bit reg is undefined when splitMove() splits a store into two instructions, a use of an undefined physical register results. To remedy this, an implicit use of the super register is added onto both new instructions, along with propagated kill and undef flags. This was discovered with llvm-stress, and that test case is attached as test/CodeGen/SystemZ/splitMove_undefReg_mverifier.ll Thanks to Matthias Braun for helping with a nice explanation. Review: Ulrich Weigand llvm-svn: 298047	2017-03-17 06:47:08 +00:00
Nirav Dave	54e22f33d9	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695	2017-03-14 00:34:14 +00:00
Jonas Paulsson	1d33cd3988	[SystemZ] Add check VT.isSimple() in canTreateAsByteVector() Since BB-vectorizer can produce vectors of for example 3 elements, this check is needed. Review: Ulrich Weigand llvm-svn: 297136	2017-03-07 09:49:31 +00:00
Chandler Carruth	ce52b80744	[SDAG] Revert r296476 (and r296486, r296668, r296690). This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862	2017-03-03 10:02:25 +00:00
Nirav Dave	f830dec3f2	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296476	2017-02-28 14:24:15 +00:00
Nirav Dave	73cd0194cf	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279	2017-02-26 01:27:32 +00:00
Nirav Dave	beabf456df	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252	2017-02-25 11:43:58 +00:00
Ahmed Bougacha	9677cc6fb7	[TLI] Robustize SDAG LibFunc proto checking by merging it into TLI. This re-applies commit r292189, reverted in r292191. SelectionDAGBuilder recognizes libfuncs using some homegrown parameter type-checking. Use TLI instead, removing another heap of redundant code. This isn't strictly NFC, as the SDAG code was too lax. Concretely, this means changes are required to a few tests: - calling a non-variadic function via a variadic prototype isn't OK; it just happens to work on x86_64 (but not on, e.g., aarch64). - mempcpy has a size_t parameter; the SDAG code accepts any integer type, which meant using i32 on x86_64 worked. - a handful of SystemZ tests check the SDAG support for lax prototype checking: Ulrich agrees on removing them. I don't think it's worth supporting any of these (IMO) invalid testcases. Instead, fix them to be more meaningful. llvm-svn: 294028	2017-02-03 19:11:19 +00:00
Sanne Wouda	57b63d6ade	[LLC] Add an inline assembly diagnostics handler. Summary: llc would hit a fatal error for errors in inline assembly. The diagnostics message is now printed. Reviewers: rengolin, MatzeB, javed.absar, anemet Reviewed By: anemet Subscribers: jyknight, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D29408 llvm-svn: 293999	2017-02-03 11:14:39 +00:00
Nirav Dave	93f9d5ce04	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915	2017-02-02 18:24:55 +00:00
Nirav Dave	4442667fc5	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixing X86 inc/dec chain bug. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293893	2017-02-02 14:39:42 +00:00
Kyle Butt	b15c06677c	CodeGen: Allow small copyable blocks to "break" the CFG. When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability the other direction. For small blocks that can be duplicated we now skip that requirement as well, subject to some simple frequency calculations. Differential Revision: https://reviews.llvm.org/D28583 llvm-svn: 293716	2017-01-31 23:48:32 +00:00
Justin Bogner	8f520a73b2	SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we happened to replace a node during UpdateChains, because it would be left in the list we were iterating over. This nulls out the pointer when that happens so that we can avoid the issue. Fixes llvm.org/PR31710 llvm-svn: 293522	2017-01-30 18:29:46 +00:00
Matt Arsenault	32e6bfa20f	DAG: Fold fneg into compare with constant into the constant fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred) InstCombine already does this. llvm-svn: 293512	2017-01-30 17:57:28 +00:00
Jonas Paulsson	bb0ed3e732	[DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert(). In case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only a partial number of available vector elements), WidenVecRes_Convert() used to resort to scalarization. This patch adds a handling of the (common) case where an input vector can be found of same width as the widened result vector, by converting the node to SIGN/ZERO_EXTEND_VECTOR_INREG. Review: Eli Friedman llvm-svn: 293268	2017-01-27 07:46:26 +00:00
Nirav Dave	d32a421f75	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293184 which is failing in LTO builds llvm-svn: 293188	2017-01-26 16:46:13 +00:00
Nirav Dave	de6516c466	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293184	2017-01-26 16:02:24 +00:00
Simon Pilgrim	4b51989635	Fixed parser error on windows shell evaluation of RUN script line llvm-svn: 292363	2017-01-18 11:40:28 +00:00
Jonas Paulsson	a9bb00d82b	[SystemZ] Proper handling of undef flag while expanding pseudo. During post-RA pseudo expansion, an 'undef' flag of the source operand should be propagated by emitGRX32Move(). Review: Ulrich Weigand llvm-svn: 292353	2017-01-18 08:32:54 +00:00
Kyle Butt	efe56fed12	Revert "CodeGen: Allow small copyable blocks to "break" the CFG." This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded. This needs a simple probability check because there are some cases where it is not profitable. llvm-svn: 291695	2017-01-11 19:55:19 +00:00
Kyle Butt	df27aa8c89	CodeGen: Allow small copyable blocks to "break" the CFG. When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability the other direction. For small blocks that can be duplicated we now skip that requirement as well. Differential revision: https://reviews.llvm.org/D27742 llvm-svn: 291609	2017-01-10 23:04:30 +00:00
Nirav Dave	f5bf03c7ef	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667	2016-12-14 16:43:44 +00:00
Nirav Dave	8527ab0ad2	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after fixing after removing load-store factoring through token factors in favor of improved token factor operand pruning Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. Whem merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289659	2016-12-14 15:44:26 +00:00
Nirav Dave	bedb5d906c	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226	2016-12-09 17:18:24 +00:00
Nirav Dave	fd51ff4fd8	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after fixing overly aggressive load-store forwarding optimization. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. Whem merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289221	2016-12-09 16:15:12 +00:00
Ulrich Weigand	1c5a5c42de	[SystemZ] Support floating-point control register instructions Add assembler support for instructions manipulating the FPC. Also add codegen support via the GCC compatibility builtins: __builtin_s390_sfpc __builtin_s390_efpc llvm-svn: 288525	2016-12-02 18:21:53 +00:00
Ulrich Weigand	2d9e3d9d3b	[SystemZ] Support load-and-trap instructions This adds support for the instructions provided with the load-and-trap facility. llvm-svn: 288030	2016-11-28 13:59:22 +00:00
Ulrich Weigand	758399131a	[SystemZ] Add remaining branch instructions This patch adds assembler support for the remaining branch instructions: the non-relative branch on count variants, and all variants of branch on index. The only one of those that can be readily exploited for code generation is BRCTH (branch on count using a high 32-bit register as count). Do use it, however, it is necessary to also introduce a hew CHIMux pseudo to allow comparisons of a 32-bit value agains a short immediate to go into a high register as well (implemented via CHI/CIH). This causes a bit of codegen changes overall, but those have proven to be neutral (or even beneficial) in performance measurements. llvm-svn: 288029	2016-11-28 13:40:08 +00:00
Ulrich Weigand	524f276c74	[SystemZ] Improve use of conditional instructions This patch moves formation of LOC-type instructions from (late) IfConversion to the early if-conversion pass, and in some cases additionally creates them directly from select instructions during DAG instruction selection. To make early if-conversion work, the patch implements the canInsertSelect / insertSelect callbacks. It also implements the commuteInstructionImpl and FoldImmediate callbacks to enable generation of the full range of LOC instructions. Finally, the patch adds support for all instructions of the load-store-on-condition-2 facility, which allows using LOC instructions also for high registers. Due to the use of the GRX32 register class to enable high registers, we now also have to handle the cases where there are still no single hardware instructions (conditional move from a low register to a high register or vice versa). These are converted back to a branch sequence after register allocation. Since the expandRAPseudos callback is not allowed to create new basic blocks, this requires a simple new pass, modelled after the ARM/AArch64 ExpandPseudos pass. Overall, this patch causes significantly more LOC-type instructions to be used, and results in a measurable performance improvement. llvm-svn: 288028	2016-11-28 13:34:08 +00:00
Ulrich Weigand	a0e7325023	[SystemZ] Support CL(G)T instructions This adds support for the compare logical and trap (memory) instructions that were added as part of the miscellaneous instruction extensions feature with zEC12. llvm-svn: 286587	2016-11-11 12:48:26 +00:00
Ulrich Weigand	92c2c672e5	[SystemZ] Support load-and-zero-rightmost-byte facility This adds support for the LZRF/LZRG/LLZRGF instructions that were added on z13, and uses them for code generation were appropriate. SystemZDAGToDAGISel::tryRISBGZero is updated again to prefer LLZRGF over RISBG where both would be possible. llvm-svn: 286586	2016-11-11 12:46:28 +00:00
Ulrich Weigand	5dc7b67c62	[SystemZ] Use LLGT(R) instructions This adds support for the 31-to-64-bit zero extension instructions LLGT and LLGTR and uses them for code generation where appropriate. Since this operation can also be performed via RISBG, we have to update SystemZDAGToDAGISel::tryRISBGZero so that we prefer LLGT over RISBG in case both are possible. The patch includes some simplification to the tryRISBGZero code; this is not intended to cause any (further) functional change in codegen. llvm-svn: 286585	2016-11-11 12:43:51 +00:00
Matthias Braun	325cd2c98a	ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps() addSchedBarrierDeps() is supposed to add use operands to the ExitSU node. The current implementation adds uses for calls/barrier instruction and the MBB live-outs in all other cases. The use operands of conditional jump instructions were missed. Also added code to macrofusion to set the latencies between nodes to zero to avoid problems with the fusing nodes lingering around in the pending list now. Differential Revision: https://reviews.llvm.org/D25140 llvm-svn: 286544	2016-11-11 01:34:21 +00:00
Ulrich Weigand	7bdb485e18	[SystemZ] Do not use LOC(G) for volatile loads It is not safe to use LOAD ON CONDITION to implement access to a memory location marked "volatile", since the architecture leaves it unspecified whether or not an access happens if the condition is false. The current code already appears to care about that: def LOC : CondUnaryRSY<"loc", 0xEBF2, nonvolatile_load, GR32, 4>; Unfortunately, that "nonvolatile_load" operator is simply ignored by the CondUnaryRSY class, and there was no test to catch it. llvm-svn: 285077	2016-10-25 15:39:15 +00:00
Jonas Paulsson	8010b631d5	[SystemZ] Post-RA scheduler implementation Post-RA sched strategy and scheduling instruction annotations for z196, zEC12 and z13. This scheduler optimizes decoder grouping and balances processor resources (including side steering the FPd unit instructions). The SystemZHazardRecognizer keeps track of the scheduling state, which can be dumped with -debug-only=misched. Reviers: Ulrich Weigand, Andrew Trick. https://reviews.llvm.org/D17260 llvm-svn: 284704	2016-10-20 08:27:16 +00:00
Sanjay Patel	3a3aaf67e0	[DAG] optimize negation of bool Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG. With the mask, this should be no worse than 2 shifts. The mask can be eliminated in some cases, so that should be better than 2 shifts. This change exposed some missing folds related to negation: https://reviews.llvm.org/rL284239 https://reviews.llvm.org/rL284395 There may be others, so please let me know if you see any regressions. Differential Revision: https://reviews.llvm.org/D25485 llvm-svn: 284611	2016-10-19 16:58:59 +00:00
Nirav Dave	a81682aad4	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r284151 which appears to be triggering a LTO failures on Hexagon llvm-svn: 284157	2016-10-13 20:23:25 +00:00
Nirav Dave	4b36957243	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after upstream changes. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. Whem merging stores, search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 284151	2016-10-13 19:20:16 +00:00
Konstantin Zhuravlyov	081385a74e	[DAGCombiner] Do not remove the load of stored values when optimizations are disabled This combiner breaks debug experience and should not be run when optimizations are disabled. For example: int main() { int j = 0; j += 2; if (j == 2) return 0; return 5; } When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function. Differential Revision: https://reviews.llvm.org/D19268 llvm-svn: 284014	2016-10-12 13:44:24 +00:00
Arnold Schwaighofer	3f25658143	swifterror: Don't compute swifterror vregs during instruction selection The code used llvm basic block predecessors to decided where to insert phi nodes. Instruction selection can and will liberally insert new machine basic block predecessors. There is not a guaranteed one-to-one mapping from pred. llvm basic blocks and machine basic blocks. Therefore the current approach does not work as it assumes we can mark predecessor machine basic block as needing a copy, and needs to know the set of all predecessor machine basic blocks to decide when to insert phis. Instead of computing the swifterror vregs as we select instructions, propagate them at the end of instruction selection when the MBB CFG is complete. When an instruction needs a swifterror vreg and we don't know the value yet, generate a new vreg and remember this "upward exposed" use, and reconcile this at the end of instruction selection. This will only happen if the target supports promoting swifterror parameters to registers and the swifterror attribute is used. rdar://28300923 llvm-svn: 283617	2016-10-07 22:06:55 +00:00

1 2 3 4 5 ...

571 Commits