llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	0a59647ee4	[SystemZ] misched-cutoff tests can only be tested on non-NDEBUG (assertion) builds Fixes clang-with-thin-lto-ubuntu buildbot after D94383/rGddd03842c347	2021-01-14 15:46:27 +00:00
Jonas Paulsson	ddd03842c3	[SystemZ] Clear Available set in SystemZPostRASchedStrategy::initialize(). This needs to be done in order to not crash with -misched-cutoff. Fixes https://bugs.llvm.org/show_bug.cgi?id=45928 Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D94383	2021-01-13 18:18:27 -06:00
Fangrui Song	a90e5a8f0d	[SystemZ][test] Add explicit dso_local to definitions in ELF static relocation model tests	2020-12-30 15:26:09 -08:00
Layton Kifer	d29f93bda5	[DAGCombiner] Don't create sexts of deleted xors when they were in-visit replaced Fixes a bug introduced by D91589. When folding `(sext (not i1 x)) -> (add (zext i1 x), -1)`, we try to replace the not first when possible. If we replace the not in-visit, then the now invalidated node will be returned, and subsequently we will return an invalid sext. In cases where the not is replaced in-visit we can simply return SDValue, as the not in the current sext should have already been replaced. Thanks @jgorbe, for finding the below reproducer. The following reduced test case crashes clang when built with `clang -O1 -frounding-math`: ``` template <class> class a { int b() { return c == 0.0 ? 0 : -1; } int c; }; template class a<long>; ``` A debug build of clang produces this "assertion failed" error: ``` clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm:: SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed. ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93274	2020-12-23 16:16:26 -08:00
Evgeniy Brevnov	9fb074e7bb	[BPI] Improve static heuristics for "cold" paths. Current approach doesn't work well in cases when multiple paths are predicted to be "cold". By "cold" paths I mean those containing "unreachable" instruction, call marked with 'cold' attribute and 'unwind' handler of 'invoke' instruction. The issue is that heuristics are applied one by one until the first match and essentially ignores relative hotness/coldness of other paths. New approach unifies processing of "cold" paths by assigning predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down IR similarly to existing approach. And finally set up edge probabilities based on estimated block weights. One important difference is how we propagate weight up. Existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless at least because it always gives 50\50 distribution which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting more accurate probability. New algorithm propagates the weight up only to the blocks that dominates and post-dominated by a block with some "known" weight. In other words, those blocks that are either always executed or not executed together. In addition new approach processes loops in an uniform way as well. Essentially loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers. Reviewed By: yrouban Differential Revision: https://reviews.llvm.org/D79485	2020-12-23 22:47:36 +07:00
Jonas Paulsson	653b97690f	[SystemZ] Improve handling of backchain offset. - New function SDValue getBackchainAddress() used by lowerDYNAMIC_STACKALLOC() and lowerSTACKRESTORE() to properly handle the backchain offset also with packed-stack. - Make a common function getBackchainOffset() for the computation of the backchain offset and use in some places (NFC). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D93171	2020-12-14 12:39:38 -06:00
Jonas Paulsson	42f628c842	Reapply "[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing." Fixed to properly compute the live-in lists of new blocks. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D92803	2020-12-11 18:25:47 -06:00
Jonas Paulsson	bc7a61b703	Revert "[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing." Temporarily reverted. This reverts commit `ea475c77ff`.	2020-12-10 18:05:51 -06:00
Jonas Paulsson	ea475c77ff	[SystemZFrameLowering] Don't overrwrite R1D (backchain) when probing. The loop-based probing done for stack clash protection altered R1D which corrupted the backchain value to be stored after the probing was done. By using R0D instead for the loop exit value, R1D is not modified. Review: Ulrich Weigand. Differential Revision: https://reviews.llvm.org/D92803	2020-12-10 15:06:18 -06:00
Ilya Leoshkevich	d58f112ce0	Prevent FENTRY_CALL reordering FEntryInserter prepends FENTRY_CALL to the first basic block. In case there are other instructions, PostRA Machine Instruction Scheduler can move FENTRY_CALL call around. This actually occurs on SystemZ (see the testcase). This is bad for the following reasons: * FENTRY_CALL clobbers registers. * Linux Kernel depends on whatever FENTRY_CALL expands to to be the very first instruction in the function. Fix by adding isCall attribute to FENTRY_CALL, which prevents reordering by making it a scheduling boundary for PostRA Machine Instruction Scheduler. Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D91218	2020-12-09 00:59:01 +01:00
Layton Kifer	ac522f8700	[DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1) Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets. Differential Revision: https://reviews.llvm.org/D91589	2020-12-06 11:52:10 -05:00
Fangrui Song	6b6c3aaeac	[test] Add explicit dso_local to function declarations in static relocation model tests They are currently implicit because TargetMachine::shouldAssumeDSOLocal implies dso_local. For such function declarations, clang -fno-pic emits the dso_local specifier. Adding explicit dso_local makes these tests align with the clang behavior and helps implementing an option to use GOT indirection when taking the address of a function symbol in -fno-pic (to avoid a canonical PLT entry (SHN_UNDEF with non-zero st_value)).	2020-12-05 14:54:37 -08:00
Fangrui Song	2262b04cab	[test] Add explicit dso_local to constant/global variable declarations They are currently implicit because TargetMachine::shouldAssumeDSOLocal implies dso_local. For external data, clang -fno-pic emits the dso_local specifier for ELF and non-MinGW COFF. Adding explicit dso_local makes these tests in align with the clang behavior and helps implementing an option to use GOT indirection for external data access in -fno-pic mode (to avoid copy relocations).	2020-12-04 13:51:01 -08:00
Matt Arsenault	20c43d6bd5	OpaquePtr: Bulk update tests to use typed sret	2020-11-20 17:58:26 -05:00
Simon Pilgrim	7a8b2f692e	[DAGCombiner] Precommit Sext Tests for D91589 Patch by: @laytonio (Layton Kifer) Differential Revision: https://reviews.llvm.org/D91671	2020-11-18 15:56:16 +00:00
Simon Pilgrim	5a7be094e3	[SystemZ] Regenerate some fp tests + remove unused check prefixes Just use default CHECK	2020-11-11 18:38:22 +00:00
Jonas Paulsson	7c026a83ee	[SystemZ] Define MaxInstLength to have the value of 6. This value had the default value of 4 which caused branch relaxation to fail. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D90065	2020-10-24 09:19:34 +02:00
Craig Topper	9e884169a2	[FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes. There are still problems with unsigned i32 conversions too which I'll try to fix in another patch. Fixes part of PR47393 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87115	2020-10-21 18:12:54 -07:00
Jonas Paulsson	1606755da0	[SystemZ] Mark unsaved argument R6 as live throughout function. For historical reasons, the R6 register is a callee-saved argument register. This means that if it is used to pass an argument to a function that does not clobber it, it is live throughout the function. This patch makes sure that in this special case any kill flags of it are removed. Review: Ulrich Weigand, Eli Friedman Differential Revision: https://reviews.llvm.org/D89451	2020-10-21 14:38:59 +02:00
Jonas Paulsson	6756d43af9	[SystemZ] Bugfix in SystemZVectorConstantInfo In order to correctly load an all-ones FP NaN value into a floating point register with a VGBM, the analyzed 32/64 FP bits must first be shifted left (into element 0 of the vector register). SystemZVectorConstantInfo has so far relied on element replication which has bypassed the need to do this shift, but now it is clear that this must be done in order to handle NaNs. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D89389	2020-10-14 15:34:40 +02:00
Jonas Paulsson	d851495f2f	[SystemZ] Use LA instead of AGR in eliminateFrameIndex(). Since AGR clobbers CC it should not be used here. Fixes https://bugs.llvm.org/show_bug.cgi?id=47736. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D89034	2020-10-09 13:06:33 +02:00
Matt Arsenault	89baeaef2f	Reapply "RegAllocFast: Rewrite and improve" This reverts commit `73a6a164b8`.	2020-09-30 10:35:25 -04:00
Jonas Paulsson	75a5febe31	[SystemZ] Don't emit PC-relative memory accesses to unaligned symbols. In the presence of packed structures (#pragma pack(1)) where elements are referenced through pointers, there will be stores/loads with alignment values matching the default alignments for the element types while the elements are in fact unaligned. Strictly speaking this is incorrect source code, but is unfortunately part of existing code and therefore now addressed. This patch improves the pattern predicate for PC-relative loads and stores by not only checking the alignment value of the instruction, but also making sure that the symbol (and element) itself is aligned. Fixes https://bugs.llvm.org/show_bug.cgi?id=44405 Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D87510	2020-09-29 14:51:13 +02:00
Dávid Bolvanský	179e15d53a	[SystemZ] Optimize bcmp calls (PR47420) Solves https://bugs.llvm.org/show_bug.cgi?id=47420 Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D87988	2020-09-25 17:55:39 +02:00
Muhammad Omair Javaid	73a6a164b8	Revert "Reapply Revert "RegAllocFast: Rewrite and improve"" This reverts commit `55f9f87da2`. Breaks following buildbots: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4306 http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9154	2020-09-22 14:40:06 +05:00
Matt Arsenault	55f9f87da2	Reapply Revert "RegAllocFast: Rewrite and improve" This reverts commit `dbd53a1f0c`. Needed lldb test updates	2020-09-21 15:45:27 -04:00
Eric Christopher	dbd53a1f0c	Temporarily Revert "RegAllocFast: Rewrite and improve" as it's breaking a few tests in the lldb test suite. Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio This reverts commit `c8757ff3aa`.	2020-09-18 18:11:21 -07:00
Matt Arsenault	c8757ff3aa	RegAllocFast: Rewrite and improve This rewrites big parts of the fast register allocator. The basic strategy of doing block-local allocation hasn't changed but I tweaked several details: Track register state on register units instead of physical registers. This simplifies and speeds up handling of register aliases. Process basic blocks in reverse order: Definitions are known to end register livetimes when walking backwards (contrary when walking forward then uses may or may not be a kill so we need heuristics). Check register mask operands (calls) instead of conservatively assuming everything is clobbered. Enhance heuristics to detect killing uses: In case of a small number of defs/uses check if they are all in the same basic block and if so the last one is a killing use. Enhance heuristic for copy-coalescing through hinting: We check the first k defs of a register for COPYs rather than relying on there just being a single definition. When testing this on the full llvm test-suite including SPEC externals I measured: average 5.1% reduction in code size for X86, 4.9% reduction in code on aarch64. (ranging between 0% and 20% depending on the test) 0.5% faster compiletime (some analysis suggests the pass is slightly slower than before, but we more than make up for it because later passes are faster with the reduced instruction count) Also adds a few testcases that were broken without this patch, in particular bug 47278. Patch mostly by Matthias Braun	2020-09-18 14:05:18 -04:00
Craig Topper	b1e68f885b	[SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.: This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130. This required adding a SDNodeFlags to SelectionDAG::getSetCC. Now we manage to contant fold some stuff undefs during the initial getNode that we don't do in later DAG combines. Differential Revision: https://reviews.llvm.org/D87200	2020-09-08 15:27:21 -07:00
Jonas Paulsson	6dc3e22b57	[DAGTypeLegalizer] Handle ZERO_EXTEND of promoted type in WidenVecRes_Convert. On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert() always ended up being scalarized, because the type action of the input is promotion which was previously an unhandled case in this method. This fixes https://bugs.llvm.org/show_bug.cgi?id=47132. Differential Revision: https://reviews.llvm.org/D86268 Patch by Eli Friedman. Review: Ulrich Weigand	2020-09-08 16:49:51 +02:00
Jonas Paulsson	714ceefad9	[SelectionDAG] Always intersect SDNode flags during getNode() node memoization. Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was in an undefined state, which is very bad given that this is the default when getNode() is called without passing an explicit SDNodeFlags argument. This meant that if an already existing and reused node had a flag which the second caller to getNode() did not set, that flag would remain uncleared. This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW flag was incorrectly set on an add instruction (which did in fact overflow in one of the two original contexts), so when SystemZElimCompare removed the compare with 0 trusting that flag, wrong-code resulted. There is more that needs to be done in this area as discussed here: Differential Revision: https://reviews.llvm.org/D86871 Review: Ulrich Weigand, Sanjay Patel	2020-09-05 10:30:38 +02:00
Dávid Bolvanský	0f14b2e6cb	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `50c743fa71`. Patch will be split to smaller ones.	2020-08-17 20:44:33 +02:00
Dávid Bolvanský	50c743fa71	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 19:54:27 +02:00
Dávid Bolvanský	f9264995a6	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `44587e2f7e`. Sanitizer tests need to be updated.	2020-08-13 14:37:40 +02:00
Dávid Bolvanský	44587e2f7e	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 14:23:58 +02:00
Dávid Bolvanský	a0485421d2	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `385c9d673f`.	2020-08-13 12:59:15 +02:00
Dávid Bolvanský	385c9d673f	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 12:45:40 +02:00
Craig Topper	ffc248f3b8	[LegalTypes] Move VSELECT node creation out of WidenVSELECTAndMask and push to 2 of the 3 callers. One of the callers only wants the condition, but the vselect can be simplified by getNode making it hard or impossible to retrieve the condition. Instead, return the condition and make the other 2 callers responsible for creating the vselect node using the condition. Rename the function to WidenVSELECTMask accordingly. Differential Revision: https://reviews.llvm.org/D85468	2020-08-06 13:18:16 -07:00
Ulrich Weigand	68a80a4436	[SystemZ] Ensure -mno-vx disables any use of vector features When passing the -vector feature to LLVM (or equivalently the -mno-vx command line argument to clang), the intent is that generated code must not use any vector features (in particular, no vector registers must be used). However, there are some cases where we still could generate such uses; these are all related to some of the additional vector features (like +vector-enhancements-1). Since none of those features are actually usable with -vector, just make sure we disable them all if -vector is given.	2020-07-23 15:34:59 +02:00
Ulrich Weigand	e9c6b63d4a	[SystemZ] Simplify knownbits.ll test The knownbits.ll test case is somewhat fragile since: - it relies on undef inputs; and - it operates just at the limits of the MaxRecursionDepth This means that optimization changes may easily cause the test to spuriously fail. Rewrite the test so it still validates the same thing, but in a less fragile manner.	2020-06-30 16:31:59 +02:00
Ilya Leoshkevich	6764869548	[SystemZ] Add NoMerge MIFlag Summary: This fixes ASan and MSan tests on SystemZ after commit `6a822e20ce` ("[ASan][MSan] Remove EmptyAsm and set the CallInst to nomerge to avoid from merging."). Based on commit `80e107ccd0` ("Add NoMerge MIFlag to avoid MIR branch folding"). Reviewers: uweigand, jonpa Reviewed By: uweigand Subscribers: hiraditya, llvm-commits, Andreas-Krebbel Tags: #llvm Differential Revision: https://reviews.llvm.org/D82794	2020-06-30 12:44:45 +02:00
Jonas Paulsson	ef7aad0db4	[SystemZ] Improve handling of ZERO_EXTEND_VECTOR_INREG. Instead of doing multiple unpacks when zero extending vectors (e.g. v2i16 -> v2i64), benchmarks have shown that it is better to do a VPERM (vector permute) since that is only one sequential instruction on the critical path. This patch achieves this by 1. Expand ZERO_EXTEND_VECTOR_INREG into a vector shuffle with a zero vector instead of (multiple) unpacks. 2. Improve SystemZ::GeneralShuffle to perform a single unpack as the last operation if Bytes matches it. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D78486	2020-06-30 09:08:10 +02:00
Fangrui Song	4cd19a6e15	[BasicAA] Rename -disable-basicaa to -disable-basic-aa to be consistent with the canonical name "basic-aa"	2020-06-26 20:55:44 -07:00
Jonas Paulsson	d3f7448e3c	[SystemZ] Bugfix in storeLoadCanUseBlockBinary(). Check that the MemoryVT of LoadA matches that of LoadB. This fixes https://bugs.llvm.org/show_bug.cgi?id=46239. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D81671	2020-06-17 09:49:31 +02:00
Sam Parker	09d30cb977	[CostModel] Unify Shuffle and InsertElement Costs Extract the existing code from getInstructionThroughput into TTImpl::getUserCost. The duplicated code in the AMDGPU backend has also been removed. Differential Revision: https://reviews.llvm.org/D81448	2020-06-10 09:13:34 +01:00
Jonas Paulsson	515bfc66ea	[SystemZ] Implement -fstack-clash-protection Probing of allocated stack space is now done when this option is passed. The purpose is to protect against the stack clash attack (see https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D78717	2020-06-06 18:38:36 +02:00
Hans Wennborg	fcc199d696	Make regcoal_remat_empty_subrange.ll test require asserts build. The -stress-sched flag is only available when asserts are enabled.	2020-06-04 19:46:22 +02:00
Quentin Colombet	ccb3c8e861	[RegisterCoalescer] Update empty subranges when rematerializing When we rematerialize a value as part of the coalescing, we may widen the register class of the destination register. When this happens, updateRegDefUses may create additional subranges to account for the wider register class. The created subranges are empty and if they are not defined by the rematerialized instruction we clean them up. However, if they are defined by the rematerialized instruction but unused, we failed to flag them as dead definition and would leave them as empty live-range. This is wrong because empty live-ranges don't interfere with anything, thus if we don't fix them, we would fail to account that the rematerialized instruction clobbers some lanes. E.g., let us consider the following pseudo code: def.lane_low64:reg128 = ldimm newdef:reg32 = COPY def.lane_low64_low32 When rematerialization happens for newdef, we end up with: newdef.lane_low64:reg128 = ldimm = use newdef.lane_low64_low32 Let's look at the live interval of newdef. Before rematerialization, we would get: newdef [defIdx, useIdx:0) 0@defIdx Right after updateRegDefUses, newdef register class is widen to reg128 and the subrange definitions will be augmented to fill the subreg that is used at the definition point, here lane_low64. The resulting live interval would be: newdef [newDefIdx, useIdx:0) 0@newDefIdx * lane_low64_high32 EMPTY * lane_low64_low32 [newDefIdx, useIdx:0) Before this patch this would be the final status of the live interval. Therefore we miss that lane_low64_high32 is actually live on the definition point of newdef. With this patch, after rematerializing, we check all the added subranges and for the ones that are defined but empty, we flag them as dead def. Thus, in that case, newdef would look like this: newdef [newDefIdx, useIdx:0) 0@newDefIdx * lane_low64_high32 [newDefIdx, newDefIdxDead) ; <-- instead of EMPTY * lane_low64_low32 [newDefIdx, useIdx:0) This fixes https://www.llvm.org/PR46154	2020-06-03 17:10:55 -07:00
Kevin P. Neal	c21a4f84b0	Fix errors in use of strictfp attribute. Errors spotted with use of: https://reviews.llvm.org/D68233	2020-05-29 12:28:14 -04:00
Jonas Paulsson	b3bd0c37ec	[SystemZ] Eliminate the need to create a zero vector by reusing the VPERM mask. Try to avoid creating VGBMs by reusing the permutation mask if it contains a zero. If the first byte was into (any byte of) a zero vector, then the first byte of the mask can become zero and reused by putting the mask also as the first operand. If there instead was a first-byte use of the other source operand, then that zero index can be reused if the mask is placed as the second operand. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D79925	2020-05-19 09:37:19 +02:00


712 Commits