llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	55f9f87da2	Reapply Revert "RegAllocFast: Rewrite and improve" This reverts commit `dbd53a1f0c`. Needed lldb test updates	2020-09-21 15:45:27 -04:00
Eric Christopher	dbd53a1f0c	Temporarily Revert "RegAllocFast: Rewrite and improve" as it's breaking a few tests in the lldb test suite. Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio This reverts commit `c8757ff3aa`.	2020-09-18 18:11:21 -07:00
Matt Arsenault	c8757ff3aa	RegAllocFast: Rewrite and improve This rewrites big parts of the fast register allocator. The basic strategy of doing block-local allocation hasn't changed but I tweaked several details: Track register state on register units instead of physical registers. This simplifies and speeds up handling of register aliases. Process basic blocks in reverse order: Definitions are known to end register livetimes when walking backwards (contrary when walking forward then uses may or may not be a kill so we need heuristics). Check register mask operands (calls) instead of conservatively assuming everything is clobbered. Enhance heuristics to detect killing uses: In case of a small number of defs/uses check if they are all in the same basic block and if so the last one is a killing use. Enhance heuristic for copy-coalescing through hinting: We check the first k defs of a register for COPYs rather than relying on there just being a single definition. When testing this on the full llvm test-suite including SPEC externals I measured: average 5.1% reduction in code size for X86, 4.9% reduction in code on aarch64. (ranging between 0% and 20% depending on the test) 0.5% faster compiletime (some analysis suggests the pass is slightly slower than before, but we more than make up for it because later passes are faster with the reduced instruction count) Also adds a few testcases that were broken without this patch, in particular bug 47278. Patch mostly by Matthias Braun	2020-09-18 14:05:18 -04:00
Craig Topper	b1e68f885b	[SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.: This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130. This required adding a SDNodeFlags to SelectionDAG::getSetCC. Now we manage to contant fold some stuff undefs during the initial getNode that we don't do in later DAG combines. Differential Revision: https://reviews.llvm.org/D87200	2020-09-08 15:27:21 -07:00
Jonas Paulsson	6dc3e22b57	[DAGTypeLegalizer] Handle ZERO_EXTEND of promoted type in WidenVecRes_Convert. On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert() always ended up being scalarized, because the type action of the input is promotion which was previously an unhandled case in this method. This fixes https://bugs.llvm.org/show_bug.cgi?id=47132. Differential Revision: https://reviews.llvm.org/D86268 Patch by Eli Friedman. Review: Ulrich Weigand	2020-09-08 16:49:51 +02:00
Jonas Paulsson	714ceefad9	[SelectionDAG] Always intersect SDNode flags during getNode() node memoization. Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was in an undefined state, which is very bad given that this is the default when getNode() is called without passing an explicit SDNodeFlags argument. This meant that if an already existing and reused node had a flag which the second caller to getNode() did not set, that flag would remain uncleared. This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW flag was incorrectly set on an add instruction (which did in fact overflow in one of the two original contexts), so when SystemZElimCompare removed the compare with 0 trusting that flag, wrong-code resulted. There is more that needs to be done in this area as discussed here: Differential Revision: https://reviews.llvm.org/D86871 Review: Ulrich Weigand, Sanjay Patel	2020-09-05 10:30:38 +02:00
Dávid Bolvanský	0f14b2e6cb	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `50c743fa71`. Patch will be split to smaller ones.	2020-08-17 20:44:33 +02:00
Dávid Bolvanský	50c743fa71	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 19:54:27 +02:00
Dávid Bolvanský	f9264995a6	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `44587e2f7e`. Sanitizer tests need to be updated.	2020-08-13 14:37:40 +02:00
Dávid Bolvanský	44587e2f7e	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 14:23:58 +02:00
Dávid Bolvanský	a0485421d2	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `385c9d673f`.	2020-08-13 12:59:15 +02:00
Dávid Bolvanský	385c9d673f	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 12:45:40 +02:00
Craig Topper	ffc248f3b8	[LegalTypes] Move VSELECT node creation out of WidenVSELECTAndMask and push to 2 of the 3 callers. One of the callers only wants the condition, but the vselect can be simplified by getNode making it hard or impossible to retrieve the condition. Instead, return the condition and make the other 2 callers responsible for creating the vselect node using the condition. Rename the function to WidenVSELECTMask accordingly. Differential Revision: https://reviews.llvm.org/D85468	2020-08-06 13:18:16 -07:00
Ulrich Weigand	68a80a4436	[SystemZ] Ensure -mno-vx disables any use of vector features When passing the -vector feature to LLVM (or equivalently the -mno-vx command line argument to clang), the intent is that generated code must not use any vector features (in particular, no vector registers must be used). However, there are some cases where we still could generate such uses; these are all related to some of the additional vector features (like +vector-enhancements-1). Since none of those features are actually usable with -vector, just make sure we disable them all if -vector is given.	2020-07-23 15:34:59 +02:00
Ulrich Weigand	e9c6b63d4a	[SystemZ] Simplify knownbits.ll test The knownbits.ll test case is somewhat fragile since: - it relies on undef inputs; and - it operates just at the limits of the MaxRecursionDepth This means that optimization changes may easily cause the test to spuriously fail. Rewrite the test so it still validates the same thing, but in a less fragile manner.	2020-06-30 16:31:59 +02:00
Ilya Leoshkevich	6764869548	[SystemZ] Add NoMerge MIFlag Summary: This fixes ASan and MSan tests on SystemZ after commit `6a822e20ce` ("[ASan][MSan] Remove EmptyAsm and set the CallInst to nomerge to avoid from merging."). Based on commit `80e107ccd0` ("Add NoMerge MIFlag to avoid MIR branch folding"). Reviewers: uweigand, jonpa Reviewed By: uweigand Subscribers: hiraditya, llvm-commits, Andreas-Krebbel Tags: #llvm Differential Revision: https://reviews.llvm.org/D82794	2020-06-30 12:44:45 +02:00
Jonas Paulsson	ef7aad0db4	[SystemZ] Improve handling of ZERO_EXTEND_VECTOR_INREG. Instead of doing multiple unpacks when zero extending vectors (e.g. v2i16 -> v2i64), benchmarks have shown that it is better to do a VPERM (vector permute) since that is only one sequential instruction on the critical path. This patch achieves this by 1. Expand ZERO_EXTEND_VECTOR_INREG into a vector shuffle with a zero vector instead of (multiple) unpacks. 2. Improve SystemZ::GeneralShuffle to perform a single unpack as the last operation if Bytes matches it. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D78486	2020-06-30 09:08:10 +02:00
Fangrui Song	4cd19a6e15	[BasicAA] Rename -disable-basicaa to -disable-basic-aa to be consistent with the canonical name "basic-aa"	2020-06-26 20:55:44 -07:00
Jonas Paulsson	d3f7448e3c	[SystemZ] Bugfix in storeLoadCanUseBlockBinary(). Check that the MemoryVT of LoadA matches that of LoadB. This fixes https://bugs.llvm.org/show_bug.cgi?id=46239. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D81671	2020-06-17 09:49:31 +02:00
Sam Parker	09d30cb977	[CostModel] Unify Shuffle and InsertElement Costs Extract the existing code from getInstructionThroughput into TTImpl::getUserCost. The duplicated code in the AMDGPU backend has also been removed. Differential Revision: https://reviews.llvm.org/D81448	2020-06-10 09:13:34 +01:00
Jonas Paulsson	515bfc66ea	[SystemZ] Implement -fstack-clash-protection Probing of allocated stack space is now done when this option is passed. The purpose is to protect against the stack clash attack (see https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D78717	2020-06-06 18:38:36 +02:00
Hans Wennborg	fcc199d696	Make regcoal_remat_empty_subrange.ll test require asserts build. The -stress-sched flag is only available when asserts are enabled.	2020-06-04 19:46:22 +02:00
Quentin Colombet	ccb3c8e861	[RegisterCoalescer] Update empty subranges when rematerializing When we rematerialize a value as part of the coalescing, we may widen the register class of the destination register. When this happens, updateRegDefUses may create additional subranges to account for the wider register class. The created subranges are empty and if they are not defined by the rematerialized instruction we clean them up. However, if they are defined by the rematerialized instruction but unused, we failed to flag them as dead definition and would leave them as empty live-range. This is wrong because empty live-ranges don't interfere with anything, thus if we don't fix them, we would fail to account that the rematerialized instruction clobbers some lanes. E.g., let us consider the following pseudo code: def.lane_low64:reg128 = ldimm newdef:reg32 = COPY def.lane_low64_low32 When rematerialization happens for newdef, we end up with: newdef.lane_low64:reg128 = ldimm = use newdef.lane_low64_low32 Let's look at the live interval of newdef. Before rematerialization, we would get: newdef [defIdx, useIdx:0) 0@defIdx Right after updateRegDefUses, newdef register class is widen to reg128 and the subrange definitions will be augmented to fill the subreg that is used at the definition point, here lane_low64. The resulting live interval would be: newdef [newDefIdx, useIdx:0) 0@newDefIdx * lane_low64_high32 EMPTY * lane_low64_low32 [newDefIdx, useIdx:0) Before this patch this would be the final status of the live interval. Therefore we miss that lane_low64_high32 is actually live on the definition point of newdef. With this patch, after rematerializing, we check all the added subranges and for the ones that are defined but empty, we flag them as dead def. Thus, in that case, newdef would look like this: newdef [newDefIdx, useIdx:0) 0@newDefIdx * lane_low64_high32 [newDefIdx, newDefIdxDead) ; <-- instead of EMPTY * lane_low64_low32 [newDefIdx, useIdx:0) This fixes https://www.llvm.org/PR46154	2020-06-03 17:10:55 -07:00
Kevin P. Neal	c21a4f84b0	Fix errors in use of strictfp attribute. Errors spotted with use of: https://reviews.llvm.org/D68233	2020-05-29 12:28:14 -04:00
Jonas Paulsson	b3bd0c37ec	[SystemZ] Eliminate the need to create a zero vector by reusing the VPERM mask. Try to avoid creating VGBMs by reusing the permutation mask if it contains a zero. If the first byte was into (any byte of) a zero vector, then the first byte of the mask can become zero and reused by putting the mask also as the first operand. If there instead was a first-byte use of the other source operand, then that zero index can be reused if the mask is placed as the second operand. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D79925	2020-05-19 09:37:19 +02:00
Jonas Paulsson	31ecef7627	[SystemZ] Don't create PERMUTE nodes with an undef operand. It's better to reuse the first source value than to use an undef second operand, because that will make more resulting VPERMs have identical operands and therefore MachineCSE more successful. Review: Ulrich Weigand	2020-05-18 19:42:14 +02:00
Jonas Paulsson	57feff93a8	[SystemZ] Improve foldMemoryOperandImpl: vec->FP conversions Use FP-mem instructions when folding reloads into single lane (W..) vector instructions. Only do this when all other operands of the instruction have already been allocated to an FP (F0-F15) register. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D76705	2020-05-12 09:21:24 +02:00
Ulrich Weigand	947f78ac27	[SystemZ] Fix/optimize vec_load_len and related intrinsics When using vec_load/store_len_r with an immediate length operand of 16 or larger, LLVM will currently emit an VLRL/VSTRL instruction with that immediate. This creates a valid encoding (which should be supported by the assembler), but always traps at runtime. This patch fixes this by not creating VLRL/VSTRL in those cases. This would result in loading the length into a register and calling VLRLR/VSTRLR instead. However, these operations with a length of 15 or larger are in fact simply equivalent to a full vector load or store. And in fact the same holds true for vec_load/store_len as well. Therefore, add a DAGCombine rule to replace those operations with plain vector loads or stores if the length is known at compile time and equal or larger to 15.	2020-05-06 21:15:58 +02:00
Jonas Paulsson	c84461ba8d	[SystemZ] Fix test case. Remove bad kill flags fom load-and-test.mir as discovered by https://reviews.llvm.org/D78586: "[MachineVerifier] Add more checks for registers in live-in lists". Review: Ulrich Weigand	2020-04-28 09:43:03 +02:00
Jonas Paulsson	036242b868	[SystemZ] Bugfix in adjustSubwordCmp() adjustSubwordCmp() should not optimize a load of an i1 value. This is achieved by checking that the size and store-size of the MemoryVT are the same. Fixes https://bugs.llvm.org/show_bug.cgi?id=45511. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D78187	2020-04-15 12:58:39 +02:00
Jonas Paulsson	36d4421f50	[LoopDataPrefetch + SystemZ] Let target decide on prefetching for each loop. This patch adds - New arguments to getMinPrefetchStride() to let the target decide on a per-loop basis if software prefetching should be done even with a stride within the limit of the hw prefetcher. - New TTI hook enableWritePrefetching() to let a target do write prefetching by default (defaults to false). - In LoopDataPrefetch: - A search through the whole loop to gather information before emitting any prefetches. This way the target can get information via new arguments to getMinPrefetchStride() and emit prefetches more selectively. Collected information includes: Does the loop have a call, how many memory accesses, how many of them are strided, how many prefetches will cover them. This is NFC to before as long as the target does not change its definition of getMinPrefetchStride(). - If a previous access to the same exact address was 'read', and the current one is 'write', make it a 'write' prefetch. - If two accesses that are covered by the same prefetch do not dominate each other, put the prefetch in a block that dominates both of them. - If a ConstantMaxTripCount is less than ItersAhead, then skip the loop. - A SystemZ implementation of getMinPrefetchStride(). Review: Ulrich Weigand, Michael Kruse Differential Revision: https://reviews.llvm.org/D70228	2020-04-02 14:57:46 +02:00
Jonas Paulsson	f481d48893	[SystemZ] Improve foldMemoryOperandImpl(). Fold MS(G)RKC -> MS(G)C. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D76771	2020-03-31 17:17:51 +02:00
Jonas Paulsson	f09b891d4a	[SystemZ] Improve foldMemoryOperandImpl() A spilled load of an immediate can use MVHI/MVGHI instead. A compare of a spilled register against an immediate can use CHSI/CGHSI. A logical compare can use CLFHSI/CLGHSI. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D76055	2020-03-25 16:21:08 +01:00
Jinsong Ji	816ad48c82	[NFC][RUIP] Small debug output refine Add a new line, so that we always print MI in a new line, before and after UpdateRegMask, for easier check..	2020-03-24 03:29:45 +00:00
Jonas Paulsson	9adc7fc3cd	[SystemZ] Perform instruction shortening for fused fp ops. Replace single-lane (W... form) vector "multiply and add" and "multiply and subtract" instructions with equivalent floating point instructions whenever possible in SystemZShortenInst. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D76370	2020-03-23 14:12:13 +01:00
Simon Pilgrim	68224c1952	[TargetLowering] Only demand a rotation's modulo amount bits ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits. Differential Revision: https://reviews.llvm.org/D76201	2020-03-17 21:23:46 +00:00
Jonas Paulsson	132f25bcca	[SystemZ] Avoid scalarization of [SU]INT_TO_FP ISD-nodes. The type legalizer will scalarize vector conversions from integer to floating point if the source element size is less than that of the result. This is avoided now by inserting a zero/sign-extension of the source vector before type legalization. Review: Ulrich Weigand Differential revision: https://reviews.llvm.org/D75978	2020-03-16 13:07:42 +01:00
Simon Pilgrim	775bf62698	[SystemZ] Regenerate rotate/shift tests	2020-03-15 16:42:46 +00:00
Jonas Paulsson	62ff9960d3	[SystemZ] Improve foldMemoryOperandImpl(). Swap the compare operands if LHS is spilled while updating the CCMask:s of the CC users. This is relatively straight forward since the live-in lists for the CC register can be assumed to be correct during register allocation (thanks to `659efa2`). Also fold a spilled operand of an LOCR/SELR into an LOC(G). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D67437	2020-03-10 15:54:47 +01:00
Jonas Paulsson	ae4d39c9e4	[SystemZ] Copy Access registers and CC with the correct register class. On SystemZ there are a set of "access registers" that can be copied in and out of 32-bit GPRs with special instructions. These instructions can only perform the copy using low 32-bit parts of the 64-bit GPRs. However, the default register class for 32-bit integers is GRX32, which also contains the high 32-bit part registers. In order to never end up with a case of such a COPY into a high reg, this patch adds a new simple pre-RA pass that selects such COPYs into target instructions. This pass also handles COPYs from CC (Condition Code register), and COPYs to CC can now also be emitted from a high reg in copyPhysReg(). Fixes: https://bugs.llvm.org/show_bug.cgi?id=44254 Review: Ulrich Weigand. Differential Revision: https://reviews.llvm.org/D75014	2020-03-03 16:41:09 +01:00
Jonas Paulsson	237625757a	[SystemZ] Bugfix for backchain with packed-stack The incoming back chain slot was implicitly allocated whenever a GPR was saved in SystemZFrameLowering::getRegSpillOffset(), but in cases where no GPRs were saved/restored this did not take effect. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D75367	2020-03-03 15:03:01 +01:00
Jonas Paulsson	cdcce3cabf	[SystemZ] Also accept ISD::USUBO in shouldFormOverflowOp(). Forming subtract with overflow is beneficial on SystemZ, just like additions. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D75290	2020-03-03 14:38:57 +01:00
Craig Topper	a5fa778882	[LegalizeTypes] Scalarize non-byte sized loads in WidenRecRes_Load and SplitVecResLoad Should fix PR42803 and PR44902 Differential Revision: https://reviews.llvm.org/D74590	2020-02-24 15:14:33 -08:00
Jonas Paulsson	82879c2913	[SystemZ] Support the kernel back chain. In order to build the Linux kernel, the back chain must be supported with packed-stack. The back chain is then stored topmost in the register save area. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D74506	2020-02-23 13:42:36 -08:00
Jonas Paulsson	0eddeeab29	[ValueTracking] Improve isKnownNonNaN() to recognize zero splats. isKnownNonNaN() could not recognize a zero splat because that is a ConstantAggregateZero which is-a ConstantData but not a ConstantDataVector. Patch makes a ConstantAggregateZero return true. Review: Thomas Lively Differential Revision: https://reviews.llvm.org/D74263	2020-02-19 09:35:36 -08:00
Simon Pilgrim	7a554270c0	[SystemZ] Regenerate risbg tests. NFCI. Pre-commit for some upcoming SimplifyDemandedBits bitrotate handling.	2020-02-19 16:39:28 +00:00
James Clarke	b3cd44f80b	Use SETNE directly rather than SUB/SETNE 0 for stack guard check Summary: Backends should fold the subtraction into the comparison, but not all seem to. Moreover, on targets where pointers are not integers, such as CHERI, an integer subtraction is not appropriate. Instead we should just compare the two pointers directly, as this should work everywhere and potentially generate more efficient code. Reviewers: bogner, lebedev.ri, efriedma, t.p.northover, uweigand, sunfish Reviewed By: lebedev.ri Subscribers: dschuff, sbc100, arichardson, jgravelle-google, hiraditya, aheejin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74454	2020-02-18 13:21:26 +00:00
Fangrui Song	6b14814e10	[AsmPrinter] Omit unique ID for .stack_sizes Follow-up for D74006.	2020-02-14 21:25:06 -08:00
Yuanfang Chen	4ad7685258	Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""" This reverts commit `80a34ae311` with fixes. Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on MachineVerifierPass by default on X86 and the fact that inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll are not expected to generate functioning machine code, this would go down to `report_fatal_error` in MachineVerifierPass. Here passing `-verify-machineinstrs=0` to make the intent explicit.	2020-02-13 10:16:06 -08:00
Yuanfang Chen	17122ec10a	Revert "Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""" This reverts commit `bb51d24330`.	2020-02-13 10:08:05 -08:00

1 2 3 4 5 ...

687 Commits