llvm-project

Commit Graph

Author	SHA1	Message	Date
Konstantin Schwarz	f2fad3f703	[GlobalISel][InlineAsm] Add missing EarlyClobber flag to inline asm output operands Summary: Previously, we only added early-clobber flags to the 'group' immediate flag operand of an inline asm operand. However, we also have to add the EarlyClobber flag to the MachineOperand itself. This fixes PR46028 Reviewers: arsenm, leonardchan Reviewed By: arsenm, leonardchan Subscribers: phosek, wdng, rovka, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80467	2020-05-27 12:04:18 +02:00
Vitaly Buka	f6383643d9	[StackSafety] Bailout on some function calls Don't miss values used in calls outside regular argument list.	2020-05-27 02:48:42 -07:00
Vitaly Buka	06a07dd608	[StackSafety] Fix formatting in the test	2020-05-27 02:48:41 -07:00
Vitaly Buka	b101c6251a	[StackSafety] Ignore some use of values We should ignore values used in MemTransferInst as anything other than the src/dst argument.	2020-05-27 02:48:41 -07:00
Kazushi (Jam) Marukawa	dedaf3a2ac	[VE] Dynamic stack allocation Summary: This patch implements dynamic stack allocation for the VE target. Changes: * compiler-rt: `__ve_grow_stack` to request stack allocation on the VE. * VE: base pointer support, dynamic stack allocation. Differential Revision: https://reviews.llvm.org/D79084	2020-05-27 10:11:06 +02:00
Daniil Suchkov	fc44da746f	Add test exposing a bug in SimpleLoopUnswitch.	2020-05-27 15:02:28 +07:00
Wang, Pengfei	6565b58584	[X86][llvm-mc] Make the suffix matcher more accurate. Summary: Some instructions, like VPMULDQ, are not variants of VPMULD but new instructions. So we should make sure the suffix matcher only works for memory variants that have the same size as the suffix. Currently we only check SSE/AVX* instructions, because many legacy instructions didn't declare the alias instructions of their variants. Differential Revision: https://reviews.llvm.org/D80608	2020-05-27 14:45:17 +08:00
Vitaly Buka	32a1f60d11	[StackSafety] Use SCEV to find mem operation length	2020-05-26 23:22:37 -07:00
Vitaly Buka	d0f1f5adfa	[StackSafety] Use getSignedRange for offsets	2020-05-26 23:22:36 -07:00
Kang Zhang	23a2f45214	[NFC][PowerPC] Modify the test case two-address-crash.mir	2020-05-27 02:35:45 +00:00
Matt Arsenault	ef3e831226	GlobalISel: Basic legalization for G_PTRMASK	2020-05-26 21:20:30 -04:00
Vitaly Buka	b5ae70046b	[StackSafety] Simplify SCEVRewriteVisitor Probably NFC.	2020-05-26 18:09:43 -07:00
Jessica Paquette	ae597a771e	[AArch64][GlobalISel] Do not modify predicate when optimizing G_ICMP This fixes a bug in `tryOptArithImmedIntegerCompare`. It is unsafe to update the predicate on a MachineOperand when optimizing a G_ICMP, because it may be used in more than one place. For example, when we are optimizing G_SELECT, we allow compares which are used in more than one G_SELECT. If we modify the G_ICMP, then we'll break one of the G_SELECTs. Since the compare is being produced to either 1) Select a G_ICMP 2) Fold a G_ICMP into an instruction when profitable there's no reason to actually modify it. The change is local to the specific compare. Instead, pass a `CmpInst::Predicate` to `tryOptArithImmedIntegerCompare` which can be modified by reference. Differential Revision: https://reviews.llvm.org/D80585	2020-05-26 17:51:08 -07:00
Philip Reames	bed6624ac4	Split a test file so that most of it can be autogened	2020-05-26 17:33:32 -07:00
Philip Reames	b90eb0f23b	Autogen a couple of test files to make a future diff easier to read	2020-05-26 17:33:32 -07:00
Alexander Shaposhnikov	842a8cc10c	[llvm-objcopy][MachO] Add support for removing Swift symbols cctools strip has the option "-T" which removes Swift symbols. This diff implements this option in llvm-strip for MachO. Test plan: make check-all Differential revision: https://reviews.llvm.org/D80099	2020-05-26 16:49:56 -07:00
Arthur Eubanks	9a0b0855a9	Modify verifier checks to support musttail + preallocated Summary: preallocated and musttail can work together, but we don't want to call @llvm.call.preallocated.setup() to modify the stack in musttail calls. So we shouldn't have the "preallocated" operand bundle when a preallocated call is musttail. Also disallow use of preallocated on calls without preallocated. Codegen not yet implemented. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80581	2020-05-26 15:20:20 -07:00
Chris Jackson	9eacda51fa	[debuginfo] Fix broken tests from MachineLICM salvaging fix Previous commit: `bd7ff5d94f` - Added missing x86 triples - Added missing asserts	2020-05-26 22:46:07 +01:00
Vitaly Buka	6a74ad6baa	[sancov] Accommodate sancov and coverage report server for use under Windows Summary: This patch makes the following changes to SanCov and its complementary Python script in order to resolve issues pertaining to non-UNIX file paths in JSON symbolization information: * Convert all paths to use forward slash. * Update `coverage-report-server.py` to correctly handle paths to sources which contain spaces. * Remove Linux platform restriction for all SanCov unit tests. All SanCov tests passed when run on my local Windows machine. Patch by Douglas Gliner. Reviewers: kcc, filcab, phosek, morehouse, vitalybuka, metzman Reviewed By: vitalybuka Subscribers: vsk, Dor1s, llvm-commits Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D51018	2020-05-26 14:36:44 -07:00
Vedant Kumar	6e39379bbb	[DwarfExpression] Support entry values for indirect parameters Summary: A struct argument can be passed-by-value to a callee via a pointer to a temporary stack copy. Add support for emitting an entry value DBG_VALUE when an indirect parameter DBG_VALUE becomes unavailable. This is done by omitting DW_OP_stack_value from the entry value expression, to make the expression describe the location of an object. rdar://63373691 Reviewers: djtodoro, aprantl, dstenb Subscribers: hiraditya, lldb-commits, llvm-commits Tags: #lldb, #llvm Differential Revision: https://reviews.llvm.org/D80345	2020-05-26 14:22:28 -07:00
Stanislav Mekhanoshin	512e806a33	[AMDGPU] Bail alloca vectorization if GEP not found Differential Revision: https://reviews.llvm.org/D80587	2020-05-26 13:59:49 -07:00
Davide Italiano	01fee8aa24	[MLICM] Remove unneeded option so the test doesn't fail.	2020-05-26 13:53:56 -07:00
Matt Arsenault	bb10fa3a53	AMDGPU: Fix wrong null value for private address space I'm guessing this was a holdover from when 0 was an invalid stack pointer, but surprised nobody has discovered this before. Also don't allow offset folding for -1 pointers, since it looks weird to partially fold this.	2020-05-26 16:35:13 -04:00
Chris Jackson	bd7ff5d94f	[DebugInfo] Correct debuginfo for post-ra hoist and sink in Machine LICM Reviewers: vsk, aprantl Differential Revision: https://reviews.llvm.org/D79868	2020-05-26 21:07:10 +01:00
Stanislav Mekhanoshin	42725aeed8	Process gep (select ptr1, ptr2) in SROA Differential Revision: https://reviews.llvm.org/D79217	2020-05-26 12:56:02 -07:00
Sanjay Patel	1a2bffaf8b	[InstCombine] reassociate sub+add to increase adds and throughput The -reassociate pass tends to transform this kind of pattern into something that is worse for vectorization and codegen. See PR43953: https://bugs.llvm.org/show_bug.cgi?id=43953 Follows-up the FP version of the same transform: rGa0ce2338a083	2020-05-26 14:49:17 -04:00
Sanjay Patel	f5cfcc4b06	[LoopVectorize] regenerate full test checks; NFC	2020-05-26 14:49:17 -04:00
Sanjay Patel	0788392637	[InstCombine] add tests for reassociative sub/add expressions; NFC	2020-05-26 14:49:16 -04:00
Lei Huang	7eb666b155	[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm Summary: This patch simply adds support for the new CPU in anticipation of Power10. There isn't really any functionality added so there are no associated test cases at this time. Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc Reviewed By: stefanp, nemanjai, amyk, #powerpc Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo Tags: #clang, #powerpc, #llvm Differential Revision: https://reviews.llvm.org/D80020	2020-05-26 13:48:22 -05:00
Nemanja Ivanovic	6e9223a2c6	[PowerPC][NFC] Update test to prevent DCE from causing failures The test case provided in PR45709 can be simplified by DCE to an empty function. To prevent this from happening if DCE is run prior to ISEL in the back end, just add optnone to the function. The behaviour it is testing for is in the SDAG legalization and is not sensitive to optnone so the test case still achieves its desired objective.	2020-05-26 13:37:48 -05:00
Hiroshi Yamauchi	106ec64fbc	[PGO] Add memcmp/bcmp size value profiling. Summary: This adds support for memcmp/bcmp to the existing memcpy/memset value profiling. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79751	2020-05-26 10:28:04 -07:00
Sanjay Patel	a0ce2338a0	[InstCombine] reassociate fsub+fadd with FMF to increase adds and throughput The -reassociate pass tends to transform this kind of pattern into something that is worse for vectorization and codegen. See PR43953: https://bugs.llvm.org/show_bug.cgi?id=43953	2020-05-26 13:17:15 -04:00
Jonas Devlieghere	d4086213c6	[dsymutil] Escape CFBundleIdentifier in plist. Revision 333565 started escaping HTML special characters in the plist written by dsymutil, but didn't include the updated CFBundleIdentifier.	2020-05-26 09:38:32 -07:00
Sean Fertile	d6c8736287	[PowerPC][AIX] Spill CSRs to the ABI specified stack offsets. Extend the CSR save/restore insertion code to support both 32-bit and 64-bit AIX. Differential Revision: https://reviews.llvm.org/D79252	2020-05-26 12:24:29 -04:00
Matt Arsenault	50d4b22ca0	AMDGPU/GlobalISel: Fix assert on 16-bit G_EXTRACT results I consider this to be a hack, since we probably should not mark any 16-bit extract as legal, and require all extracts to be done on multiples of 32. There are quite a few more battles to fight in the legalizer for sub-dword vectors, so just select this for now so we can pass OpenCL conformance without crashing. Also fix the same assert for G_INSERTs. Unlike G_EXTRACT there's not a trivial way to select this so just fail on it.	2020-05-26 12:14:08 -04:00
Matt Arsenault	8bc03d2168	GlobalISel: Merge G_PTR_MASK with llvm.ptrmask intrinsic Confusingly, these were unrelated and had different semantics. The G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a different format. G_PTR_MASK only allows clearing the low bits of a pointer, and only a constant number of bits. The ptrmask intrinsic allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic. Only selects the cases that look like the old instruction. More work is needed to select the general case. Also new legalization code is still needed to deal with the case where the incoming mask size does not match the pointer size, which has a specified behavior in the langref.	2020-05-26 11:48:13 -04:00
Nemanja Ivanovic	099a875f28	[PowerPC] Unaligned FP default should apply to scalars only As reported in PR45186, we could be in a situation where we don't want to handle unaligned memory accesses for FP scalars but still have VSX (which allows unaligned access for vectors). Change the default to only apply to scalars. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45186	2020-05-26 10:19:06 -05:00
Matt Arsenault	2dd7714b8d	AMDGPU/GlobalISel: Don't select boolean phi by default This is currently missing most of the hard parts to lower correctly, so disable it for now. This fixes at least one OpenCL conformance test and allows it to pass with fallback. Hide this behind an option for now.	2020-05-26 11:01:21 -04:00
Sam Parker	792575ff32	[NFC][ARM][AArch64] More code size tests Add analysis runs for icmp, fcmp and select instructions.	2020-05-26 14:47:02 +01:00
David Green	049c16ba93	[ARM] MVE VMINV/VMAXV test additions. NFC	2020-05-26 14:00:14 +01:00
Serge Pavlov	4d20e31f73	[FPEnv] Intrinsic llvm.roundeven This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven, and performs rounding to the nearest integer value, rounding halfway cases to even. The intrinsic represents the missed case of IEEE-754 rounding operations and now llvm provides full support of the rounding operations defined by the standard. Differential Revision: https://reviews.llvm.org/D75670	2020-05-26 19:24:58 +07:00
Sanjay Patel	f368040c14	[DAGCombiner] try to move splat after binop with splat constant binop (splat X), (splat C) --> splat (binop X, C) binop (splat C), (splat X) --> splat (binop C, X) We do this in IR, and there's a similar fold for the case with 2 non-constant operands just above the code diff in this patch. This was discussed in D79718, and the extra shuffle in the test (llvm/test/CodeGen/X86/vector-fshl-128.ll::sink_splatvar) where it was noticed disappears because demanded elements analysis is no longer blocked. The large majority of the test diffs seem to be benign code scheduling changes, but I do see another type of win: moving the splat later allows binop narrowing in some cases. Regressions were avoided on x86 and ARM with the INSERT_VECTOR_ELT restriction. Differential Revision: https://reviews.llvm.org/D79886	2020-05-26 08:12:46 -04:00
Simon Pilgrim	8b4639d0a0	[X86][AVX] Add some initial movmsk combine tests Show failure to reduce the signbit extraction for 256-bit integer vectors on AVX1 targets where the pcmpgt/ashr has to be done with split 128-bit vectors.	2020-05-26 10:55:57 +01:00
Georgii Rymar	2e365ca2f7	[DebugInfo/llvm-objdump] - Print "ZERO terminator" for terminator entries when dumping .eh_frame. A CIE with the Length == 0 is a terminator: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html And GNU objdump recognizes them and prints the following for such entries: "00000000 ZERO terminator" This patch teaches llvm-objdump to do the same. I had to update tests to use "CHECK-NEXT" too. (Note: it perhaps does not look right that printing is done inside the DebugInfo library; I'd expect to see the change somewhere in llvm-objdump's code instead, but that is how it is done atm). Differential revision: https://reviews.llvm.org/D80476	2020-05-26 12:52:42 +03:00
Sam Parker	c5bbc8dd6d	[NFC][ARM] Fix for previous commit Actually analyse code-size for the size runs...	2020-05-26 10:45:35 +01:00
Sam Parker	48cdbd081c	[NFC][ARM] Add code size analysis tests Add code size runs for the cast costs.	2020-05-26 10:30:43 +01:00
vpykhtin	92f3828dc5	[AMDGPU] Fix wait counts in the presence of 16bit subregisters Differential Revision: https://reviews.llvm.org/D80033	2020-05-26 12:19:27 +03:00
Georgii Rymar	2569787e44	[DebugInfo] - Fix multiple issues in DWARFDebugFrame::parse(). I've noticed an issue with the "Data.getRelocatedValue(...)" call: it might silently ignore an error when the content is truncated. That leads to an infinite loop in the code (e.g. llvm-readobj hangs). After fixing the issue I've found that we actually always tried to read past the end of a section, even when the content was valid. It happened because the terminator CIE (a CIE with the length == 0) was never handled. At first I tried just to stop adding the terminator entry (and return), but that does not seem to be correct, because tools like llvm-objdump might want to print something for such entries (see comments in the code and test cases). This patch fixes the issues mentioned, provides new test cases for both llvm-readobj and lib/DebugInfo and adds FIXMEs to related existing test cases. Differential revision: https://reviews.llvm.org/D80299	2020-05-26 12:13:13 +03:00
Sam Parker	64cfb8a864	[NFC][ARM] Add intrinsic code size runs Add code size analysis of arithmetic intrinsics.	2020-05-26 09:41:54 +01:00
Sam Parker	1f72d5880e	[CostModel] Check for free intrinsics in BasicTTI Recommitting part of "[CostModel] Unify Intrinsic Costs." `de71def3f5` Now that the 'free' intrinsic information has been sunk to the lowest level, query the base implementation in BasicTTI before doing anything else. I suspect this is the change that was causing the main changes, particularly the large effects on debug builds. Differential Revision: https://reviews.llvm.org/D80012	2020-05-26 08:37:13 +01:00
Fangrui Song	872c5fb143	[AsmPrinter] Don't generate .Lfoo$local for -fno-PIC and -fPIE -fno-PIC and -fPIE code generally cannot be linked in -shared mode and there is no benefit accessing via local aliases. Actually, a .Lfoo$local reference will be converted to a STT_SECTION (if no section relaxation) reference which will cause the section symbol (sizeof(Elf64_Sym)=24) to be generated.	2020-05-25 23:35:49 -07:00
Kang Zhang	e6e89875b0	[NFC][PowerPC] Add a new case to test two-address verification	2020-05-26 06:14:08 +00:00
Fangrui Song	9d55e4ee13	Make explicit -fno-semantic-interposition (in -fpic mode) infer dso_local -fno-semantic-interposition is currently the CC1 default. (The opposite disables some interprocedural optimizations.) However, it does not infer dso_local: on most targets accesses to ExternalLinkage functions/variables defined in the current module still need PLT/GOT. This patch makes explicit -fno-semantic-interposition infer dso_local, so that PLT/GOT can be eliminated if targets implement local aliases for AsmPrinter::getSymbolPreferLocal (currently only x86). Currently we check whether the module flag "SemanticInterposition" is 0. If yes, infer dso_local. In the future, we can infer dso_local unless "SemanticInterposition" is 1: frontends other than clang will also benefit from the optimization if they don't bother setting the flag. (There will be risks if they do want ELF interposition: they need to set "SemanticInterposition" to 1.)	2020-05-25 20:48:18 -07:00
Nemanja Ivanovic	793cc518b9	[PowerPC] Prevent legalization loop from promoting SELECT_CC from v4i32 to v4i32 As reported in https://bugs.llvm.org/show_bug.cgi?id=45709 we can hit an infinite loop in legalization since we set the legalization action for ISD::SELECT_CC for all fixed length vector types to Promote. Without some different legalization action for the type being promoted to, the legalizer simply loops. Since we don't have patterns to match the node, the right legalization action should be Expand. Differential revision: https://reviews.llvm.org/D79854	2020-05-25 20:09:07 -05:00
Marek Kurdej	bc93c2d72e	[Transforms] Fix typos. NFC	2020-05-25 22:34:08 +02:00
Craig Topper	51a276c759	[X86] Teach combineTruncatedArithmetic to push truncate through subtracts where only one of the inputs is free to truncate. Fix combineSubToSubus to handle the new DAG to avoid a regression. There are still regressions in test14/test15/test16, where it looks like we were trying to set up cases we could match to umin+trunc+subus but the handling was never finished. The regression here isn't unique to sub. It's a lost opportunity for taking an AND with two truncated inputs and producing a larger AND with a single truncate. The same thing could happen with any other node we handle in combineTruncatedArithmetic since we are moving the truncate up the DAG. Differential Revision: https://reviews.llvm.org/D80483	2020-05-25 11:42:42 -07:00
Dmitry Preobrazhensky	77aec3b4c0	[AMDGPU][MC][GFX8+] Enabled clamp for v_add_u16, v_sub_u16 and v_subrev_u16 See https://bugs.llvm.org/show_bug.cgi?id=45926 Reviewers: arsenm, rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D80430	2020-05-25 19:55:38 +03:00
Simon Pilgrim	a6c4cd3bcb	[X86] Add PTEST tests showing failure to extract allsign cases As discussed on PR42035, we can often use MOVMSK to avoid a cmpgt/ashr by just analysing the extracted signbits.	2020-05-25 15:36:04 +01:00
Shuhong Liu	c8b7c73c57	Add AIX to the test macro-same-context XFAIL list Summary: Since the integrated assembly parser was not implemented yet for AIX and macro is not part of the native assembly dialect on AIX, the test macro-same-context is expected to fail for AIX; hence added AIX to XFAIL list. Reviewers: hubert.reinterpretcast, daltenty, jasonliu Reviewed By: daltenty Subscribers: jasonliu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80232	2020-05-25 10:19:45 -04:00
serge-sans-paille	8eae32188b	Improve stack-clash implementation on x86 - test both 32 and 64 bit version - probe the tail in dynamic-alloca - generate more concise code Differential Revision: https://reviews.llvm.org/D79482	2020-05-25 14:48:14 +02:00
Sanjay Patel	fa038e0350	[x86] favor vector constant load to avoid GPR to XMM transfer, part 2 This replaces the build_vector lowering code that was just added in D80013 and matches the pattern later from the x86-specific "vzext_movl". That seems to result in the same or better improvements and gets rid of the 'TODO' items from that patch. AFAICT, we always shrink wider constant vectors to 128-bit on these patterns, so we still get the implicit zero-extension to ymm/zmm without wasting space on larger vector constants. There's a trade-off there because that means we miss potential load-folding. Similarly, we could load scalar constants here with implicit zero-extension even to 128-bit. That saves constant space, but it means we forego load-folding, and so it increases register pressure. This seems like a good middle-ground between those 2 options. Differential Revision: https://reviews.llvm.org/D80131	2020-05-25 08:01:48 -04:00
David Green	9ff361b099	[ARM] VMULH tests for when other parts are working. NFC	2020-05-25 12:46:18 +01:00
Simon Pilgrim	9fa58d1bf2	[DAG] Add SimplifyDemandedVectorElts binop SimplifyMultipleUseDemandedBits handling For the supported binops (basic arithmetic, logicals + shifts), if we fail to simplify the demanded vector elts, then call SimplifyMultipleUseDemandedBits and try to peek through ops to remove unnecessary dependencies. This helps with PR40502. Differential Revision: https://reviews.llvm.org/D79003	2020-05-25 12:41:22 +01:00
Dmitry Preobrazhensky	b087b91c91	[AMDGPU][CODEGEN] Added 'A' constraint for inline assembler Summary: 'A' constraint requires an immediate int or fp constant that can be inlined in an instruction encoding. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D78494	2020-05-25 14:23:34 +03:00
Ayal Zaks	840450549c	[LV] Clamp MaxVF to power of 2. If a loop has a constant trip count known to be a multiple of MaxVF (times user UF), LV infers that no tail will be generated for any chosen VF. This relies on the chosen VF's being powers of 2 bound by MaxVF, and assumes MaxVF is a power of 2. Make sure the latter holds, in particular when MaxVF is set by a memory dependence distance which may not be a power of 2. Differential Revision: https://reviews.llvm.org/D80491	2020-05-25 11:24:33 +03:00
Kazushi (Jam) Marukawa	5b7ff6f07f	[VE][NFC] Correct sjlj_expection test Summary: '|&' works with bash only, so it should not be used in regression tests. Differential Revision: https://reviews.llvm.org/D80501	2020-05-25 09:49:37 +02:00
Fangrui Song	1b79509f97	[MCDwarf] Delete unneeded DW_AT_unspecified_parameters	2020-05-24 22:36:57 -07:00
Fangrui Song	20e9fc55fe	[MCDwarf] Delete unneeded DW_AT_prototyped for DW_TAG_label	2020-05-24 22:24:24 -07:00
Simon Pilgrim	8a5aea7b50	[X86][AVX] Fold extract_subvector(subv_broadcast(x),c) -> (x) If we're extracting a subvector from a broadcasted subvector of the same type then we can use the source vector directly.	2020-05-24 18:49:39 +01:00
Simon Pilgrim	e508d643cf	[X86][AVX] Fold extract_subvector(broadcast(x),c) -> extract_subvector(broadcast(x),0) iff c != 0 If we're extracting an upper subvector from a broadcast we're better off extracting the lowest subvector instead as it avoids an actual extract instruction and might help SimplifyDemandedVectorElts further simplify the code.	2020-05-24 18:05:54 +01:00
Sanjay Patel	57bb4787d7	[Pass Manager] remove EarlyCSE as clean-up for VectorCombine EarlyCSE was added with D75145, but the motivating test is not regressed by removing the extra pass now. That might be because VectorCombine altered the way it processes instructions, or it might be from (re)moving VectorCombine in the pipeline. The extra round of EarlyCSE appears to cost approximately 0.26% in compile-time as discussed in D80236, so we need some evidence to justify its inclusion here, but we do not have that (yet). I suspect that between SLP and VectorCombine, we are creating patterns that InstCombine and/or codegen are not prepared for, but we will need to reduce those examples and include them as PhaseOrdering and/or test-suite benchmarks.	2020-05-24 12:36:21 -04:00
Florian Hahn	0deab8a54f	[LV] Either get invariant condition OR vector condition. Currently we unconditionally get the first lane of the condition operand, even if we later use the full vector condition. This can result in some unnecessary instructions being generated. Suggested as follow-up in D80219.	2020-05-24 17:16:42 +01:00
Sanjay Patel	d43fac052e	[PhaseOrdering] adjust test to use default alias analysis with new pass manager; NFC As discussed in D80236 - this test (like all PhaseOrdering tests?) was intended to show that there is no difference with the new pass manager, but the 'opt' command requires extra parameters to make that happen.	2020-05-24 11:28:15 -04:00
Simon Pilgrim	1e7865d946	[X86] SimplifyMultipleUseDemandedBitsForTargetNode - add initial X86ISD::VSRAI handling. This initial version only peeks through cases where we just demand the sign bit of an ashr shift, but we could generalize this further depending on how many sign bits we already have. The pr18014.ll case is a minor annoyance - we've failed to move the psrad/paddd after the blendvps which would have avoided the extra move, but we have still increased the ILP.	2020-05-24 16:07:46 +01:00
Kang Zhang	86e3abc9e6	[PowerPC] Add some InstAlias definitions Summary: This patch adds the InstAlias definitions for the instructions below. ADDI ADDIS ADDI8 ADDIS8 RLWINM8 ISEL ISEL8 OR OR_rec ORI ORI8 XORI8 CNTLZW8 CNTLZW8_rec TEND TSR RFEBB NOR NOR_rec MTCRF SUBF SUBF_rec SUBFC SUBFC_rec RLDICL_32_64 TW Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D77559	2020-05-24 14:05:28 +00:00
Sanjay Patel	c048a02b5b	[InstCombine] fold FP trunc into exact itofp Similar to D79116 and rGbfd512160fe0 - if the 1st cast is exact, then we can go directly to the destination type because there is no double-rounding.	2020-05-24 09:30:19 -04:00
Simon Pilgrim	1603106725	[TargetLowering] Improve expandFunnelShift shift amount masking For the 'inverse shift', we currently always perform a subtraction of the original (masked) shift amount. But for the case where we are handling power-of-2 type widths, we can replace: (sub bw-1, (and amt, bw-1) ) -> (and (xor amt, bw-1), bw-1) -> (and ~amt, bw-1) This allows x86 shifts to fold away the and-mask. Followup to D77301 + D80466. http://volta.cs.utah.edu:8080/z/Nod0Gr Differential Revision: https://reviews.llvm.org/D80489	2020-05-24 11:25:09 +01:00
Simon Pilgrim	8310c9b741	[X86][AVX] Call SimplifyDemandedBits on MaskedLoadSDNode with non-boolean masks On X86 (AVX1/AVX2), non-boolean masked loads only demand the sign bit of the mask, we already do the equivalent for masked stores. Annoyingly I can't easily handle this inside TargetLowering::SimplifyDemandedBits as this is an x86 specific case for a generic node. Differential Revision: https://reviews.llvm.org/D80478	2020-05-24 09:51:21 +01:00
Simon Pilgrim	cc65a7a5ea	[X86] Improve i8 + 'slow' i16 funnel shift codegen This is a preliminary patch before I deal with the xor+and issue raised in D77301. We get much better code for i8/i16 funnel shifts by concatenating the operands together and performing the shift as a double width type; it avoids repeated use of the shift amount and partial registers. fshl(x,y,z) -> (((zext(x) << bw) | zext(y)) << (z & (bw-1))) >> bw. fshr(x,y,z) -> (((zext(x) << bw) | zext(y)) >> (z & (bw-1))) >> bw. Alive2: http://volta.cs.utah.edu:8080/z/CZx7Cn This doesn't do as well for i32 cases on x86_64 (the xor+and followup patch is much better) so I haven't bothered with that. Cases with constant amounts are more dubious as well so I haven't currently bothered with those - it's these kinds of 'edge' cases that put me off trying to put this in TargetLowering::expandFunnelShift. Differential Revision: https://reviews.llvm.org/D80466	2020-05-24 08:08:53 +01:00
Amara Emerson	99660217e9	[AArch64][GlobalISel] When generating SUBS for compares, don't write to wzr/xzr. Although writing to wzr/xzr is correct since we don't care about the result of the sub, only the flags, doing so causes tail merge blocks to fail. Writing to an unused virtual register instead allows the optimization to fire, improving performance significantly on 256.bzip2. Differential Revision: https://reviews.llvm.org/D80460	2020-05-23 22:59:49 -07:00
Amy Kwan	b631f86ac5	[TLI][PowerPC] Introduce TLI query to check if MULH is cheaper than MUL + SHIFT This patch introduces a TargetLowering query, isMulhCheaperThanMulShift. Currently in DAG Combine, it will transform mulhs/mulhu into a wider multiply and a shift if the wide multiply is legal. This TLI function is implemented on 64-bit PowerPC, as it is more desirable to have multiply-high over multiply + shift for words and doublewords. Having multiply-high can also aid in further transformations that can be done. Differential Revision: https://reviews.llvm.org/D78271	2020-05-23 16:47:12 -05:00
Nikita Popov	2833c46f75	[DwarfEHPrepare] Don't prune unreachable resumes at optnone Disable pruning of unreachable resumes in the DwarfEHPrepare pass at optnone. While I expect the pruning itself to be essentially free, this does require a dominator tree calculation, that is not used for anything else. Saving this DT construction makes for a 0.4% O0 compile-time improvement. Differential Revision: https://reviews.llvm.org/D80400	2020-05-23 20:58:01 +02:00
Matt Arsenault	27fe841aa6	AMDGPU: Refine rcp/rsq intrinsic folding for modern FP rules We have to assume undef could be an snan, which would need quieting so returning qnan is safer than undef. Also consider strictfp, and don't care if the result rounded.	2020-05-23 13:28:36 -04:00
Georgii Rymar	38c5d6f700	[yaml2obj] - Add a technical prefix for each unnamed chunk. This change does not affect the produced binary. In this patch I assign a technical suffix to each section/fill (i.e. chunk) name when it is empty. It allows us to simplify the code slightly and improve the reported error messages. In the code we have the section to index mapping, SN2I, which is globally used. With this change we can now use it to map "empty" names to indexes, which is helpful. Differential revision: https://reviews.llvm.org/D79984	2020-05-23 17:22:23 +03:00
Michal Paszkowski	335de55fa3	Revert "Added a new IRCanonicalizer pass." This reverts commit `14d358537f`.	2020-05-23 13:51:43 +02:00
Michal Paszkowski	14d358537f	Added a new IRCanonicalizer pass. Summary: Added a new IRCanonicalizer pass which aims to transform LLVM modules into a canonical form by reordering and renaming instructions while preserving the same semantics. The canonicalizer makes it easier to spot semantic differences when diffing two modules which have undergone different passes. Presentation: https://www.youtube.com/watch?v=c9WMijSOEUg Reviewed by: plotfi Differential Revision: https://reviews.llvm.org/D66029	2020-05-23 12:45:53 +02:00
Nikita Popov	0c6bba71e3	[TargetPassConfig] Don't add alias analysis at optnone When performing codegen at optnone, don't add alias analysis to the pipeline. We don't need it, but it causes an unnecessary dominator tree calculation. I've also moved the module verifier call to the top so that a bunch of disabled-at-optnone passes group more nicely. Differential Revision: https://reviews.llvm.org/D80378	2020-05-23 10:35:03 +02:00
Stanislav Mekhanoshin	62fb3fa6d9	[AMDGPU] Define 6 dword subregs This prevents autogeneration of degenerate names for these. Differential Revision: https://reviews.llvm.org/D80451	2020-05-22 13:53:29 -07:00
Sanjay Patel	024098ae53	[VectorCombine] set preserve alias analysis As noted in D80236, moving the pass in the pipeline exposed this shortcoming. Extra work to recalculate the alias results showed up as a compile-time slowdown.	2020-05-22 16:25:16 -04:00
Ahsan Saghir	a28e9f1208	[PowerPC] Add support for vmsumudm This patch adds support for Vector Multiply-Sum Unsigned Doubleword Modulo instruction; vmsumudm. Differential Revision: https://reviews.llvm.org/D80294	2020-05-22 14:35:13 -05:00
Jean-Michel Gorius	65cd2c7a80	Revert "[CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias" This temporarily reverts commit `7019cea26d`. It seems that, for some targets, there are instructions with a lot of memory operands (probably more than would be expected). This causes a lot of buildbots to timeout and notify failed builds. While investigations are ongoing to find out why this happens, revert the changes.	2020-05-22 21:26:46 +02:00
Florian Hahn	7a325c14b4	[DSE,MSSA] Add additional multiblock tests.	2020-05-22 18:24:43 +01:00
Sanjay Patel	6438ea45e0	[VectorCombine] position pass after SLP in the optimization pipeline rather than before There are 2 known problem patterns shown in the test diffs here: vector horizontal ops (an x86 specialization) and vector reductions. SLP has greater ability to match and fold those than vector-combine, so let SLP have first chance at that. This is a quick fix while we continue to improve vector-combine and possibly canonicalize to reduction intrinsics. In the longer term, we should improve matching of these patterns because if they were created in the "bad" forms shown here, then we would miss optimizing them. I'm not sure what is happening with alias analysis on the addsub test. The old pass manager now shows an extra line for that, and we see an improvement that comes from SLP vectorizing a store. I don't know what's missing with the new pass manager to make that happen. Strangely, I can't reproduce the behavior if I compile from C++ with clang and invoke the new PM with "-fexperimental-new-pass-manager". Differential Revision: https://reviews.llvm.org/D80236	2020-05-22 12:22:44 -04:00
Simon Pilgrim	c479052a74	[CGP] Ensure address offset is representable as int64_t AddressingModeMatcher::matchAddr was calling getSExtValue for a constant before ensuring that we can actually represent the value as int64_t Fixes PR46004 / OSSFuzz#22357	2020-05-22 17:00:22 +01:00
Sanjay Patel	2f7c24fe30	[InstCombine] (A + B) + B --> A + (B << 1) This eliminates a use of 'B', so it can enable follow-on transforms as well as improve analysis/codegen. The PhaseOrdering test was added for D61726, and that shows the limits of instcombine vs. real reassociation. We would need to run some form of CSE to collapse that further. The intermediate variable naming here is intentional because there's a test at llvm/test/Bitcode/value-with-long-name.ll that would break with the usual nameless value. I'm not sure how to improve that test to be more robust. The naming may also be helpful to debug regressions if this change exposes weaknesses in the reassociation pass for example.	2020-05-22 11:46:59 -04:00
Sanjay Patel	b603794061	[InstCombine] add tests for adds with common operand; NFC	2020-05-22 11:46:59 -04:00
Denis Antrushin	5451289aba	[SCEV] Constant fold MultExpr before applying depth limit. Summary: Users of SCEV reasonably assume that multiplication of two constant SCEVs will in turn be constant. However, that is not always the case: First, we can get here with reached depth limit, and will create MultExpr SCEV `C1 * C2` and cache it. Then, we can get here with the same operands, but with small depth level. But this time we will find existing MultExpr SCEV and return it, instead of expected constant SCEV. This patch changes getMultExpr to not apply depth limit to all constant operands expression, allowing them to be folded. Reviewers: reames, mkazantsev Subscribers: hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79893	2020-05-22 18:34:32 +03:00
Xiangling_Liao	2419dce5d1	[NFC][AIX] Remove spaces after the comma for '.csect' directive To be consistent with other directives like '.comm', '.lcomm', we remove the spaces after the comma for '.csect' on AIX. Differential Revision: https://reviews.llvm.org/D80247	2020-05-22 11:10:32 -04:00
Matt Arsenault	66fe60220c	AMDGPU/GlobalISel: Fix masked control flow with fallthrough blocks Unlike SelectionDAGBuilder, IRTranslator omits the unconditional branch in fallthrough cases. Confusingly, the control flow pseudos function in the opposite way the intrinsics are used, and the branch targets always need to be swapped. We're inverting the target blocks, so we need to figure out the old fallthrough block and insert a branch to the original unconditional branch target.	2020-05-22 10:31:44 -04:00
Sanjay Patel	5a230a19ad	[PhaseOrdering] regenerate test checks; NFC Remove some redundant/unnecessary bits too.	2020-05-22 10:10:47 -04:00

71565 Commits