llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	23a116c8c4	[InstCombine] convert lshr to ashr to eliminate cast op This is similar to `b865eead76` ( D103617 ) and fixes: https://llvm.org/PR50575 `41b71f718b` did this and more (noted with TODO comments in the tests), but it didn't handle the case where the destination is narrower than the source, so it got reverted. This is a simple match-and-replace. If there's evidence that the TODO cases are useful, we can revisit/extend.	2021-06-04 07:04:37 -04:00
Jeremy Morse	4501928eb2	Re-land `ae4303b42c`, "Track PHI values through register coalescing" Was reverted in `0507fc2ffc`, in phi-coalesce-subreg.mir I'd explicitly named some passes to run instead of specifying a range. As a result some two-address-instrs weren't correctly rewritten and the verifier got upset. Original commit message: [DebugInstrRef][2/3] Track PHI values through register coalescing In the instruction referencing variable location model, we store variable locations that point at PHIs in MachineFunction during register allocation. Unfortunately, register coalescing can substantially change the locations of registers, and so that PHI-variable-location side table needs maintenance during the pass. This patch builds an index from the side table, and whenever a vreg gets coalesced into another vreg, update the index to record the new vreg that the PHI happens in. It also accepts a limited range of subregister coalescing, for example merging a subregister into a larger class. Differential Revision: https://reviews.llvm.org/D86813	2021-06-04 11:32:02 +01:00
Fraser Cormack	aec9cbbeb8	[SelectionDAG] Extend FoldConstantVectorArithmetic to SPLAT_VECTOR This patch extends the SelectionDAG's ability to constant-fold vector arithmetic to include support for SPLAT_VECTOR. This is not only for scalable-vector types but also for fixed-length vector types, which helps Hexagon in a couple of cases. The original RISC-V test case was in fact an infinite DAGCombine loop. The pattern `and (truncate v1), (truncate v2)` can be combined to `truncate (and v1, v2)` but the truncate can similarly be combined back to `and (truncate v1), (truncate v2)` (but, crucially, only when one of `v1` or `v2` is a constant vector). It wasn't exposed on fixed-length types because a TRUNCATE of a constant BUILD_VECTOR was folded into the BUILD_VECTOR itself, whereas this did not happen for the equivalent (scalable-vector) SPLAT_VECTOR. Reviewed By: RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D103246	2021-06-04 09:53:15 +01:00
Tim Northover	b16ddd0375	AArch64: support atomic zext/sextloads	2021-06-04 09:45:51 +01:00
Esme-Yi	fbfd717197	[Debug-Info] handle DW_CC_pass_by_value/DW_CC_pass_by_reference under strict DWARF. Summary: When -strict-dwarf=true is specified, the calling convention info DW_CC_pass_by_value or DW_CC_pass_by_reference can only be generated at DWARF5. Reviewed By: shchenz, dblaikie Differential Revision: https://reviews.llvm.org/D103300	2021-06-04 08:14:47 +00:00
madhur13490	6a3beb1f68	[AMDGPU] [IndirectCalls] Don't propagate attributes to address taken functions and their callees Don't propagate launch bound related attributes to address taken functions and their callees. The idea is to do a traversal over the call graph starting at address taken functions and erase the attributes set by previous logic i.e. process(). This two-phase approach makes sure that we don't miss out on deeply nested callees from address taken functions as a function might be called directly as well as indirectly. This patch is also a reattempt of D94585, as latent issues in the hasAddressTaken function have been fixed in the recent past. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103138	2021-06-04 11:36:56 +05:30
hsmahesha	753437fc1d	Revert "[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering." This reverts commit `d71ff907ef`.	2021-06-04 11:16:46 +05:30
Cyndy Ishida	5337c7550d	Revert "[llvm] llvm-tapi-diff" This reverts commit `d1d36f7ad2`. Reverting this patch to investigate linux bot failures + fix with author offline	2021-06-03 21:10:51 -07:00
hsmahesha	d71ff907ef	[AMDGPU] Increase alignment of LDS globals if necessary before LDS lowering. Before packing LDS globals into a sorted structure, make sure that their alignment is properly updated based on their size. This will make sure that the members of sorted structure are properly aligned, and hence it will further reduce the probability of unaligned LDS access. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103261	2021-06-04 09:34:37 +05:30
Nico Weber	5c600dc6d4	Revert "Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never\|[runtime]\|always)." This reverts commit `41b3088c3f`. Doesn't build on macOS, see comments on https://reviews.llvm.org/D103304	2021-06-03 21:01:11 -04:00
Craig Topper	e9313fa33a	[RISCV] Simplify some code in RISCVInsertVSETVLI by calling an existing function that does the same thing. NFCI	2021-06-03 17:31:54 -07:00
Arthur Eubanks	9255a5c1ba	[TargetLowering] Only inspect attributes in the arguments for ArgListEntry Parameter attributes are considered part of the function [1], and like mismatched calling conventions [2], we can't have the verifier check for mismatched parameter attributes. Issues can be diagnosed with D103412. [1] https://llvm.org/docs/LangRef.html#parameter-attributes [2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D101806	2021-06-03 15:52:01 -07:00
Arthur Eubanks	edf2056ff3	[BuildLibCalls] Properly set ABI attributes on arguments Some floating point lib calls have ABI attributes that need to be set on the caller. Found via D103412. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D103415	2021-06-03 15:45:07 -07:00
Philip Reames	a4b924a017	Kill a variable which is unused after `cddcc4cf` [nfc]	2021-06-03 14:38:57 -07:00
Philip Reames	cddcc4cff5	A couple style tweaks on top of `5c0d1b2f9` [nfc]	2021-06-03 14:14:59 -07:00
Philip Reames	5c0d1b2f90	[LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed. The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile. Differential Revision: https://reviews.llvm.org/D103620	2021-06-03 14:09:16 -07:00
Julien Pagès	37821155c9	[AMDGPU] Fix a crash when selecting a particular case of buffer_load_format_d16 In this particular example, we had a crash when compiling it for several architectures. This patch extends the legalization of extract_subvector to avoid this problem. Differential Revision: https://reviews.llvm.org/D103344	2021-06-03 16:40:18 -04:00
Jinsong Ji	cd9e1a020c	[Constants][PowerPC] Check exactlyValue for ppc_fp128 in isNullValue PPC_FP128 determines isZero/isNan/isInf using high-order double value only. Checking isZero/isNegative might return the isNullValue unexpectedly. eg: 0xM0000000000000000FFFFFFFFFFFFFFFFF isZero, but it is not NullValue. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D103634	2021-06-03 20:31:01 +00:00
Fangrui Song	a14fc749aa	[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat `__profd_` variables are referenced by code only when value profiling is enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just waste space on ELF/Mach-O. We change the comdat symbol from `__profd_` to `__profc_` because an internal symbol does not provide deduplication features on COFF. The choice doesn't matter on ELF. (In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_` symbols.) On Windows this enables further optimization. We are no longer affected by the link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can cause duplicate definition error. https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585). This avoids many `/INCLUDE:` directives in `.drectve`. Here is rnk's measurement for Chrome: ``` This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%: #BEFORE $ find . -iname '.obj' \| xargs du -b \| awk '{ sum += $1 } END { print sum}' 1047758867 $ du -cksh base_unittests.exe 82M base_unittests.exe 82M total # AFTER $ find . -iname '.obj' \| xargs du -b \| awk '{ sum += $1 } END { print sum}' 937886499 $ du -cksh base_unittests.exe 78M base_unittests.exe 78M total ``` Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103372	2021-06-03 13:16:13 -07:00
Kevin Athey	41b3088c3f	Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never\|[runtime]\|always). In addition: - optionally add global flag to capture compile intent for UAR: __asan_detect_use_after_return_always. The global is a SANITIZER_WEAK_ATTRIBUTE. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D103304	2021-06-03 13:13:51 -07:00
Brendon Cahoon	53ab2d821e	[GlobalISel] Add G_SBFX/G_UBFX to computeKnownBits Differential Revision: https://reviews.llvm.org/D102969	2021-06-03 16:01:47 -04:00
Sam Powell	d1d36f7ad2	[llvm] llvm-tapi-diff This patch introduces a new tool, llvm-tapi-diff, that compares and returns the diff of two TBD files. Reviewed By: ributzka, JDevlieghere Differential Revision: https://reviews.llvm.org/D101835	2021-06-03 11:38:00 -07:00
Eli Friedman	44cdf771fe	[AtomicExpand] Merge cmpxchg success and failure ordering when appropriate. If we're not emitting separate fences for the success/failure cases, we need to pass the merged ordering to the target so it can emit the correct instructions. For the PowerPC testcase, we end up with extra fences, but that seems like an improvement over missing fences. If someone wants to improve that, the PowerPC backend could be taught to emit the fences after isel, instead of depending on fences emitted by AtomicExpand. Fixes https://bugs.llvm.org/show_bug.cgi?id=33332 . Differential Revision: https://reviews.llvm.org/D103342	2021-06-03 11:34:35 -07:00
Artur Pilipenko	a06e63fa52	NFC. Refactor DOTGraphTraits::isNodeHidden Restructure handling of cfg-hide-unreachable-paths and cfg-hide-deoptimize-paths options so as to make it easier to introduce new types of hidden blocks.	2021-06-03 11:27:06 -07:00
Adrian Prantl	a8099b4778	Remove redundant Begin/End form signpost format strings. The os_signpost API already captures the begin/end part and in Instruments, this just adds visual noise that gets in the way of the interesting data. By removing the redundant end text, the display in Instruments gets even less cluttered. rdar://78636200 Differential Revision: https://reviews.llvm.org/D103577	2021-06-03 11:24:13 -07:00
Sanjay Patel	b865eead76	[InstCombine] eliminate sext and/or trunc if value has enough signbits If we have enough signbits in a source value, we can skip an intermediate cast for a trunc+sext pair: https://alive2.llvm.org/ce/z/A_mQt- This is the original problem shown in: https://llvm.org/PR49543 There's a test that shows we transformed what used to be a pair of shifts, so that suggests we could add another ComputeNumSignBits fold starting from a shift. There does not appear to be any change in compile-time from the extra analysis: https://llvm-compile-time-tracker.com/compare.php?from=3d2c9069dcafd0cbb641841aa3dd6e851fb7d760&to=b9513cdf2419704c7bb0c3a02a9ca06aae13d902&stat=instructions Differential Revision: https://reviews.llvm.org/D103617	2021-06-03 13:58:19 -04:00
Philip Reames	44d70d298a	[LoopUnroll] Eliminate PreserveOnlyFirst parameter [nfc] This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly. Differential Revision: https://reviews.llvm.org/D103584	2021-06-03 10:33:14 -07:00
Alexey Bataev	8c48d77cdf	[SLP]Improve cost estimation/emission of externally used extractelements. No need to recalculate the cost of extractelements, just no need to compensate the cost of all extractelements, need to check before if this is actually going to be removed at the vectorization. Also, no need to generate new extractelement instruction, we may just regenerate the original one. It may improve the final vectorization. Differential Revision: https://reviews.llvm.org/D102933	2021-06-03 10:26:59 -07:00
Philip Reames	bb5e1c6dcb	[LoopUnroll] Reorder code to make dom tree update more obvious [nfc] This cleans up the unroll action into two phases. Phase 1 does the mechanical act of unrolling, and leaves all conditional branches in place. Phase 2 optimizes away some of the conditional branches and then simplifies the loop. The primary benefit of the reordering is that we can delete some special-case dom tree update logic. Differential Revision: https://reviews.llvm.org/D103561	2021-06-03 10:19:56 -07:00
Alexey Bataev	89f3bc7698	[SLP]Allow to reorder nodes with >2 scalar values. tryToVectorizeList function allows to reorder only 2 scalars. Patch allows to reorder >2 scalars. Also, to avoid possible regressions, it allows extra vectorization of the remaining parts of the scalars elements if possible. Part of D57059. Differential Revision: https://reviews.llvm.org/D103247	2021-06-03 10:01:36 -07:00
Harald van Dijk	5d2b3de284	[SLP] Avoid std::stable_sort(properlyDominates()). As noticed by NAKAMURA Takumi back in 2017, we cannot use properlyDominates for std::stable_sort as properlyDominates only partially orders blocks. That is, for blocks A, B, C, D, where A dominates B and C dominates D, we have A == C, B == C, but A < B. This is not a valid comparison function for std::stable_sort and causes different results between libstdc++ and libc++. This change uses DFS numbering to give deterministic results for all reachable blocks. Unreachable blocks are ignored already, so do not need special consideration. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103441	2021-06-03 17:51:52 +01:00
Nikita Popov	983565a6fe	[ADT] Move DenseMapInfo for ArrayRef/StringRef into respective headers (NFC) This is a followup to D103422. The DenseMapInfo implementations for ArrayRef and StringRef are moved into the ArrayRef.h and StringRef.h headers, which means that these two headers no longer need to be included by DenseMapInfo.h. This required adding a few additional includes, as many files were relying on various things pulled in by ArrayRef.h. Differential Revision: https://reviews.llvm.org/D103491	2021-06-03 18:34:36 +02:00
Jeremy Morse	0507fc2ffc	Revert "[DebugInstrRef][2/3] Track PHI values through register coalescing" This reverts commit `ae4303b42c`. Expensive checks buildbot has found a problem with this: https://lab.llvm.org/buildbot/#/builders/16/builds/11863	2021-06-03 17:16:58 +01:00
Jeremy Morse	ae4303b42c	[DebugInstrRef][2/3] Track PHI values through register coalescing In the instruction referencing variable location model, we store variable locations that point at PHIs in MachineFunction during register allocation. Unfortunately, register coalescing can substantially change the locations of registers, and so that PHI-variable-location side table needs maintenance during the pass. This patch builds an index from the side table, and whenever a vreg gets coalesced into another vreg, update the index to record the new vreg that the PHI happens in. It also accepts a limited range of subregister coalescing, for example merging a subregister into a larger class. Differential Revision: https://reviews.llvm.org/D86813	2021-06-03 17:06:51 +01:00
Hamza Mahfooz	83235b07e3	[Matrix] Preserve existing fast-math flags during lowering This patch makes it so, floating-point instructions created in LowerMatrixIntrinsics retain fast-math flags from instructions that are higher up the chain. Fixes https://bugs.llvm.org/show_bug.cgi?id=49738 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D103233	2021-06-03 15:29:31 +01:00
David Green	929c54379a	[ARM] Prettify gather/scatter debug comments. NFC	2021-06-03 12:33:03 +01:00
Fraser Cormack	8790e85255	[RISCV] Reserve an emergency spill slot for any RVV spills This patch addresses an issue in which fixed-length (VLS) vector RVV code could fail to reserve an emergency spill slot for their frame index elimination. This is because we were previously only reserving a spill slot when there were `scalable-vector` frame indices being used. However, fixed-length codegen uses regular-type frame indices if it needs to spill. This patch does the fairly brute-force method of checking ahead of time whether the function contains any RVV spill instructions, in which case it reserves one slot. Note that the second RVV slot is still only reserved for `scalable-vector` frame indices. This unfortunately causes quite a bit of churn in existing tests, where we chop and change stack offsets for spill slots. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103269	2021-06-03 10:44:34 +01:00
Fraser Cormack	1de1887f5f	[CodeGen] Fix a scalable-vector crash in VSELECT legalization The `DAGTypeLegalizer::WidenVSELECTMask` function is not (yet) ready for scalable vector types, and has numerous places in which it tries to grab either the fixed size or number of elements of its types. I believe that it should be possible to update this method to properly account for scalable-vector types, but we don't have test cases for that; RISC-V bails out early on as it has legal i1 vector masks. As such, this patch just prevents it from crashing. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103536	2021-06-03 10:24:55 +01:00
Fraser Cormack	2dd20a31f2	[ValueTypes] Fix scalable-vector changeExtendedVectorTypeToInteger The attached tests check for the regression in DAGCombiner's `visitVSELECT`, which may call this method. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103534	2021-06-03 09:36:56 +01:00
Arthur Eubanks	1faff79b7c	[DFSan] Properly set argument ABI attributes Calls must properly match argument ABI attributes with the callee. Found via D103412. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D103414	2021-06-02 22:24:46 -07:00
Amy Huang	9d070b2f48	Recommit "Fix tmp files being left on Windows builds." with a fix for incorrect std::string use. (Also remove redundant call to RemoveFileOnSignal.) Clang writes object files by first writing to a .tmp file and then renaming to the final .obj name. On Windows, if a compile is killed partway through the .tmp files don't get deleted. Currently it seems like RemoveFileOnSignal takes care of deleting the tmp files on Linux, but on Windows we need to call setDeleteDisposition on tmp files so that they are deleted when closed. This patch switches to using TempFile to create the .tmp files we write when creating object files, since it uses setDeleteDisposition on Windows. This change applies to both Linux and Windows for consistency. Differential Revision: https://reviews.llvm.org/D102876 This reverts commit `20797b129f`.	2021-06-02 16:50:37 -07:00
Fangrui Song	87c43f3aa9	[InstrProfiling] Delete linkage/visibility toggling for Windows The linkage/visibility of `__profn_` variables are derived from the profiled functions. extern_weak => linkonce available_externally => linkonce_odr internal => private extern => private _ => unchanged The linkage/visibility of `__profc_`/`__profd_` variables are derived from `__profn_` with linkage/visibility wrestling for Windows. The changes can be folded to the following without changing semantics. ``` if (TT.isOSBinFormatCOFF() && !NeedComdat) { Linkage = GlobalValue::InternalLinkage; Visibility = GlobalValue::DefaultVisibility; } ``` That said, I think we can just delete the code block. An extern/internal function will now use private `__profc_`/`__profd_` variables, instead of internal ones. This saves some symbol table entries. A non-comdat {linkonce,weak}_odr function will now use hidden external `__profc_`/`__profd_` variables instead of internal ones. There is potential object file size increase because such symbols need `/INCLUDE:` directives. However such non-comdat functions are rare (note that non-comdat weak definitions don't prevent duplicate definition error). The behavior changes match ELF. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D103355	2021-06-02 16:49:54 -07:00
Fangrui Song	aba67ba784	[MC] Delete unneeded MCAsmParser &Parser	2021-06-02 16:10:18 -07:00
Fangrui Song	c980d93d91	[MC] Change "unexpected tokens" to "expected newline" and remove unneeded "in .xxx directive"	2021-06-02 16:08:05 -07:00
Dave Lee	60ce8babf7	[coro] Preserve scope line for compiler generated functions Coro-split functions with an active suspend point have their scope line set to the line of the suspend point. However for compiler generated functions, this results in debug info with unconventional results: a file named `<compiler-generated>` with a non-zero line number. The convention for `<compiler-generated>` is that the line number is zero. This change propagates the scope line only for non-compiler generated functions. Differential Revision: https://reviews.llvm.org/D102412	2021-06-02 15:57:12 -07:00
Anshil Gandhi	1c5ff0b03f	[PowerPC] [GlobalISel] Implementation of formal arguments lowering in the IRTranslator for the PPC backend Differential Revision: https://reviews.llvm.org/D99812	2021-06-02 16:46:39 -06:00
Anshil Gandhi	3e5ddb83e3	Revert "Differential Revision: https://reviews.llvm.org/D99812 " This reverts commit `c729f2a48a`.	2021-06-02 16:36:00 -06:00
Simon Pilgrim	9f5d783d46	[X86][SSE] combineScalarToVector - only reuse broadcasts for scalar_to_vector if the source operands scalar types match We were hitting an issue when the scalar_to_vector source was being implicitly truncated (in this case to i8 to vXi1) but we were also using the i8 source in a broadcast to a vXi8 value. Fixes PR50374	2021-06-02 22:05:40 +01:00
Min-Yih Hsu	344e919b1a	[CodeGen][NFC] Remove unused virtual function `TargetFrameLowering::emitCalleeSavedFrameMoves` with 4 arguments is not used anywhere in CodeGen. Thus it shouldn't be exposed as a virtual function. NFC. Differential Revision: https://reviews.llvm.org/D103328	2021-06-02 13:11:12 -07:00
Anshil Gandhi	c729f2a48a	Differential Revision: https://reviews.llvm.org/D99812	2021-06-02 14:09:52 -06:00
Sanjay Patel	0718ac706d	[SDAG] allow cast folding for vector sext-of-setcc with signed compare This extends `434c8e013a` and `ede3982792` to handle signed predicates by sign-extending the setcc operands. This is not shown directly in https://llvm.org/PR50055 , but the pattern is visible by changing the unsigned convert to signed in the source code.	2021-06-02 15:05:02 -04:00
Andrew Browne	70804f2a2f	Fix dfsan handling of musttail calls. Without this change, a callsite like: [[clang::musttail]] return func_call(x); will cause an error like: fatal error: error in backend: failed to perform tail call elimination on a call site marked musttail due to DFSan inserting instrumentation between the musttail call and the return. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D103542	2021-06-02 11:38:35 -07:00
Rong Xu	6745ffe4fa	[SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part) This patch was split from https://reviews.llvm.org/D102246 [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This is mainly for ProfileData part of change. It will load FS Profile when such profile is detected. For an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. For other format profiles, the users need to use an internal option (-profile-isfs) to tell the compiler that the profile uses FS discriminators. This patch also simplified the bit API used by FS discriminators. Differential Revision: https://reviews.llvm.org/D103041	2021-06-02 10:32:52 -07:00
Sanjay Patel	ede3982792	[SDAG] allow more cast folding for vector sext-of-setcc This is a follow-up to D103280 that eases the use restrictions, so we can handle the motivating case from: https://llvm.org/PR50055 The loop code is adapted from similar use checks in ExtendUsesToFormExtLoad() and SliceUpLoad(). I did not see an easier way to filter out non-chain uses of load values. Differential Revision: https://reviews.llvm.org/D103462	2021-06-02 13:14:49 -04:00
Adrian Prantl	fcfaed4ae6	Remove redundant comparisons (NFC)	2021-06-02 09:52:45 -07:00
Stephen Tozer	4316b0e59c	[LoopStrengthReduce] Ensure that debug intrinsics do not affect LSR's output During Loop Strength Reduce, if the terminating condition for the loop is not immediately adjacent to the terminating branch and it has more than one use, a clone of the condition will be created just before the terminating branch and will be used as the branch condition. Currently, whether the instructions are "immediately adjacent" is determined by checking whether the next instruction after the condition is the terminating branch; this is incorrect however, as the presence of a debug intrinsic between the two will result in a change to the output. This is fixed by using getNextNonDebugInstruction() instead. Differential Revision: https://reviews.llvm.org/D103033	2021-06-02 15:56:23 +01:00
Arnold Schwaighofer	f1a0c5d67c	[coro async] Add the swiftasync attribute to the resume partial function Transfer the swiftasync attribute to the resume partial function according to the suspend.async specification. Its first argument denotes which argument is the async context. rdar://71499498 Differential Revision: https://reviews.llvm.org/D103285	2021-06-02 07:44:33 -07:00
Qunyan Mangus	cbde248736	Add getDemandedBits for uses. Add getDemandedBits method for uses so we can query demanded bits for each use. This can help getting better use information. For example, for the code below define i32 @test_use(i32 %a) { %1 = and i32 %a, -256 %2 = or i32 %1, 1 %3 = trunc i32 %2 to i8 (didn't optimize this to 1 for illustration purpose) ... some use of %3 ret %2 } if we look at the demanded bit of %2 (which is all 32 bits because of the return), we would conclude that %a is used regardless of how its return is used. However, if we look at each use separately, we will see that the demanded bit of %2 in trunc only uses the lower 8 bits of %a which is redefined, therefore %a's usage depends on how the function return is used. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97074	2021-06-02 10:07:40 -04:00
Sander de Smalen	d41cb6bb26	[LV] Build and cost VPlans for scalable VFs. This patch uses the calculated maximum scalable VFs to build VPlans, cost them and select a suitable scalable VF. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98722	2021-06-02 14:47:47 +01:00
Sander de Smalen	034503e9d2	[LV] NFC: Remove redundant isLegalMasked(Gather\|Scatter) functions. This NFC change follows from conversation in D102437, where it was discussed to remove these functions as a separate patch.	2021-06-02 14:09:07 +01:00
Sander de Smalen	3472d3fd9d	[LV] NFC: Replace custom getMemInstValueType by llvm::getLoadStoreType. llvm::getLoadStoreType was added recently and has the same implementation as 'getMemInstValueType' in LoopVectorize.cpp. Since there is no value in having two implementations, this patch removes the custom LV implementation in favor of the generic one defined in Instructions.h.	2021-06-02 14:09:06 +01:00
Daniil Fukalov	0195e594fe	[TTI] NFC: Change getIntImmCodeSizeCost to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102915	2021-06-02 16:04:11 +03:00
Irina Dobrescu	e971099a9b	[AArch64] Optimise bitreverse lowering in ISel Differential Revision: https://reviews.llvm.org/D103105	2021-06-02 12:51:12 +01:00
Jingu Kang	f3a27511c9	[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch This re-enables commit `107d19eb01` with bug fixes. Differential Revision: https://reviews.llvm.org/D99354	2021-06-02 10:58:22 +01:00
Bjorn Pettersson	536e02a23c	[CodeGen] Refactor libcall lookups for RTLIB::POWI_* Use RuntimeLibcalls to get a common way to pick correct RTLIB::POWI_* libcall for a given value type. This includes a small refactoring of ExpandFPLibCall and ExpandArgFPLibCall in SelectionDAGLegalize to share a bit of code, plus adding an ExpandFPLibCall version that can be called directly when expanding FPOWI/STRICT_FPOWI to ensure that we actually use the same RTLIB::Libcall when expanding the libcall as we used when checking the legality of such a call by doing a getLibcallName check. Differential Revision: https://reviews.llvm.org/D103050	2021-06-02 11:40:34 +02:00
Bjorn Pettersson	d1273d39d3	[LegalizeTypes] Avoid promotion of exponent in FPOWI The FPOWI DAG node is normally lowered to a libcall to one of the RTLIB::POWI* runtime functions and the exponent should normally have a type matching sizeof(int) when making the call. Thus, type promotion of the exponent could lead to an FPOWI with a type for the second operand that would be incorrect when doing the libcall (a situation which would be hard to detect post-legalization if we allow such FPOWI nodes). This patch is changing DAGTypeLegalizer::PromoteIntOp_FPOWI to do the rewrite into a libcall directly instead of promoting the operand. This way we can check that the exponent is smaller than sizeof(int) and we can let TargetLowering handle promotion as part of making the libcall. It could be noticed here that makeLibCall has some knowledge about targets such as 64-bit RISCV, for which the libcall argument should be extended to a type larger than sizeof(int). Differential Revision: https://reviews.llvm.org/D102950	2021-06-02 11:40:34 +02:00
Bjorn Pettersson	9c54ee4378	[SimplifyLibCalls] Take size of int into consideration when emitting ldexp/ldexpf When rewriting powf(2.0, itofp(x)) -> ldexpf(1.0, x) exp2(sitofp(x)) -> ldexp(1.0, sext(x)) exp2(uitofp(x)) -> ldexp(1.0, zext(x)) the wrong type was used for the second argument in the ldexp/ldexpf libc call, for target architectures with 16 bit "int" type. The transform incorrectly used a bitcasted function pointer with a 32-bit argument when emitting the ldexp/ldexpf call for such targets. The fault is solved by using the correct function prototype in the call, by asking TargetLibraryInfo about the size of "int". TargetLibraryInfo by default derives the size of the int type by assuming that it is 16 bits for 16-bit architectures, and 32 bits otherwise. If this isn't true for a target it should be possible to override that default in the TargetLibraryInfo initializer. Differential Revision: https://reviews.llvm.org/D99438	2021-06-02 11:40:34 +02:00
Tomasz Miąsko	a67a234ec7	[Demangle][Rust] Parse binders Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D102729	2021-06-02 10:36:45 +02:00
Fraser Cormack	3b0a33d0ad	[RISCV] Expand unaligned fixed-length vector memory accesses RVV vectors must be aligned to their element types, so anything less is unaligned. For regular loads and stores, our custom-lowering of fixed-length vectors meant that we opted out of LegalizeDAG's built-in unaligned expansion. This patch adds that logic in to our custom lower function. For masked intrinsics, we declare that anything unaligned is not legal, leaving the ScalarizeMaskedMemIntrin pass to do the expansion for us. Note that neither of these methods can handle the expansion of scalable-vector memory ops, so those cases are left alone by this patch. Scalable loads and stores already go through expansion by default but hit an assertion, and scalable masked intrinsics will silently generate incorrect code. It may be prudent to return an error in both of these cases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102493	2021-06-02 09:27:44 +01:00
Daniil Fukalov	0b34acdab7	[NFC] Fix 'Load' name masking. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103456	2021-06-02 11:09:53 +03:00
Sriraman Tallam	516e5bb2b1	Resubmit D85085 after fixing the tests that were failing. D85085 was pushed earlier but broke tests on mac and win: http://lab.llvm.org:8080/green/job/clang-stage1-RA/21182/consoleFull#-706149783d489585b-5106-414a-ac11-3ff90657619c Recommitting it after adding mtriple to the llc commands. Emit correct location lists with basic block sections. This patch addresses multiple things: 1) It ensures that const_value is emitted when possible with basic block sections. 2) It emits location lists such that the labels are always within the section boundary. 3) It fixes a bug when the parameter is first used in a non-entry block which is in a different section from the entry block. Differential Revision: https://reviews.llvm.org/D85085	2021-06-01 21:59:47 -07:00
Amy Huang	20797b129f	Revert "Fix tmp files being left on Windows builds." for now; causing some asan test failures. This reverts commit `7daa182159`.	2021-06-01 19:51:47 -07:00
Craig Topper	41ff1e0e29	[RISCV] Improve register allocation for masked vwadd(u).wv, vwsub(u).wv, vfwadd.wv, and vfwsub.wv. The first source has the same EEW as the destination, but we're using earlyclobber which prevents them from ever being the same register. To work around this, add a special TIED pseudo to use whenever the first source and merge operand are the same value. This allows us to use a single operand for the merge operand and first source which we can then tie to the destination. A tied source disables earlyclobber for that operand. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D103211	2021-06-01 18:59:00 -07:00
Amy Huang	7daa182159	Fix tmp files being left on Windows builds. Clang writes object files by first writing to a .tmp file and then renaming to the final .obj name. On Windows, if a compile is killed partway through, the .tmp files don't get deleted. Currently it seems like RemoveFileOnSignal takes care of deleting the tmp files on Linux, but on Windows we need to call setDeleteDisposition on tmp files so that they are deleted when closed. This patch switches to using TempFile to create the .tmp files we write when creating object files, since it uses setDeleteDisposition on Windows. This change applies to both Linux and Windows for consistency. Differential Revision: https://reviews.llvm.org/D102876	2021-06-01 17:09:08 -07:00
Stanislav Mekhanoshin	9e2e49328f	[AMDGPU] All GWS instructions need aligned VGPR on gfx90a Fixes: SWDEV-288006 Differential Revision: https://reviews.llvm.org/D103197	2021-06-01 17:08:03 -07:00
Arthur Eubanks	8961293851	[OpaquePtr] Create API to make a copy of a PointerType with some address space Some existing places use getPointerElementType() to create a copy of a pointer type with some new address space. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D103429	2021-06-01 16:52:32 -07:00
Arthur Eubanks	26044c6a54	[InstSimplify] Treat invariant group insts as bitcasts for load operands We can look through invariant group intrinsics for the purposes of simplifying the result of a load. Since intrinsics can't be constants, but we also don't want to completely rewrite load constant folding, we convert the load operand to a constant. For GEPs and bitcasts we just treat them as constants. For invariant group intrinsics, we treat them as a bitcast. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D101103	2021-06-01 16:33:06 -07:00
Michael Benfield	00d19c6704	[various] Remove or use variables which are unused but set. This is in preparation for the -Wunused-but-set-variable warning. Differential Revision: https://reviews.llvm.org/D102942	2021-06-01 15:38:48 -07:00
Daniel Sanders	9372662050	fixup: Missing operator in [globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one My local compiler was fine with it but the bots complain about ambiguous types.	2021-06-01 13:58:03 -07:00
Daniel Sanders	aaac268285	[globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one It's still in use in a few places, so we can't delete it yet, but there aren't many at this point. Differential Revision: https://reviews.llvm.org/D103352	2021-06-01 13:23:48 -07:00
Arthur Eubanks	2983053d23	[NFC][OpaquePtr] Explicitly pass GEP source type to IRBuilder in more places	2021-06-01 13:13:37 -07:00
Anirudh Prasad	e52007cac4	[SystemZ][z/OS] Stricter condition for HLASM class instantiation - A lot of lit tests simply specify the arch minus the triple. On z/OS, this could result in a scenario of some-other-triple-unknown-ibm-zos. This points to an incorrect triple + arch combo. - To prevent this, the isOSzOS check is replaced with isOSBinFormatGOFF. - This is because the GOFF format is set only if the architecture is systemz and the operating system is z/OS. Currently, no other architectures or operating systems use the GOFF file format. - An argument could be made that the problematic tests be fixed to explicitly specify the arch-vendor-triple string, but there's a large number of these tests, and adding this stricter scope ensures that we aren't instantiating the incorrect instance of the AsmParser for other platforms when run on z/OS. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D103343	2021-06-01 15:56:50 -04:00
madhur13490	3c874ce427	[AMDGPU][NFC] Remove author's name from codebase This must have made it into the code by accident. Differential Revision: https://reviews.llvm.org/D103484	2021-06-02 00:51:48 +05:30
Harald van Dijk	f126e8ec28	[SLPVectorizer] Ignore unreachable blocks As the existing test unreachable.ll shows, we should be doing more work to avoid entering unreachable blocks: we should not stop vectorization just because a PHI incoming value from an unreachable block cannot be vectorized. We know that particular value will never be used so we can just replace it with poison.	2021-06-01 20:21:04 +01:00
Jessica Paquette	e7f501b5e7	[GlobalISel][AArch64] Combine and (lshr x, cst), mask -> ubfx x, cst, width Also add a target hook which allows us to get around custom legalization on AArch64. Differential Revision: https://reviews.llvm.org/D99283	2021-06-01 10:56:17 -07:00
Guozhi Wei	1b748faf2b	[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB This patch transforms the sequence `lea (reg1, reg2), reg3; sub reg3, reg4` into two sub instructions, `sub reg1, reg4; sub reg2, reg4`. A similar optimization can also be applied to the LEA/ADD sequence. The modifications to TwoAddressInstructionPass ensure that the operands of the ADD instruction have the expected order (the dest register of the LEA should be the src register of the ADD). Differential Revision: https://reviews.llvm.org/D101970	2021-06-01 10:31:30 -07:00
Jonas Paulsson	9ee3f16919	[SystemZ] Return true from hasBitPreservingFPLogic(). This is currently NFC on benchmarks and tests. Review: Ulrich Weigand	2021-06-01 11:52:50 -05:00
Eli Friedman	fd229caa01	[polly] Fix SCEVLoopAddRecRewriter to avoid invalid AddRecs. When we're remapping an AddRec, the AddRec constructed by a partial rewrite might not make sense. This triggers an assertion complaining it's not loop-invariant. Instead of constructing the partially rewritten AddRec, just skip straight to calling evaluateAtIteration. Testcase was automatically reduced using llvm-reduce, so it's a little messy, but hopefully makes sense. Differential Revision: https://reviews.llvm.org/D102959	2021-06-01 09:51:05 -07:00
Nikita Popov	fd7e309e02	[ADT] Move DenseMapInfo for APInt into APInt.h (PR50527) As suggested in https://bugs.llvm.org/show_bug.cgi?id=50527, this moves the DenseMapInfo for APInt and APSInt into the respective headers, removing the need to include APInt.h and APSInt.h from DenseMapInfo.h. We could probably do the same from StringRef and ArrayRef as well. Differential Revision: https://reviews.llvm.org/D103422	2021-06-01 18:31:41 +02:00
Craig Topper	896f9bc350	[RISCV] Remove earlyclobber from vnsrl/vnsra/vnclip(u) when the source and dest are a single vector register. This guarantees they meet this overlap exception: "The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group" Being a single register guarantees the overlap is always in the lowest-numbered part of the group. Reviewed By: frasercrmck, khchen Differential Revision: https://reviews.llvm.org/D103351	2021-06-01 09:17:52 -07:00
Craig Topper	5a5219a0f9	[RISCV] Remove earlyclobber from compares with LMUL<=1. Compares are considered a narrowing operation for register overlap. I believe for LMUL<=1 they meet this exception to allow overlap "The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group" Both the result and the sources will occupy a single register for LMUL<=1 so the overlap would always be in the "lowest-numbered part". Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D103336	2021-06-01 09:08:11 -07:00
Alexey Bataev	36911971a5	[SLP] Better detection of perfect/shuffled matches for gather nodes. Implemented a better scheme for perfect/shuffled matches of gather nodes, which fixes the performance regressions introduced by earlier patches. Started detecting matches for broadcast nodes and extractelement gathering. Differential Revision: https://reviews.llvm.org/D102920	2021-06-01 07:08:07 -07:00
Daniil Seredkin	13140120dc	[InstCombine] Relax constraints of uses for exp(X) * exp(Y) -> exp(X + Y) InstCombine didn't perform the transformation when fmul's operands were the same instruction, because it required each operand to have exactly one use, which is false in that case. This patch fixes this, adds tests, and introduces a new function isOnlyUserOfAnyOperand to check these cases in a single place. This patch is a result of the discussion in D102574. Differential Revision: https://reviews.llvm.org/D102698	2021-06-01 08:33:23 -04:00
Florian Hahn	1b84acb23a	[LoopDeletion] Consider infinite loops alive, unless mustprogress. The current loop or any of its sub-loops may be infinite. Unless the function or the loops are marked as mustprogress, this in itself makes the loop not dead. This patch moves the logic to check whether the current loop is finite or mustprogress to `isLoopDead` and also extends it to check the sub-loops. This should fix PR50511. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D103382	2021-06-01 13:07:36 +01:00
Sanjay Patel	1b14f3951a	[SDAG] add helper function for sext-of-setcc folds; NFC Try to make this easier to read as noted in D103280	2021-06-01 08:07:17 -04:00
Florian Hahn	d4c070d801	[VectorCombine] Freeze index unless it is known to be non-poison. If the index itself is already poison, the poison propagates through instructions clamping the index to a valid range. This still introduces a load of poison, as flagged by Alive2 and pointed out at `575e2aff55`. This patch updates the code to freeze the index, unless it is proven not to be poison. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D103378	2021-06-01 10:40:57 +01:00
Fraser Cormack	4f500c402b	[RISCV] Support vector types in combination with fastcc This patch extends the RISC-V lowering of the 'fastcc' calling convention to vector types, both fixed-length and scalable. Without this patch, any function passing or returning vector types by value would throw a compiler error. Vectors are handled in 'fastcc' much as they are in the default calling convention, the noticeable difference being the extended set of scalar GPR registers that can be used to pass vectors indirectly. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102505	2021-06-01 10:31:18 +01:00
Andy Wingo	82f92e35c6	[WebAssembly][CodeGen] IR support for WebAssembly local variables This patch adds TargetStackID::WasmLocal. This stack holds locations of values that are only addressable by name -- not via a pointer to memory. For the WebAssembly target, these objects are lowered to WebAssembly local variables, which are managed by the WebAssembly run-time and are not addressable by linear memory. For the WebAssembly target, IR indicates that an AllocaInst should be put on TargetStackID::WasmLocal by putting it in the non-integral address space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift these allocations to SSA locals, but any alloca that reaches instruction selection (usually in non-optimized builds) will be assigned the new TargetStackID there. Loads and stores to those values are transformed to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes, which then lower to the type-specific LOCAL_GET_I32 etc instructions via tablegen patterns. Differential Revision: https://reviews.llvm.org/D101140	2021-06-01 11:31:39 +02:00
Roman Lebedev	a3b8695bf5	[X86] AMD Zen 3 has fast variable per-lane shuffles ... but lane-crossing shuffles are slow.	2021-06-01 10:46:05 +03:00
Roman Lebedev	cf9b1f7a0e	[X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants Currently, the X86 backend only has a global one-size-fits-all `FeatureFastVariableShuffle` feature, which controls profitability of both the cross-lane and per-lane variable shuffles. I guess this has been fine so far. But at least on AMD Zen 3, per-lane variable shuffles (e.g. `VPSHUFB`) are as fast as shuffles with a fixed/immediate mask, while lane-crossing shuffles (e.g. `VPERMPS`) perform worse. So to get the benefits of variable-mask shuffles, but not the drawbacks of lane-crossing shuffles, as suggested by @RKSimon, split the feature flag into two. Differential Revision: https://reviews.llvm.org/D103274	2021-06-01 10:39:36 +03:00

147671 Commits