Dangling probes are probes associated with an empty block, which usually happens when all real instructions are optimized away from the block. Dangling probes cause a problem during offline counts processing. The sample profiler works by counting samples collected on the first physical instruction following a probe towards that probe, which is logically equivalent to treating the instruction following a probe as if it were in the same block as the probe. In the dangling-probe case, however, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block.
To mitigate this issue, we first try to move a dangling probe around inside its owning block. If there are still native instructions preceding the probe in the same block, we can use one of them as a placeholder to collect samples for the probe. A pass is added that walks each block backwards looking for probes not followed by any real instruction and moves them in front of the first real instruction found. This is done right before object emission.
If no such in-block preceding instruction can be found for a probe, the solution we take is to tag the probe as dangling so that the samples reported for it will not be trusted by the compiler. We leave it to the counts inference algorithm to assign such probes a reasonable count. The value `UINT64_MAX` is used to mark a sample count as collected for a dangling probe.
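As a rough self-contained model of the backward walk (not the actual pass; LLVM types are replaced with a toy `Instr` struct):
```
#include <vector>

// Toy model of a machine basic block: probes interleaved with real
// instructions. The actual pass operates on MachineBasicBlock.
struct Instr {
  bool IsProbe;
  int Id;
};

// Hoist any probes trailing the last real instruction so that they are
// immediately followed by it; that instruction then collects their samples.
void hoistTrailingProbes(std::vector<Instr> &Block) {
  int LastReal = -1;
  for (int I = (int)Block.size() - 1; I >= 0; --I)
    if (!Block[I].IsProbe) {
      LastReal = I;
      break;
    }
  if (LastReal < 0)
    return; // No real instruction at all: the probes stay dangling and their
            // reported counts are marked untrusted via UINT64_MAX.
  std::vector<Instr> Trailing(Block.begin() + LastReal + 1, Block.end());
  Block.erase(Block.begin() + LastReal + 1, Block.end());
  Block.insert(Block.begin() + LastReal, Trailing.begin(), Trailing.end());
}
```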
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D95962
We don't need a bool and an enum to express the three options we
currently have. This makes the interface nicer and makes optional
dependencies much easier to use. It also avoids mistakes where the
bool is false and the enum is silently ignored.
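A generic sketch of the pattern (names hypothetical, not the actual interface):
```
// Before: two parameters that can disagree, e.g. Enable == false while Kind
// still carries a value that is silently ignored.
//   void setDependency(bool Enable, Kind K);
// After: a single enum makes exactly the three valid states representable.
enum class DependencyKind { None, Optional, Required };

void setDependency(DependencyKind K) {
  // No impossible "disabled but Required" combination to worry about here.
  (void)K;
}

int main() {
  setDependency(DependencyKind::Optional);
}
```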
This is an attempt to improve the handling of partial overlaps in the case of an unaligned begin/end.
The existing implementation just bails out if it encounters such cases. Even when it doesn't, I believe the existing code checking alignment constraints is not quite correct: it tries to ensure alignment of the "later" start/end offset, while it should be preserving the relative alignment between the earlier and later start/end.
The idea behind the change is simple: when the start/end is not aligned as we wish, instead of bailing out, adjust it as necessary to get the desired alignment.
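A minimal sketch of the adjustment (simplified non-negative offsets, assuming that shrinking the overlapped range inward is the conservative direction):
```
#include <cstdint>

static int64_t alignUp(int64_t V, int64_t A) { return (V + A - 1) / A * A; }
static int64_t alignDown(int64_t V, int64_t A) { return V / A * A; }

// Instead of bailing out on an unaligned begin/end, shrink [Begin, End) to
// the largest subrange whose bounds have the desired alignment.
bool adjustRange(int64_t &Begin, int64_t &End, int64_t Align) {
  Begin = alignUp(Begin, Align);
  End = alignDown(End, Align);
  return Begin < End; // false: nothing usable remains after the adjustment
}
```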
I'll update with performance results as measured by the test-suite...it's still running...
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D93530
The statepoint intrinsic can be used in an invoke context,
so it should be handled in visitCallBase to cover both call and invoke.
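For context, `CallBase` is the common base class of `CallInst` and `InvokeInst`, so a handler placed there sees both forms. A hedged sketch (simplified, not the actual builder code):
```
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Intrinsics.h"

using namespace llvm;

// Hypothetical visitor: dispatching on CallBase covers `call` and `invoke`
// uses of the statepoint intrinsic alike.
void visitCallBase(CallBase &CB) {
  if (const Function *F = CB.getCalledFunction())
    if (F->getIntrinsicID() == Intrinsic::experimental_gc_statepoint) {
      // ... lower the statepoint here, regardless of whether CB is a
      // CallInst or an InvokeInst (the latter also needs its unwind edge).
    }
}
```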
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97833
The IR symbol table does not parse inline asm. A symbol only referenced by inline
asm is not in the IR symbol table, so LTO does not know that the definition (in
another translation unit) is referenced and may internalize it, even if that
definition has `__attribute__((used))` (which lowers to `llvm.compiler.used` on
ELF targets since D97446).
```
// cabac.c
__attribute__((used)) const uint8_t ff_h264_cabac_tables[...] = {...};
// h264_cabac.c
asm("lea ff_h264_cabac_tables(%rip), %0" : ...);
```
`__attribute__((used))` is the recommended way to tell the compiler there may
be inline asm references, so the usage is perfectly fine. This patch
conservatively sets the `FB_used` bit on `llvm.compiler.used` symbols to work
around the IR symbol table limitation. Note: before D97446, Clang never emitted
symbols in the `llvm.compiler.used` list, so this change does not punish any
Clang emitted global object.
Without the patch, `ff_h264_cabac_tables` may be assigned to a non-external
partition and get internalized. Then we will get a linker error because the
`cabac.c` definition is not exposed.
Differential Revision: https://reviews.llvm.org/D97755
See PR46990 (https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks across coro.suspend intrinsics. Doing so breaks the semantics of the coro.suspend intrinsic, which returns to the caller directly. It also leads to a use-after-free if the coroutine is freed before control returns to the caller in a multithreaded environment.
This patch disables promotion by checking whether the loop contains a coro.suspend intrinsic.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is the better option not only for correctness but also for performance.
In most cases LICM sinks memory operations. In the case of coroutines, sinking a memory operation out of the loop does not improve performance, since the coroutine needs to get the data from the frame anyway. In fact, LICM would hurt coroutine performance, since it adds more entries to the frame.
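To illustrate the hazard, a hedged C++20 sketch (a minimal fire-and-forget coroutine type invented for the example, not code from the patch):
```
#include <coroutine>

struct Task {
  struct promise_type {
    Task get_return_object() { return {}; }
    std::suspend_never initial_suspend() { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
};

struct Suspend {
  bool await_ready() { return false; }
  void await_suspend(std::coroutine_handle<>) {} // control returns to caller
  void await_resume() {}
};

Task accumulate(const int *Data, int N, int *Out) {
  int Sum = 0; // lives in the coroutine frame
  for (int I = 0; I < N; ++I) {
    Sum += Data[I];
    *Out = Sum;          // sinking this store below the loop would be unsound:
    co_await Suspend{};  // the caller can observe *Out at each suspension, and
                         // the frame may even be freed before resumption.
  }
}
```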
Differential Revision: https://reviews.llvm.org/D96928
The compiler needs to mark register $x0 as live-in for the following case:
```
$x1 = ADDXri $sp, 16, 0
BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def dead $x0
```
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D95267
This is a mess, but hopefully it is a no-functional-change patch.
The 'Prev' descriptor is only used for min/max recurrences
or when starting a match from a phi, so it should not be a
factor when propagating FMF for fmul/fadd.
The API is confusing (and should be reduced in subsequent steps)
because the "UnsafeAlgebraInst" appears to actually be a placeholder
for a recurrence that does NOT have FMF, but we still want to
treat it as reassociative.
Make sure we preserve info about passed arguments as implicit uses, to
make sure later passes still have access to this information.
This fixes a mis-compile where the machine-combiner would pick an
incorrect free register.
Probably should have done this before landing, but I forgot.
Basic idea is to avoid using the SCEV predicate when it doesn't buy us anything. Also happens to set us up for handling non-add recurrences in the future if desired.
LSR goes to some lengths to schedule IV increments such that %iv and %iv.next never need to overlap. This is fairly fundamental to LSR's cost model. LSR assumes that an addrec can be represented with a single register. If %iv and %iv.next have to overlap, that assumption does not hold.
The bug, which this patch fixes, is that LSR only does this scheduling for IVs it inserts, but its cost model assumes the same for existing IVs that it reuses. It will rewrite existing IV users such that the no-overlap property holds, but will not actually reschedule the IV increment.
As you can see from the relative lack of test updates, this doesn't actually impact codegen much. The main reason for doing it is to make a follow-up patch series, which improves post-increment use and scheduling, easier to follow.
Differential Revision: https://reviews.llvm.org/D97219
Even though the implementation in emitAtomicCmpSwapW() was correct, it made
Valgrind report an error. Instead of using a RISBG on CmpVal, an LL[CH]R can
be used on the OldVal, and the problem is avoided.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D97604
This is NFC with respect to the generated code, but it fixes a crash
when using -debug: because of their position in the enum, CALL_RVMARKER
nodes were treated as memops, which caused a crash when printing
CALL_RVMARKER nodes.
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by the runtime but not
referenced via relocation in the translation unit.
With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker does not let `__start_/__stop_` references retain their sections.
Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.
This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.
`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.
Behaviors on COFF/Mach-O are not affected.
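As an illustration of the retention problem (a hypothetical ELF-side consumer, not code from the patch): the only references to the section are the linker-synthesized bounds symbols, and with `-z start-stop-gc` those no longer keep the section alive on their own, which is why the metadata must be in `llvm.used` (lowered to `SHF_GNU_RETAIN` on ELF).
```
#include <stddef.h>

// Linker-synthesized bounds of the __llvm_prf_names section.
extern const char __start___llvm_prf_names[];
extern const char __stop___llvm_prf_names[];

size_t prfNamesSize(void) {
  return (size_t)(__stop___llvm_prf_names - __start___llvm_prf_names);
}
```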
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D97649
Honor always_inline attribute when processing -amdgpu-inline-max-bb.
It was lost during the ports of the heuristic. There is no reason
to honor the inline hint but not always_inline.
Differential Revision: https://reviews.llvm.org/D97790
This is almost purely NFC; it just fits more obviously in the flow of the code now that we've standardized on the index-difference approach. The non-NFC bit is that, because the VariableOffsets cancel in the subtraction, we can now handle the case where both sides involve a common variable offset. This isn't an "interesting" improvement; it just happens to fall out of the natural code structure.
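A small illustration of the newly handled case (hypothetical example, not from the patch):
```
// Both accesses decompose to a common variable offset 4*I plus a constant.
// Subtracting the two decomposed offsets cancels the variable part, leaving
// a constant 4-byte difference, so the accesses can be proven NoAlias even
// though neither offset is known on its own.
void storePair(int *P, long I) {
  P[I] = 1;     // offset 4*I
  P[I + 1] = 2; // offset 4*I + 4
}
```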
One subtle point - the placement of this above the BaseAlias check is important in the original code as this can return NoAlias even when we can't find a relation between the bases otherwise.
Also added some enhancement TODOs noticed while understanding the existing code.
Note: This is slightly different from the LGTMed version. I fixed the "inbounds" issue Nikita noticed with the original code in e6e5ef4 and rebased this to include the same fix.
Differential Revision: https://reviews.llvm.org/D97520
VirtRegRewriter may sometimes fail to correctly apply the kill flag where necessary,
which causes unnecessary code gen on PowerPC. This patch fixes the way masks for
defined lanes are computed and the way the mask for used lanes is computed.
Contact albion.fung@ibm.com instead of the author for problems related to this commit.
Differential Revision: https://reviews.llvm.org/D92405
This was pointed out in review of D97520 by Nikita, but existed in the original code as well.
The basic issue is that a decomposed GEP expression describes (potentially) more than one getelementptr. The "inbounds" derived UB which justifies this aliasing rule requires that the entire offset be composed of "inbounds" GEPs. Otherwise, as can be seen in the recently added test and the changes in this patch, we can end up with a large cumulative offset of which only a small sub-offset is actually "inbounds". If that small sub-offset lies within the object, the result was unsound.
We could potentially be fancier here, but for the moment, simply be conservative when any of the GEPs parsed aren't inbounds.
Before, we used the same argument as the entry point. The resume partial
function might want to use a different ABI for its context argument.
Differential Revision: https://reviews.llvm.org/D97333
This caused miscompiles of Chromium tests for iOS due to clobbering of live
registers. See the discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
> which indicates the call is implicitly followed by a marker
> instruction and an implicit retainRV/claimRV call that consumes the
> call result. In addition, it emits a call to
> @llvm.objc.clang.arc.noop.use, which consumes the call result, to
> prevent the middle-end passes from changing the return type of the
> called function. This is currently done only when the target is arm64
> and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
> with the operand bundle in the IR and removes the inserted calls after
> processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
> operand bundle. It doesn't remove the operand bundle on the call since
> the backend needs it to emit the marker instruction. The retainRV and
> claimRV calls are emitted late in the pipeline to prevent optimization
> passes from transforming the IR in a way that makes it harder for the
> ARC middle-end passes to figure out the def-use relationship between
> the call and the retainRV/claimRV calls (which is the cause of
> PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
> nothing in the callee prevents it from being paired up with the
> retainRV/claimRV call in the caller. It then inserts a release call if
> claimRV is attached to the call since autoreleaseRV+claimRV is
> equivalent to a release. If it cannot find an autoreleaseRV call, it
> tries to transfer the operand bundle to a function call in the callee.
> This is important since the ARC optimizer can remove the autoreleaseRV
> returning the callee result, which makes it impossible to pair it up
> with the retainRV/claimRV call in the caller. If that fails, it simply
> emits a retain call in the IR if retainRV is attached to the call and
> does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
> constant value if the call has the operand bundle. This ensures the
> call always has at least one user (the call to
> @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
> multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
> calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb.
Some instructions (especially mov+pop instructions) were setting the
wrong operands. For example, the pop instruction had the register set as
a source operand while it is a destination operand (the value is loaded
into the register).
I have found these issues using the machine verifier and using manual
code inspection.
Differential Revision: https://reviews.llvm.org/D97159
The previous expansion used SBCI, which is incorrect because the NEGW
pseudo instruction accepts a DREGS operand (2xGPR8) and SBCI only allows
LD8 registers. One solution could be to correct the NEGW pseudo
instruction, but another solution is to use a different instruction
(sbc) that does accept a GPR8 register and therefore allows more freedom
to the register allocator.
The output now matches avr-gcc for the following code:
```
int foo(int n) {
  return -n;
}
```
I've found this issue using the machine instruction verifier: it was
complaining about the wrong register class in NEGWRd.mir.
Differential Revision: https://reviews.llvm.org/D97131
These aliases are sometimes used in assembly code and make the code more
readable. They are supported by avr-gcc too.
Differential Revision: https://reviews.llvm.org/D96492
Refactor insertion of the asserting ops. This enables using them for
AMDGPU.
This code should essentially be the same for every target. Mips, X86
and ARM all have different code there now, but this seems to be an
accident. The assignment functions are called with different types
than they would be in the DAG, so this is all likely an assortment of
hacks to get around that.
* Add amdgcn_strict_wqm intrinsic.
* Add a corresponding STRICT_WQM machine instruction.
* The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active.
* The difference between amdgcn_wqm and amdgcn_strict_wqm is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D96258
* Introduce the new intrinsic amdgcn_strict_wwm
* Deprecate the old intrinsic amdgcn_wwm
The change is done for consistency, as the "strict"
prefix will become an important distinguishing factor
between amdgcn_wqm and amdgcn_strict_wqm in the future.
The "strict" prefix indicates that inactive lanes do not
take part in control flow, specifically an inactive lane
enabled by a strict mode will always be enabled irrespective
of control flow decisions.
The amdgcn_wwm intrinsic will be removed, but doing so in two steps
gives users time to switch to the new name at their own pace.
Reviewed By: critson
Differential Revision: https://reviews.llvm.org/D96257
While the underlying instruction is called image_msaa_load,
the resource must be x component only.
Rename the intrinsic for clarity.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D97829
When commit da108b4ed4 introduced
the CHECK-NEXT directive, it added logic to skip to the next line when
printing a diagnostic if the current matching position is at the end of
a line. This was fine while FileCheck did not support regular expressions,
but now that it does, it can be confusing when the pattern to match
starts with the expectation of a newline (e.g. CHECK-NEXT: {{\n}}foo).
It is also inconsistent with the column information in the diagnostic,
which does point to the end of the line.
This commit removes this logic altogether, such that the failure-to-match
diagnostic for such cases shows the end of the line and is consistent
with the column information. The commit also adapts all existing
testcases accordingly.
Note to reviewers: an alternative approach would be to restrict the code
to only skip to the next line if the first character of the pattern is
known not to match a whitespace-like character. This would respect the
original intent but keep the inconsistency in terms of column info and
would require more code. I've chosen the current approach out of laziness
and would be happy to restrict the logic instead.
Reviewed By: jdenny, jhenderson
Differential Revision: https://reviews.llvm.org/D93341
In some rare circumstances we can be using an undef register for a
compare. When folded into a CBZ/CBNZ the undef flags are lost, leading
to machine verifier problems. This propagates the existing flags to the
new instruction.
The WebAssembly text and binary formats have different operand orders
for the "type" and "table" fields of call_indirect (and
return_call_indirect). In LLVM we use the binary order for the MCInstr,
but when we produce or consume the text format we should use the text
order. For compilation units targetting WebAssembly 1.0 (without the
reference types feature), we omit the table operand entirely.
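A hedged sketch of the shape of the fix (illustrative types only, not the actual AsmPrinter/AsmParser code):
```
#include <cstdio>

// The MCInst keeps operands in binary order: (type index, table index).
struct CallIndirectOperands {
  unsigned TypeIndex;
  unsigned TableIndex;
};

// The text format wants the table first and the type second; without
// reference types the table operand is omitted entirely.
void printTextFormat(const CallIndirectOperands &Ops, bool HasRefTypes) {
  if (HasRefTypes)
    std::printf("call_indirect %u (type %u)\n", Ops.TableIndex, Ops.TypeIndex);
  else
    std::printf("call_indirect (type %u)\n", Ops.TypeIndex);
}
```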
Differential Revision: https://reviews.llvm.org/D97761
This function isn't exercised in lit tests today according to the code
coverage report, but it will be after the tests in D97543 and D97559
land.
Posting this patch to help with a crash that Fraser hit.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97582
This is a part of https://reviews.llvm.org/D95835.
One issue is about origin load optimization: see the
comments of useCallbackLoadLabelAndOrigin
@gbalats This change may have some conflicts with your 8bit change. PTAL the change at visitLoad.
Reviewed By: morehouse, gbalats
Differential Revision: https://reviews.llvm.org/D97570
This addresses ~50 clang-tidy warnings on dfsan instrumentation pass.
It also contains some refactoring (all non-functional changes) to eliminate some variables and simplify code.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D97714
This patch allows generating TLS variables in assembly files on AIX.
Initialized and external uninitialized variables are generated with the
.csect pseudo-op and local uninitialized variables are generated with
the .comm/.lcomm pseudo-ops. The patch also adds a check to
explicitly say that TLS is not yet supported on AIX.
Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile
Originally patched by: bsaleil
Commandeered by: NeHuang
Differential Revision: https://reviews.llvm.org/D96184
This merges more AMDGPU ABI lowering code into the generic call
lowering. Start cleaning up by factoring away more of the pack/unpack
logic into the buildCopy{To|From}Parts functions. These could use more
improvement; the SelectionDAG versions are significantly more complex,
and we'll eventually have to emulate all of those cases too.
This is mostly NFC, but does result in some minor instruction
reordering. It also removes some of the limitations with mismatched
sizes the old code had. However, similarly to the merge on the input,
this is forcing gfx6/gfx7 to use the gfx8+ ABI (which is what we
actually want, but SelectionDAG is stuck using the weird emergent
ABI).
This also changes the load/store size for stack passed EVTs for
AArch64, which makes it consistent with the DAG behavior.
This fixes two bugs in `WebAssemblyExceptionInfo` grouping, created by
D97247. These two bugs are not easy to split into two different CLs,
because tests that fail for one also tend to fail for the other.
- In D97247, when fixing `ExceptionInfo` grouping by taking out
the unwind destination's exception from the unwind source's exception, we
just iterated the BBs in function order, which was incorrect; this
changes it to dominator tree preorder. Please refer to the
comments in the code for the reason and an example.
- After this subexception-taking-out fix, there can still be remaining
BBs we have to take out. When Exception B is taken out of Exception A
(because EHPad B is the unwind destination of EHPad A), there can
still be BBs within Exception A that are reachable from Exception B,
which should also be taken out. Please refer to the comments in the
code for a more detailed explanation of why this can happen. To make
this possible, this splits `WebAssemblyException::addBlock` into two
parts: adding to a set and adding to a vector (see the sketch below).
We need to iterate over BBs within a `WebAssemblyException` to fix
this, so we add BBs to sets first, and add them to vectors only after
we have fixed all incorrectness, because deleting BBs from vectors is
expensive. I considered removing the vector from
`WebAssemblyException`, but it was not easy, because this class has to
maintain an interface similar to `MachineLoop` so it can be wrapped
into the single interface `SortRegion`, which is used in CFGSort.
Other misc. drive-by fixes:
- Make `WebAssemblyExceptionInfo` not run at all when wasm EH is not
used or the function doesn't have any EH pads, to avoid wasting time
- Add `LLVM_DEBUG` lines for easy debugging
- Fix `preds` comments in cfg-stackify-eh.ll
- Fix `__cxa_throw`'s signature in cfg-stackify-eh.ll
Fixes https://github.com/emscripten-core/emscripten/issues/13554.
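Below is a simplified model of the two-phase `addBlock` split (toy types, not the real class):
```
#include <set>
#include <vector>

// Phase 1 uses a set so membership can be cheaply revised while
// subexceptions and reachable blocks are being taken out; phase 2
// materializes the vector once membership is final, since erasing from
// the middle of a vector is expensive.
struct ExceptionModel {
  std::set<int> BlockSet;
  std::vector<int> Blocks;

  void addToSet(int BB) { BlockSet.insert(BB); }
  void removeFromSet(int BB) { BlockSet.erase(BB); }
  void finalize() { Blocks.assign(BlockSet.begin(), BlockSet.end()); }
};
```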
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97677