llvm-project

Commit Graph

Author	SHA1	Message	Date
David Sherwood	c39460cc4f	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `240aa96cf2`.	2021-03-26 11:36:53 +00:00
David Sherwood	240aa96cf2	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. Differential Revision: https://reviews.llvm.org/D98512	2021-03-26 11:27:12 +00:00
Vladislav Vinogradov	15b76e6ca0	[mlir][ODS] Fix `VariadicRegion` code generation for `NoTerminator` Ops The issue was introduced in D98468. The `{0}Regions` is an array of `std::unique_ptr<Region>` objects, so it should be processed accordingly. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D99332	2021-03-26 14:24:36 +03:00
Abhina Sreeskantharajan	bc5d4bcc2d	[Windows] Turn off text mode in TableGen and Rewriter to stop CRLF translation This patch should fix the errors shown on the Windows bots by turning off text mode. I plan to investigate a better fix but this should unblock the buildbots for now. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99363	2021-03-26 07:12:46 -04:00
Max Kazantsev	6a7bcc9c8d	[Test] Add failing test for pr49730	2021-03-26 18:03:39 +07:00
Muhammad Omair Javaid	73cf85e527	[LLDB] Skip TestVSCode_disconnect.test_launch arm/linux TestVSCode_disconnect.test_launch hangs in tear down and times out on Arm Linux. I am marking it skipped for the buildbot while looking into the failure.	2021-03-26 15:54:42 +05:00
Jay Foad	d92b4956d6	[AMDGPU] Inline FSHRPattern into its only use. NFC.	2021-03-26 09:32:02 +00:00
Fangrui Song	dc46783f7f	[memprof][test] Make test_terse.cpp robust (sched_getcpu may happen to change) ``` /b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/test/memprof/TestCases/test_terse.cpp:11:11: error: CHECK: expected string not found in input // CHECK: MIB:[[STACKID:[0-9]+]]/1/40.00/40/40/20.00/20/20/[[AVELIFETIME:[0-9]+]].00/[[AVELIFETIME]]/[[AVELIFETIME]]/0/0/0/0 ^ <stdin>:1:1: note: scanning from here MIB:StackID/AllocCount/AveSize/MinSize/MaxSize/AveAccessCount/MinAccessCount/MaxAccessCount/AveLifetime/MinLifetime/MaxLifetime/NumMigratedCpu/NumLifetimeOverlaps/NumSameAllocCpu/NumSameDeallocCpu ^ <stdin>:4:1: note: possible intended match here MIB:134217729/1/40.00/40/40/20.00/20/20/7.00/7/7/1/0/0/0 ```	2021-03-26 00:45:58 -07:00
Craig Topper	8f62a80328	[RISCV] Optimize (and (shl GPR:, uimm5:), 0xffffffff) to use 2 shifts instead of 3. The and would normally become SLLI+SRLI, giving us 2 SLLI+SRLI. We can detect this and combine the 2 SLLIs into 1.	2021-03-25 23:31:01 -07:00
Craig Topper	5a18c576c4	[RISCV] Don't call CheckAndMask from selectZExti32. Now that targetShrinkDemandedConstant preserves 0xffffffff masks we shouldn't need to call computeKnownBits here.	2021-03-25 22:07:41 -07:00
Fangrui Song	9be8f8b34d	[sanitizer] Simplify GetTls with dl_iterate_phdr GetTls is the range of * thread control block and optional TLS_PRE_TCB_SIZE * static TLS blocks plus static TLS surplus On glibc, lsan requires the range to include `pthread::{specific_1stblock,specific}` so that allocations only referenced by `pthread_setspecific` can be scanned. This patch uses `dl_iterate_phdr` to collect TLS ranges. Find the one with `dlpi_tls_modid==1` as one of the initially loaded modules, then find consecutive ranges. The boundaries give us addr and size. This allows us to drop the glibc internal `_dl_get_tls_static_info` and `InitTlsSize` entirely. Use the simplified method with non-Android Linux for now, but in theory this can be used with *BSD and potentially other ELF OSes. In the future, we can move `ThreadDescriptorSize` code to lsan (and consider intercepting `pthread_setspecific`) to avoid hacks in generic code. See https://reviews.llvm.org/D93972#2480556 for analysis on GetTls usage across various sanitizers. Differential Revision: https://reviews.llvm.org/D98926	2021-03-25 21:55:27 -07:00
Kazu Hirata	9d375a40c3	Reapply [InlineCost] Enable the cost benefit analysis on FDO This patch enables the cost-benefit-analysis-based inliner by default if we have instrumentation profile. - SPEC CPU 2017 shows a 0.4% improvement. - An internal large benchmark shows a 0.9% reduction in the cycle count along with 14.6% reduction in the number of call instructions executed. Differential Revision: https://reviews.llvm.org/D98213	2021-03-25 21:51:38 -07:00
Kazu Hirata	3c775d93a1	[InlineCost] Reject a zero entry count This patch teaches the cost-benefit-analysis-based inliner to reject a zero entry count so that we don't trigger a divide-by-zero.	2021-03-25 21:51:36 -07:00
Suraj Sudhir	ec46e03daf	[mlir][tosa] TOSA MLIR dialect update to v0.22, part 1 Incremental set of updates to align to TOSA v0.22 spec - modify gather, resize - add scatter - remove aint8 type Reviewed By: rsuderman Differential Revision: https://reviews.llvm.org/D99390	2021-03-25 21:34:34 -07:00
Wenlei He	5f59f407f5	[CSSPGO] Minor tweak for inline candidate priority tie breaker When prioritizing call sites to consider for inlining in the sample loader, use the number of samples as the first tie breaker before using name/guid comparison. This would favor smaller functions when hotness is the same (from the same block). We could try to retrieve accurate function size if this turns out to be more important. Differential Revision: https://reviews.llvm.org/D99370	2021-03-25 21:15:36 -07:00
Tony	850fcedb27	[NFC][AMDGPU] Corrections to AMD GPU initial kernel launch documentation Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D99223	2021-03-26 02:05:45 +00:00
Lang Hames	19e402d2b3	[JITLink][MachO] Use full <segment>,<section> names for MachO jitlink::Sections. JITLink now requires section names to be unique. In MachO section names are only guaranteed to be unique within their containing segment (e.g. a '__const' section in the '__DATA' segment does not clash with a '__const' section in the '__TEXT' segment), so we need to use the fully qualified <segment>,<section> section names (e.g. '__DATA,__const' or '__TEXT,__const') when constructing jitlink::Sections for MachO objects.	2021-03-25 18:31:18 -07:00
Stella Laurenzo	594e0ba969	[mlir][python] Add docs for op class extension mechanism. Differential Revision: https://reviews.llvm.org/D99387	2021-03-25 18:27:26 -07:00
Richard Smith	4f3ea27dac	Stop this test from dropping a .s file in the current directory.	2021-03-25 18:22:18 -07:00
Richard Smith	ed8d76ec60	Explicitly enable the new pass manager in this test. Otherwise it fails under -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF.	2021-03-25 18:10:36 -07:00
Craig Topper	9b3c0f9a54	[RISCV] Add Zbb+Zbt command lines to the signed saturating add/sub tests. This will enable cmov to be used for select. I improve the codegen of select_cc in D99021, but that patch doesn't work for cmov.	2021-03-25 17:25:36 -07:00
Amara Emerson	55533203d7	[GlobalISel] Add G_ROTR and G_ROTL opcodes for rotates. Differential Revision: https://reviews.llvm.org/D99383	2021-03-25 17:23:30 -07:00
Jessica Paquette	23f657c165	[AArch64][GlobalISel] Emit bzero on Darwin Darwin platforms for both AArch64 and X86 can provide optimized `bzero()` routines. In this case, it may be preferable to use `bzero` in place of a memset of 0. This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can be generated by platforms which may want to use bzero. To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The conditions for this are largely a port of the bzero case in `AArch64SelectionDAGInfo::EmitTargetCodeForMemset`. The only difference in comparison to the SelectionDAG code is that, when compiling for minsize, this will fire for all memsets of 0. The original code notes that it's not beneficial to do this for small memsets; however, using bzero here will save a mov from wzr. For minsize, I think that it's preferable to prioritise omitting the mov. This also fixes a bug in the libcall legalization code which would delete instructions which could not be legalized. It also adds a check to make sure that we actually get a libcall name. Code size improvements (Darwin): - CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign) - CTMark -Oz: -0.2% geomean (-0.5% on bullet) Differential Revision: https://reviews.llvm.org/D99358	2021-03-25 17:14:25 -07:00
Richard Smith	11bf268864	Add a target triple to fix test failure on targets that don't support __int128.	2021-03-25 17:05:36 -07:00
Richard Smith	040c60d9b6	Fix a miscompile introduced by `99203f2`. getPointersDiff would previously round down the difference between two pointers to a multiple of the element size of the pointee, which could result in a pointer value being decreased a little. Alexey Bataev has graciously agreed to add a testcase for this; submitting the bugfix now to unblock.	2021-03-25 16:53:58 -07:00
Rahman Lavaee	cf62b6d3b2	Add missing 'CHECK' prefix to basic block labels test. The `CHECK` prefix was dropped in `e0bf234930`. This led to all CHECK lines having no effect. Reviewed By: tmsriram Differential Revision: https://reviews.llvm.org/D99316	2021-03-25 16:41:41 -07:00
Muhammad Omair Javaid	c3152536fd	[LLDB] Skip TestVSCode_launch.test_progress_events arm/linux TestVSCode_launch.test_progress_events is mysteriously failing on Arm Linux. I am marking it skipped for the buildbot while looking into the failure.	2021-03-26 04:38:31 +05:00
Fangrui Song	ed956554f9	[Triple][Driver] Add muslx32 environment and use /lib/ld-musl-x32.so.1 for -dynamic-linker Differential Revision: https://reviews.llvm.org/D99308	2021-03-25 16:25:47 -07:00
Yonghong Song	886f9ff531	BPF: add extern func to data sections if specified This permits an extern function (BTF_KIND_FUNC) to be added to BTF_KIND_DATASEC if a section name is specified. For example, -bash-4.4$ cat t.c void foo(int) __attribute__((section(".kernel.funcs"))); int test(void) { foo(5); return 0; } The extern function foo (BTF_KIND_FUNC) will be put into BTF_KIND_DATASEC with name ".kernel.funcs". This will help to differentiate two kinds of external functions: functions in the kernel and functions defined in other bpf programs. Differential Revision: https://reviews.llvm.org/D93563	2021-03-25 16:03:29 -07:00
Jingu Kang	3fd64cc7a3	[ValueTracking] Handle two PHIs in isKnownNonEqual() loop: %cmp.0 = phi i32 [ 3, %entry ], [ %inc, %loop ] %pos.0 = phi i32 [ 1, %entry ], [ %cmp.0, %loop ] ... %inc = add i32 %cmp.0, 1 br label %loop In the above example, %pos.0 uses the previous iteration's %cmp.0 via the backedge, according to the PHI instruction's definition. If %inc is not the same among iterations, we can say the two PHIs are not the same. Differential Revision: https://reviews.llvm.org/D98422	2021-03-25 22:56:05 +00:00
Jonas Devlieghere	bbb419151c	[lldb] Add IsFullyInitialized to DynamicLoader On Darwin based systems, lldb will get notified by dyld before it itself finished initializing, at which point it's not safe to call certain APIs or SPIs. Add a method to the DynamicLoader to query that. Differential revision: https://reviews.llvm.org/D99314	2021-03-25 15:44:37 -07:00
Leonard Chan	36eaeaf728	[llvm][hwasan] Add Fuchsia shadow mapping configuration Ensure that Fuchsia shadow memory starts at zero. Differential Revision: https://reviews.llvm.org/D99380	2021-03-25 15:28:59 -07:00
Stella Laurenzo	ec294eb87b	[mlir][linalg] Add an InitTensorOp python builder. * This has the API I want but I am not thrilled with the implementation. There are various things that could be improved both about the way that Python builders are mapped and the way the Linalg ops are factored to increase code sharing between C++/Python. * Landing this as-is since it at least makes the InitTensorOp usable with the right API. Will refactor underneath in follow-ons. Differential Revision: https://reviews.llvm.org/D99000	2021-03-25 15:17:48 -07:00
Guozhi Wei	3240910f00	[DAE] Adjust param/arg attributes when changing parameter to undef In the DeadArgumentElimination pass, if a function's argument is never used, the corresponding caller's parameter can be changed to undef. If the param/arg has the noundef attribute or other related attributes, the LLVM LangRef (https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG (D97244) takes advantage of this behavior and performs a bad transformation on valid code. To avoid this undefined behavior when changing a caller's parameter to undef, this patch removes the noundef attribute and other attributes that imply noundef on the param/arg. Differential Revision: https://reviews.llvm.org/D98899	2021-03-25 14:53:22 -07:00
Philip Reames	4f5e92cc05	Mark gc.relocate and gc.result as readnone (try 2) As noted in the LangRef, these are semantically readnone projections from the result value of the associated statepoint. However, it turned out we had a few latent bugs being covered up by the fact we were only marking them readonly (see PR49607 for context). As of this change, all known issues are resolved. This is a deliberately minimal patch to make it easy to test downstream and revert with minimal change if that turns out to be necessary. Differential Revision: https://reviews.llvm.org/D98729	2021-03-25 14:50:07 -07:00
Philip Reames	e7ebb87222	[deref] Handle byval/byref/sret/inalloc/preallocated arguments for deref-at-point semantics All of these are scoped allocations which remain dereferenceable during the lifetime of the callee. Differential Revision: https://reviews.llvm.org/D99310	2021-03-25 14:47:31 -07:00
Philip Reames	67e28173f1	Autogen test to account for tool output format change	2021-03-25 14:41:08 -07:00
Philip Reames	88d0f47b4f	[test] Add test for hoisting to custom allocation function using allocsize The first is currently demonstrating a miscompile.	2021-03-25 14:31:51 -07:00
David Stone	4b5baa5b82	Handle 128-bits IntegerLiterals in StmtPrinter This fixes PR35677: "int128_t or uint128_t as non-type template parameter causes crash when considering invalid constructor".	2021-03-25 17:27:13 -04:00
Vedant Kumar	414412d3dc	[lldb/Commands] Fix spelling of target.move-to-nearest-code in helptext	2021-03-25 14:25:10 -07:00
Matt Morehouse	8e0bb21931	[HWASan] Mention x86_64 aliasing mode in design doc. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D98892	2021-03-25 14:22:20 -07:00
Craig Topper	5797feaa55	[RISCV] Reorder checks in RISCVTTIImpl::getGatherScatterOpCost to avoid calling getMinRVVVectorSizeInBits() when V extension is not enabled. getMinRVVVectorSizeInBits() asserts if the V extension isn't enabled. So check that gather/scatter is legal first since it already contains a check for V extension being enabled. It also already checks getMinRVVVectorSizeInBits for fixed length vectors so we don't need a check in getGatherScatterOpCost.	2021-03-25 14:20:47 -07:00
Andrew Savonichev	bba25a9cd8	[MCA] Support carry-over instructions for in-order processors Instructions that have more uops than the processor's IssueWidth are issued in multiple cycles. The patch fixes PR49712. Differential Revision: https://reviews.llvm.org/D99339	2021-03-26 00:06:19 +03:00
Xun Li	f490a5969b	[OpenMP][InstrProfiling] Fix a missing instr profiling counter When emitting a function body there needs to be an instr profiling counter emitted. Otherwise instr profiling won't work for this function. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D98135	2021-03-25 13:52:36 -07:00
Richard Smith	622f8de4f2	PR49724: Fix deduction of null member pointers. Previously we created an implicit cast of the wrong kind, which we'd later fail to constant-evaluate, resulting in deduction failure.	2021-03-25 13:47:22 -07:00
Vy Nguyen	dee5787d3e	Reland [lld-macho][nfc] minor clean up, follow up to D98559 This reverts commit `77b4230ed9`. New change: Fixed tests on windows Differential Revision: https://reviews.llvm.org/D99210	2021-03-25 16:46:37 -04:00
Xun Li	c7a39c833a	[Coroutine][Clang] Force emit lifetime intrinsics for Coroutines tl;dr Correct implementation of Coroutines requires having lifetime intrinsics available. Coroutine functions are functions that can be suspended and resumed later. To do so, data that need to stay alive after suspension must be put on the heap (i.e. the coroutine frame). The optimizer is responsible for analyzing each AllocaInst and figuring out whether it should be put on the stack or the frame. In most cases, for data whose lifetime we are unable to accurately analyze, we can just conservatively put them on the heap. Unfortunately, there exist a few cases where certain data MUST be put on the stack, not on the heap. Without lifetime intrinsics, we are unable to correctly analyze those data's lifetime. To dig into more details, there exist cases where at certain code points, the current coroutine frame may have already been destroyed. Hence no frame access would be allowed beyond that point. The following is a common code pattern called "Symmetric Transfer" in coroutine: ``` auto tmp = await_suspend(); __builtin_coro_resume(tmp.address()); return; ``` In the above code example, `await_suspend()` returns a new coroutine handle, whose address we obtain and then use to resume that coroutine. This essentially "transferred" from the current coroutine to a different coroutine. During the call to `await_suspend()`, the current coroutine may be destroyed, which should be fine because we are not accessing any data afterwards. However when LLVM is emitting IR for the above code, it needs to emit an AllocaInst for `tmp`. It will then call the `address` function on tmp. `address` function is a member function of coroutine, and there is no way for the LLVM optimizer to know that it does not capture the `tmp` pointer. So when the optimizer looks at it, it has to conservatively assume that `tmp` may escape and hence put it on the heap. Furthermore, in some cases the `address` call would be inlined, which will generate a bunch of store/load instructions that move the `tmp` pointer around. Those stores will also make the compiler think that `tmp` might escape. To summarize, it's really difficult for the mid-end to figure out that the `tmp` data is short-lived. I made some attempt in D98638, but it appears to be way too complex and is basically doing the same thing as inserting lifetime intrinsics in coroutines. Also, for reference, we already force emitting lifetime intrinsics in O0 for AlwaysInliner: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Passes/PassBuilder.cpp#L1893 Differential Revision: https://reviews.llvm.org/D99227	2021-03-25 13:46:20 -07:00
Nico Weber	a60ffee3f4	Revert "[InlineCost] Enable the cost benefit analysis on FDO" This reverts commit `ef69aa961d`. Makes clang assert in PGO builds, see repro tgz in https://bugs.chromium.org/p/chromium/issues/detail?id=1192783#c6	2021-03-25 16:42:19 -04:00
Leonard Chan	1abaadb30d	[clang][driver] Support HWASan in the Fuchsia toolchain These contain clang driver changes for supporting HWASan on Fuchsia. This includes hwasan multilibs and the dylib path change. Differential Revision: https://reviews.llvm.org/D99361	2021-03-25 13:36:23 -07:00
Roman Lebedev	1c55dcbca7	[NFCI][SimplifyCFG] Don't pay for a Small{Map,Set}Vector when plain SmallSet will suffice This only changes the cases where we really don't care about the iteration order of the underlying container, namely when we will use the values from it to form DTU updates.	2021-03-25 23:25:40 +03:00
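
The shift combination described in commit 8f62a80328 ([RISCV] Optimize (and (shl GPR:, uimm5:), 0xffffffff) to use 2 shifts instead of 3) can be sanity-checked with a small model. This is only an illustrative sketch, not the LLVM implementation: it assumes a 64-bit register (RV64), and the helper names `slli`, `srli`, `three_shifts`, `two_shifts`, and `reference` are hypothetical names made up for this example.

```python
# Model a 64-bit general-purpose register (assumption: RV64).
MASK64 = (1 << 64) - 1


def slli(x, sh):
    """Logical left shift, truncated to 64 bits."""
    return (x << sh) & MASK64


def srli(x, sh):
    """Logical right shift of a 64-bit value."""
    return (x & MASK64) >> sh


def reference(x, c):
    """The original expression: (x << c) & 0xffffffff."""
    return slli(x, c) & 0xFFFFFFFF


def three_shifts(x, c):
    """Naive lowering: SLLI c, then the mask as SLLI 32 + SRLI 32."""
    return srli(slli(slli(x, c), 32), 32)


def two_shifts(x, c):
    """Combined lowering: the two SLLIs merged into one SLLI (c + 32)."""
    return srli(slli(x, c + 32), 32)


if __name__ == "__main__":
    samples = [0, 1, 0xFFFFFFFF, 0x123456789ABCDEF0, MASK64]
    # c is a 5-bit unsigned immediate, so c + 32 stays below 64.
    for x in samples:
        for c in range(32):
            assert reference(x, c) == three_shifts(x, c) == two_shifts(x, c)
    print("all shift lowerings agree")
```

Because `c` is at most 31, the merged shift amount `c + 32` never exceeds 63, so it remains a valid 64-bit shift.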


384030 Commits
