llvm-project


Author	SHA1	Message	Date
Alexandre Ganea	934d4feab1	[ThinLTO] Don't rely on debug output for thinlto_samplepgo_icp3 test Because using -print-imports is not thread-safe, make the test rely on llvm-dis instead. Also cover the ICALL-PROM part as intended originally. Differential Revision: https://reviews.llvm.org/D76775	2020-03-25 14:38:20 -04:00
Sanjay Patel	f631b9dc36	[VectorCombine] add shuffle tests; NFC Goes with D76727.	2020-03-25 10:35:03 -04:00
sstefan1	72b51d6f93	[OpenMP] Adding InaccessibleMemOnly and InaccessibleMemOrArgMemOnly for runtime calls. Summary: Attempt to add more attributes for runtime calls. Reviewers: jdoerfert Subscribers: guansong, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75010	2020-03-25 14:08:50 +00:00
Juneyoung Lee	d82c1e8c56	Rename test name, add more tests for codegenprepare	2020-03-25 20:31:12 +09:00
Juneyoung Lee	e951a48996	Add freeze(and x, const) case to codegenprepare's freeze-cmp.ll	2020-03-25 17:29:01 +09:00
Johannes Doerfert	5699d08b79	[Attributor] Use knowledge retained in llvm.assume (operand bundles) This patch integrates operand bundle llvm.assumes [0] with the Attributor. Most IRAttributes will now look at uses of the associated value and if there are llvm.assume operand bundle uses with the right tag we will check if they are in the must-be-executed-context (around the context instruction). Droppable users, which is currently only llvm::assume, are handled special in some places now as well. [0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D74888	2020-03-24 15:33:40 -05:00
Sanjay Patel	c84446f4e9	[VectorCombine] add tests for bitcast (shuffle); NFC	2020-03-24 15:18:32 -04:00
Juneyoung Lee	49f75132bc	[DivRemPairs] Freeze operands if they can be undef values Summary: DivRemPairs is unsound with respect to undef values. ``` // bb1: // %rem = srem %x, %y // bb2: // %div = sdiv %x, %y // --> // bb1: // %div = sdiv %x, %y // %mul = mul %div, %y // %rem = sub %x, %mul ``` If X can be undef, X should be frozen first. For example, let's assume that Y = 1 & X = undef: ``` %div = sdiv undef, 1 // %div = undef %rem = srem undef, 1 // %rem = 0 => %div = sdiv undef, 1 // %div = undef %mul = mul %div, 1 // %mul = undef %rem = sub %x, %mul // %rem = undef - undef = undef ``` http://volta.cs.utah.edu:8080/z/m7Xrx5 Same for Y. If X = 1 and Y = (undef \| 1), %rem in src is either 1 or 0, but %rem in tgt can be one of many integer values. This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 . This miscompilation disappears if undef value is removed, but it may take a while. DivRemPair happens pretty late during the optimization pipeline, so this optimization seemed as a good candidate to fix without major regression using freeze than other broken optimizations. Reviewers: spatel, lebedev.ri, george.burgess.iv Reviewed By: spatel Subscribers: wuzish, regehr, nlopes, nemanjai, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76483	2020-03-25 03:46:14 +09:00
Sanjay Patel	88b493a838	[ValueTracking] improve undef/poison analysis for constant vectors Differential Revision: https://reviews.llvm.org/D76702	2020-03-24 13:35:47 -04:00
Sanjay Patel	6c3c7a0dd6	[InstSimplify] add tests for freeze(constexpr); NFC	2020-03-24 11:39:19 -04:00
Sanjay Patel	58ec867a3b	[InstSimplify] add more tests for freeze(constant); NFC These should really be moved over to a ConstantFolding test file, but since this may overlap with the in-progress D76010 and similar tests already exist here, we can do that as a later cleanup.	2020-03-24 09:53:49 -04:00
Douglas Yung	18e1a59eed	Fix another instance where a variable was renamed in the generated LLVM IR. [NFC]	2020-03-23 22:53:29 -07:00
Jun Ma	a44de12ab2	[Coroutines] Also check lifetime intrinsic for local variable when build coroutine frame Currently we move all allocas into the frame when building the coroutine frame in the CoroSplit pass. However, this can be relaxed. Since the CoroSplit pass runs after the Inline pass, we can use lifetime intrinsics to do such analysis: if the scope of a lifetime intrinsic does not cross any suspend point, rather than moving the allocas to the frame, we can just move them to the entry bb of the corresponding function. This reduces the frame size. More importantly, this also avoids a data race in a multithreaded environment. Consider a function inlined into a coroutine: it starts a thread which accesses local variables, while after inlining, moving the allocas to the frame also accesses them, causing a data race. Differential Revision: https://reviews.llvm.org/D75664	2020-03-24 13:41:55 +08:00
Vedant Kumar	b7cd291c15	[GlobalOpt] Treat null-check of loaded value as use of global (PR35760) PR35760 shows an example program which, when compiled with `clang -O0` or gcc at any optimization level, prints '0'. However, llvm transforms the program in a way that causes it to print '1'. Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when analyzing a load from a global which is used by an `icmp`. This special case was untested [0] so this is just deleting dead code. An alternative fix might be to change the GlobalStatus analysis for the global to report "Stored" instead of "StoredOnce". However, "StoredOnce" is appropriate when only one value other than the initializer is stored to the global. [0] http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662 Differential Revision: https://reviews.llvm.org/D76645	2020-03-23 22:36:09 -07:00
Douglas Yung	e79b1ab65b	Make test more flexible for when the variable is renamed in the generated LLVM IR. [NFC]	2020-03-23 22:03:21 -07:00
Matt Arsenault	66073953a5	AMDGPU: Allow vectorization of round intrinsic There seems to be a small benefit to the legalized sequence for v2f16 round with packed instructions, so allow vectorizing it by reducing the cost. An unintended side effect is vectorization of f32 round also happens. The current FMA logic seems off to me, and isn't checking for packed instructions.	2020-03-23 17:00:41 -04:00
Matt Arsenault	b20a1d840f	GVNSink: Allow handling addrspacecast	2020-03-23 16:50:58 -04:00
Matt Arsenault	43d98a0ecf	Allow replacing intrinsic operands with variables Since intrinsics can now specify when an argument is required to be constant, it is now OK to replace arguments with variables if they aren't. This means intrinsics must now be accurately marked with immarg.	2020-03-23 15:51:57 -04:00
Sanjay Patel	a1fe6beb1e	[InstCombine] remove one-use check for ctpop -> cttz Two one-use checks were added with rGfdcb27105537, but only the first one is necessary to limit an increase in instruction count. The second transform only creates one instruction, so it is always a reasonable canonicalization/optimization.	2020-03-23 13:59:57 -04:00
Johannes Doerfert	9d38f98dc3	[OpenMPOpt] Validate declaration types against the expected types Validation of the found runtime library functions declarations types (return and argument types) with the expected types. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D76058	2020-03-23 11:43:36 -05:00
Johannes Doerfert	68fed27067	[Attributor] Handle calls in AAValueConstantRange properly We did handle calls that were operands of certain instructions but not standalone calls we visit via indirection, e.g., selects.	2020-03-23 10:45:24 -05:00
Johannes Doerfert	54ec9b54f6	[Attributor] Unify handling of must-tail calls We special cased must-tail calls all over the place because they cannot be modified as other calls can be. However, we already centralized the modification API so we can centralize the handling as well. This simplifies the code and allows to remove must-tail calls completely.	2020-03-23 10:45:24 -05:00
Simon Pilgrim	fdcb271055	[InstCombine] Limit CTPOP -> CTTZ simplifications to one use Tweak D76568 so we only combine if it will remove the bit-twiddling. Suggested by @spatel	2020-03-23 14:33:41 +00:00
Florian Hahn	33942d18b1	[SCCP] Precommit additional range propagation test.	2020-03-23 14:15:19 +00:00
Sanjay Patel	5eeea337be	[VectorCombine] add more tests for extract-extract patterns; NFC	2020-03-23 09:33:56 -04:00
Simon Pilgrim	16d2065cfc	[InstCombine] Add ub-safe negation patterns (PR27817)	2020-03-23 12:47:32 +00:00
Florian Hahn	b8a2cf6b5b	[SCCP] Extend test coverage in conditions-ranges.ll to false branches.	2020-03-23 12:32:14 +00:00
Simon Pilgrim	72d1419bfb	[InstCombine] Add CTPOP -> CTTZ simplifications (PR43513) As detailed on PR43513, we can simplify: ctpop(x \| -x) -> bitwidth - cttz(x, false) Alive2: http://volta.cs.utah.edu:8080/z/caw49X ctpop(~x & (x - 1)) -> cttz(x, false) Alive2: http://volta.cs.utah.edu:8080/z/5zfVrx I've tweaked the initial test cases I added at rG2d712fb75584 to increase commutativity testing. Differential Revision: https://reviews.llvm.org/D76568	2020-03-23 11:04:33 +00:00
Juneyoung Lee	5792c2236d	Add test cases that are addressed by D76010	2020-03-23 13:49:29 +09:00
Florian Hahn	006244152d	[SCCP] Add a few more tests for conditional propagation,XOR.	2020-03-22 21:43:33 +00:00
Craig Topper	f4c67dfa92	[X86] More accurately model the cost of horizontal reductions. This patch attempts to more accurately model the reduction of power of 2 vectors of types we natively support. This takes into account the narrowing of vectors that occurs as we go from 512 bits to 256 bits, to 128 bits. It also takes into account the use of wider elements in the shuffles for the first 2 steps of a reduction from 128 bits. And uses a v8i16 shift for the final step of vXi8 reduction. The default implementation uses the legalized type for the arithmetic for all levels. And uses the single source permute cost of the legalized type for all levels. This penalizes things like lack of v16i8 pshufb on pre-sse3 targets and the splitting and joining that needs to be done for integer types on AVX1. We never need v16i8 shuffle for a reduction and we only need split AVX1 ops when the type is wide and needs to be split. I think we're still over costing splits and joins for AVX1, but we're closer now. I've also removed all pairwise special casing because I don't think we ever want to generate that on X86. I've also adjusted the add handling to more accurately account for any type splitting that occurs before we reach a legal type. Differential Revision: https://reviews.llvm.org/D76478	2020-03-22 14:20:15 -07:00
Nikita Popov	dc81923659	[InstCombine] Remove ExpensiveCombines option D75801 removed the last and only user of this option, so we can drop it now. The original idea behind this was to only run expensive transforms under -O3, but apart from the one known bits transform, this has never really taken off. I believe nowadays the recommendation is to put expensive transforms in AggressiveInstCombine instead, though that isn't terribly popular either :) Differential Revision: https://reviews.llvm.org/D76540	2020-03-22 16:56:28 +01:00
Simon Pilgrim	2d712fb755	[InstCombine] Add ctpop -> cttz combine tests (PR43513)	2020-03-21 19:30:22 +00:00
Huihui Zhang	4f5af9d70d	[ValueTracking] Fix usage of DataLayout::getTypeStoreSize() Summary: DataLayout::getTypeStoreSize() returns TypeSize. For cases where it can not be scalable vector (e.g., GlobalVariable), explicitly call TypeSize::getFixedSize(). For cases where scalable property doesn't matter, (e.g., check for zero-sized type), use TypeSize::isNonZero(). Reviewers: sdesmalen, efriedma, apazos, reames Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76454	2020-03-20 16:52:15 -07:00
Huihui Zhang	1993f95f2b	[ValueTracking][SVE] Fix getOffsetFromIndex for scalable vector. Summary: Return None if GEP index type is scalable vector. Size of scalable vectors are multiplied by a runtime constant. Avoid transforming: %a = bitcast i8* %p to <vscale x 16 x i8>* %tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0 store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp0 %tmp1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 1 store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp1 into: %a = bitcast i8* %p to <vscale x 16 x i8>* %tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0 %1 = bitcast <vscale x 16 x i8>* %tmp0 to i8* call void @llvm.memset.p0i8.i64(i8* align 16 %1, i8 0, i64 32, i1 false) Reviewers: sdesmalen, efriedma, apazos, reames Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, rkruppe, arphaman, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76464	2020-03-20 14:48:29 -07:00
Nikita Popov	2b52e4e629	[InstCombine] Remove known bits constant folding If ExpensiveCombines is enabled (which is the case with -O3 on the legacy PM and always on the new PM), InstCombine tries to compute the known bits of all instructions in the hope that all bits end up being known, which is fairly expensive. How effective is it? If we add some statistics on how often the constant folding succeeds and how many KnownBits calculations are performed and run test-suite we get: "instcombine.NumConstPropKnownBits": 642, "instcombine.NumConstPropKnownBitsComputed": 18744965, In other words, we get one fold for every 30000 KnownBits calculations. However, the truth is actually much worse: Currently, known bits are computed before performing other folds, so there is a high chance that cases that get folded by known bits would also have been handled by other folds. What happens if we compute known bits after all other folds (hacky implementation: https://gist.github.com/nikic/751f25b3b9d9e0860db5dde934f70f46)? "instcombine.NumConstPropKnownBits": 0, "instcombine.NumConstPropKnownBitsComputed": 18105547, So it turns out despite doing 18 million known bits calculations, the known bits fold does not do anything useful on test-suite. I was originally planning to move this into AggressiveInstCombine so it only runs once in the pipeline, but seeing this, I think we're better off removing it entirely. As this is the only use of the "expensive combines" mechanism, it may be removed afterwards, but I'll leave that to a separate patch. Differential Revision: https://reviews.llvm.org/D75801	2020-03-20 20:54:06 +01:00
Nikita Popov	3205d1a860	[InstCombine] Handle known shl nsw sign bit in SimplifyDemanded Ideally SimplifyDemanded should compute the same known bits as computeKnownBits(). This patch addresses one discrepancy, where ValueTracking is more powerful: If we have a shl nsw shift, we know that the sign bit of the input and output must be the same. If this results in a conflict, the result is poison. This is implemented in `2c4ca6832f/lib/Analysis/ValueTracking.cpp (L1175-L1179)` and `2c4ca6832f/lib/Analysis/ValueTracking.cpp (L904-L908)`. This implements the same basic logic in SimplifyDemanded. It's slightly stronger, because I return undef instead of zero for the poison case (which is not an option inside ValueTracking). As mentioned in https://reviews.llvm.org/D75801#inline-698484, we could detect poison in more cases, this just establishes parity with the existing logic. Differential Revision: https://reviews.llvm.org/D76489	2020-03-20 18:16:05 +01:00
Simon Pilgrim	34659de5fd	[InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by scalar amounts to generic shifts (PR40391) The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range. This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).	2020-03-20 15:48:06 +00:00
Florian Hahn	ece6cf0fa5	[DSE,MSSA] Precommit additional tests for D73763.	2020-03-20 13:39:46 +00:00
Simon Pilgrim	7f764fa18f	[ValueTracking] Add some initial isKnownNonZero DemandedElts support (PR36319)	2020-03-20 13:29:00 +00:00
Nikita Popov	ce6c95aaca	[InstCombine] Move test to instcombine; NFC This test uses -instcombine, so move it into the appropriate directory. Also fork it for expensive checks enabled/disabled.	2020-03-20 12:41:19 +01:00
Simon Pilgrim	c1efdbcbe0	[ValueTracking] Add computeKnownBits DemandedElts support to shift instructions (PR36319)	2020-03-20 11:08:08 +00:00
Nikita Popov	a09ff56b5b	[Tests] Regenerate some test checks; NFC	2020-03-20 12:06:53 +01:00
Nikita Popov	0372768776	[InstCombine] Simplify calls with "returned" attribute If a call argument has the "returned" attribute, we can simplify the call to the value of that argument. This was already partially handled by InstSimplify/InstCombine for the case where the argument is an integer constant, and the result is thus known via known bits. The non-constant (or non-int) argument cases weren't handled though. This previously landed as an InstSimplify transform, but was reverted due to assertion failures when compiling the Linux kernel. The reason is that simplifying a call to another call breaks assumptions in call graph updating during inlining. As the code is not easy to fix, and there is no particularly strong motivation for having this in InstSimplify, the transform is only performed in InstCombine instead. Differential Revision: https://reviews.llvm.org/D75815	2020-03-20 10:23:39 +01:00
Nikita Popov	5c10967157	[InstCombine] Don't replace musttail result based on known bits This is the same change as D75824, but for two cases where InstCombine performs the same optimization: Replacing an instruction whose bits are fully known with a constant. This is not (generally) legal for musttail calls. Differential Revision: https://reviews.llvm.org/D76457	2020-03-20 10:17:09 +01:00
Florian Hahn	3a8372ed02	[DSE] Support traversing MemoryPhis. For MemoryPhis, we have to avoid that the MemoryPhi may be executed before the access we are currently looking at. To do this we do a post-order numbering of the basic blocks in the function and bail out once we reach a MemoryPhi with a larger (or equal) post-order block number than the current MemoryAccess. This changes the order in which we visit stores for elimination. This patch also adds support for exploring multiple paths. We keep a worklist (ToCheck) of memory accesses that might be eliminated by our starting MemoryDef or MemoryPhis for further exploration. For MemoryPhis, we add the incoming values to the worklist, for MemoryDefs we add the defining access. Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D72148	2020-03-20 07:51:42 +00:00
Jun Ma	032251e34d	[Coroutines] Fix PR45130 For now, when final suspend can be simplified by simplifySuspendPoint, handleFinalSuspend is executed as well to remove last case in switch instruction. This patch fixes it. Differential Revision: https://reviews.llvm.org/D76345	2020-03-20 11:27:08 +08:00
Simon Pilgrim	95b6f62efb	[InstSimplify] Add some vector shift tests to show lack of DemandedElts support	2020-03-19 22:09:51 +00:00
Kazu Hirata	e23d786526	[JumpThreading] Fix infinite loop (PR44611) Summary: This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44611 by preventing an infinite loop in the jump threading pass when -jump-threading-across-loop-headers is on. Specifically, without this patch, jump threading through two basic blocks would trigger on the same area of the CFG over and over, resulting in an infinite loop. Consider testcase PR44611-across-header-hang.ll in this patch. The first opportunity to thread through two basic blocks is: from bb_body2 through bb_header and bb_body1 to bb_body2. The pass duplicates bb_header and bb_body1 as, say, bb_header.thread1 and bb_body1.thread1. Since bb_header contains a successor edge back to itself, bb_header.thread1 also contains a successor edge to bb_header, immediately giving rise to the next jump threading opportunity: from bb_header.thread1 through bb_header and bb_body1 to bb_body2. After that, we repeatedly thread an incoming edge into bb_header through bb_header and bb_body1 to bb_body2. In other words, we keep peeling one iteration from bb_header's self loop. The patch fixes the problem by preventing the pass from duplicating a basic block containing a self loop. Reviewers: wmi, junparser, efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76390	2020-03-19 12:49:36 -07:00
Simon Pilgrim	c2586cab89	[InstCombine][X86] Tests for variable but in-range vector-by-scalar shift amounts (PR40391) These shifts are masked to be inrange so we should be able to replace them with generic shifts.	2020-03-19 19:24:55 +00:00
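
The two CTPOP -> CTTZ identities quoted in commit 72d1419bfb can be checked exhaustively for a small bit width. The sketch below is not LLVM code: `ctpop`, `cttz`, and `check_identities` are hypothetical helper names that model fixed-width integers in plain Python, assuming `cttz(x, false)` returns the bit width when `x` is zero.

```python
def ctpop(x: int, bits: int) -> int:
    """Population count of x, truncated to `bits` bits."""
    return bin(x & ((1 << bits) - 1)).count("1")


def cttz(x: int, bits: int) -> int:
    """Count trailing zeros of x in `bits` bits.

    Models cttz(x, false): a zero input is defined and yields `bits`.
    """
    x &= (1 << bits) - 1
    if x == 0:
        return bits
    # x & -x isolates the lowest set bit; its index is the trailing-zero count.
    return (x & -x).bit_length() - 1


def check_identities(bits: int = 8) -> bool:
    """Exhaustively check both identities for every `bits`-wide value."""
    mask = (1 << bits) - 1
    for x in range(1 << bits):
        neg = (-x) & mask  # two's-complement negation at this width
        # ctpop(x | -x) -> bitwidth - cttz(x, false)
        if ctpop(x | neg, bits) != bits - cttz(x, bits):
            return False
        # ctpop(~x & (x - 1)) -> cttz(x, false)
        if ctpop(~x & (x - 1), bits) != cttz(x, bits):
            return False
    return True
```

Calling `check_identities(8)` walks all 256 eight-bit values; the Alive2 links in the commit message remain the authoritative proof for arbitrary widths.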


14542 Commits