Commit Graph

14542 Commits

Author SHA1 Message Date
Alexandre Ganea 934d4feab1 [ThinLTO] Don't rely on debug output for thinlto_samplepgo_icp3 test
Because using -print-imports is not thread-safe, make the test rely on llvm-dis instead.
Also cover the ICALL-PROM part as intended originally.

Differential Revision: https://reviews.llvm.org/D76775
2020-03-25 14:38:20 -04:00
Sanjay Patel f631b9dc36 [VectorCombine] add shuffle tests; NFC
Goes with D76727.
2020-03-25 10:35:03 -04:00
sstefan1 72b51d6f93 [OpenMP] Adding InaccessibleMemOnly and InaccessibleMemOrArgMemOnly for runtime calls.
Summary: Attempt to add more attributes for runtime calls.

Reviewers: jdoerfert

Subscribers: guansong, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75010
2020-03-25 14:08:50 +00:00
Juneyoung Lee d82c1e8c56 Rename test name, add more tests for codegenprepare 2020-03-25 20:31:12 +09:00
Juneyoung Lee e951a48996 Add freeze(and x, const) case to codegenprepare's freeze-cmp.ll 2020-03-25 17:29:01 +09:00
Johannes Doerfert 5699d08b79 [Attributor] Use knowledge retained in llvm.assume (operand bundles)
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value, and if there are llvm.assume operand bundle uses with the right
tag, we check whether they are in the must-be-executed context (around
the context instruction). Droppable users, which are currently only
llvm::assume, are now handled specially in some places as well.

[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
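A small, hypothetical IR sketch of knowledge recorded in an assume operand bundle (the function and value names are illustrative, not taken from the patch):

```
declare void @llvm.assume(i1)

define i8 @f(i8* %p) {
  ; the "nonnull" bundle records knowledge about %p; the Attributor can use it
  ; if the assume lies in the must-be-executed context of the query position
  call void @llvm.assume(i1 true) ["nonnull"(i8* %p)]
  %v = load i8, i8* %p
  ret i8 %v
}
```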

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D74888
2020-03-24 15:33:40 -05:00
Sanjay Patel c84446f4e9 [VectorCombine] add tests for bitcast (shuffle); NFC 2020-03-24 15:18:32 -04:00
Juneyoung Lee 49f75132bc [DivRemPairs] Freeze operands if they can be undef values
Summary:
DivRemPairs is unsound with respect to undef values.

```
      // bb1:
      //   %rem = srem %x, %y
      // bb2:
      //   %div = sdiv %x, %y
      // -->
      // bb1:
      //   %div = sdiv %x, %y
      //   %mul = mul %div, %y
      //   %rem = sub %x, %mul
```

If X can be undef, X should be frozen first.
For example, let's assume that Y = 1 & X = undef:
```
   %div = sdiv undef, 1 // %div = undef
   %rem = srem undef, 1 // %rem = 0
 =>
   %div = sdiv undef, 1 // %div = undef
   %mul = mul %div, 1   // %mul = undef
   %rem = sub %x, %mul  // %rem = undef - undef = undef
```
http://volta.cs.utah.edu:8080/z/m7Xrx5

Same for Y. If X = 1 and Y = (undef | 1), %rem in src is either 1 or 0,
but %rem in tgt can be one of many integer values.
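A sketch of the intended fix, freezing the possibly-undef operands before the expansion (types added for illustration; the actual patch decides per operand whether a freeze is needed):

```
   %frx = freeze i32 %x
   %fry = freeze i32 %y
   %div = sdiv i32 %frx, %fry
   %mul = mul i32 %div, %fry
   %rem = sub i32 %frx, %mul
```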

This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 .

This miscompilation disappears once undef values are removed from LLVM, but that may take a while.
DivRemPairs runs quite late in the optimization pipeline, so it seemed like a good candidate for fixing with freeze without major regressions, compared to other broken optimizations.

Reviewers: spatel, lebedev.ri, george.burgess.iv

Reviewed By: spatel

Subscribers: wuzish, regehr, nlopes, nemanjai, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76483
2020-03-25 03:46:14 +09:00
Sanjay Patel 88b493a838 [ValueTracking] improve undef/poison analysis for constant vectors
Differential Revision: https://reviews.llvm.org/D76702
2020-03-24 13:35:47 -04:00
Sanjay Patel 6c3c7a0dd6 [InstSimplify] add tests for freeze(constexpr); NFC 2020-03-24 11:39:19 -04:00
Sanjay Patel 58ec867a3b [InstSimplify] add more tests for freeze(constant); NFC
These should really be moved over to a ConstantFolding test file,
but since this may overlap with the in-progress D76010 and similar
tests already exist here, we can do that as a later cleanup.
2020-03-24 09:53:49 -04:00
Douglas Yung 18e1a59eed Fix another instance where a variable was renamed in the generated LLVM IR. [NFC] 2020-03-23 22:53:29 -07:00
Jun Ma a44de12ab2 [Coroutines] Also check lifetime intrinsics for local variables when building
the coroutine frame

Currently we move all allocas into the frame when building the coroutine frame in
the CoroSplit pass. However, this can be relaxed.

Since the CoroSplit pass runs after the Inline pass, we can use lifetime intrinsics
for this analysis: if the scope of a lifetime intrinsic does not cross any suspend
point, rather than moving the alloca to the frame, we can simply move it to the entry
block of the corresponding function. This reduces the frame size.
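A hypothetical sketch of the pattern the lifetime-based analysis looks for (coroutine intrinsics omitted for brevity):

```
  %x = alloca i32
  %p = bitcast i32* %x to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %p)
  ; ... uses of %x, with no suspend point between start and end ...
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %p)
  ; => %x does not need a frame slot and can stay in the entry block
```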

More importantly, this also avoids data races in multithreaded environments.
Consider a function inlined into a coroutine: it starts a thread which accesses
local variables, and after inlining, the allocas moved into the frame are also
accessed through the frame, causing a data race.

Differential Revision: https://reviews.llvm.org/D75664
2020-03-24 13:41:55 +08:00
Vedant Kumar b7cd291c15 [GlobalOpt] Treat null-check of loaded value as use of global (PR35760)
PR35760 shows an example program which, when compiled with `clang -O0`
or gcc at any optimization level, prints '0'. However, llvm transforms
the program in a way that causes it to print '1'.

Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when
analyzing a load from a global which is used by an `icmp`. This special
case was untested [0] so this is just deleting dead code.
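A minimal sketch (not the PR35760 reproducer; names are made up) of the shape of IR involved:

```
@g = internal global i32* null

define void @init(i32* %p) {
  store i32* %p, i32** @g
  ret void
}

define i1 @is_null() {
  %v = load i32*, i32** @g
  ; the icmp is an ordinary use of the loaded value; it must not be treated as
  ; "all uses will trap if null", otherwise the null check could be folded away
  %c = icmp eq i32* %v, null
  ret i1 %c
}
```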

An alternative fix might be to change the GlobalStatus analysis for the
global to report "Stored" instead of "StoredOnce". However, "StoredOnce"
is appropriate when only one value other than the initializer is stored
to the global.

[0]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662

Differential Revision: https://reviews.llvm.org/D76645
2020-03-23 22:36:09 -07:00
Douglas Yung e79b1ab65b Make test more flexible for when the variable is renamed in the generated LLVM IR. [NFC] 2020-03-23 22:03:21 -07:00
Matt Arsenault 66073953a5 AMDGPU: Allow vectorization of round intrinsic
There seems to be a small benefit to the legalized sequence for v2f16
round with packed instructions, so allow vectorizing it by reducing
the cost.

An unintended side effect is that vectorization of f32 round also
happens. The current FMA logic seems off to me and isn't checking for
packed instructions.
2020-03-23 17:00:41 -04:00
Matt Arsenault b20a1d840f GVNSink: Allow handling addrspacecast 2020-03-23 16:50:58 -04:00
Matt Arsenault 43d98a0ecf Allow replacing intrinsic operands with variables
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables when they are
not required to be constant. This means intrinsics must now be accurately
marked with immarg.
2020-03-23 15:51:57 -04:00
Sanjay Patel a1fe6beb1e [InstCombine] remove one-use check for ctpop -> cttz
Two one-use checks were added with rGfdcb27105537,
but only the first one is necessary to limit an
increase in instruction count. The second transform
only creates one instruction, so it is always a
reasonable canonicalization/optimization.
2020-03-23 13:59:57 -04:00
Johannes Doerfert 9d38f98dc3 [OpenMPOpt] Validate declaration types against the expected types
Validate the declaration types (return and argument types) of the runtime
library functions that are found against the expected types.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76058
2020-03-23 11:43:36 -05:00
Johannes Doerfert 68fed27067 [Attributor] Handle calls in AAValueConstantRange properly
We handled calls that were operands of certain instructions, but not
standalone calls that we visit via indirection, e.g., through selects.
2020-03-23 10:45:24 -05:00
Johannes Doerfert 54ec9b54f6 [Attributor] Unify handling of must-tail calls
We special cased must-tail calls all over the place because they cannot
be modified as other calls can be. However, we already centralized the
modification API so we can centralize the handling as well. This
simplifies the code and allows us to remove must-tail calls completely.
2020-03-23 10:45:24 -05:00
Simon Pilgrim fdcb271055 [InstCombine] Limit CTPOP -> CTTZ simplifications to one use
Tweak D76568 so we only combine if it will remove the bit-twiddling.

Suggested by @spatel
2020-03-23 14:33:41 +00:00
Florian Hahn 33942d18b1 [SCCP] Precommit additional range propagation test. 2020-03-23 14:15:19 +00:00
Sanjay Patel 5eeea337be [VectorCombine] add more tests for extract-extract patterns; NFC 2020-03-23 09:33:56 -04:00
Simon Pilgrim 16d2065cfc [InstCombine] Add ub-safe negation patterns (PR27817) 2020-03-23 12:47:32 +00:00
Florian Hahn b8a2cf6b5b [SCCP] Extend test coverage in conditions-ranges.ll to false branches. 2020-03-23 12:32:14 +00:00
Simon Pilgrim 72d1419bfb [InstCombine] Add CTPOP -> CTTZ simplifications (PR43513)
As detailed on PR43513, we can simplify:

ctpop(x | -x) -> bitwidth - cttz(x, false)
Alive2: http://volta.cs.utah.edu:8080/z/caw49X

ctpop(~x & (x - 1)) -> cttz(x, false)
Alive2: http://volta.cs.utah.edu:8080/z/5zfVrx
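As a rough IR illustration of the first fold (my own src/tgt pair, not copied from the committed tests):

```
declare i32 @llvm.ctpop.i32(i32)
declare i32 @llvm.cttz.i32(i32, i1)

define i32 @src(i32 %x) {
  %neg = sub i32 0, %x
  %or  = or i32 %x, %neg
  %pop = call i32 @llvm.ctpop.i32(i32 %or)
  ret i32 %pop
}

define i32 @tgt(i32 %x) {
  %tz  = call i32 @llvm.cttz.i32(i32 %x, i1 false)
  %res = sub i32 32, %tz
  ret i32 %res
}
```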

I've tweaked the initial test cases I added at rG2d712fb75584 to increase commutativity testing.

Differential Revision: https://reviews.llvm.org/D76568
2020-03-23 11:04:33 +00:00
Juneyoung Lee 5792c2236d Add test cases that are addressed by D76010 2020-03-23 13:49:29 +09:00
Florian Hahn 006244152d [SCCP] Add a few more tests for conditional propagation,XOR. 2020-03-22 21:43:33 +00:00
Craig Topper f4c67dfa92 [X86] More accurately model the cost of horizontal reductions.
This patch attempts to more accurately model the reduction of
power of 2 vectors of types we natively support. This takes into
account the narrowing of vectors that occur as we go from 512
bits to 256 bits, to 128 bits. It also takes into account the use
of wider elements in the shuffles for the first 2 steps of a
reduction from 128 bits. And uses a v8i16 shift for the final step
of vXi8 reduction.

The default implementation uses the legalized type for the arithmetic
for all levels. And uses the single source permute cost of the
legalized type for all levels. This penalizes things like
lack of v16i8 pshufb on pre-SSSE3 targets and the splitting and
joining that needs to be done for integer types on AVX1. We never
need a v16i8 shuffle for a reduction, and we only need split AVX1 ops
when the type is wide and needs to be split. I think we're still
over-costing splits and joins for AVX1, but we're closer now.

I've also removed all pairwise special casing because I don't
think we ever want to generate that on X86. I've also adjusted
the add handling to more accurately account for any type splitting
that occurs before we reach a legal type.

Differential Revision: https://reviews.llvm.org/D76478
2020-03-22 14:20:15 -07:00
Nikita Popov dc81923659 [InstCombine] Remove ExpensiveCombines option
D75801 removed the last and only user of this option, so we can
drop it now. The original idea behind this was to only run expensive
transforms under -O3, but apart from the one known bits transform,
this has never really taken off. I believe nowadays the recommendation
is to put expensive transforms in AggressiveInstCombine instead,
though that isn't terribly popular either :)

Differential Revision: https://reviews.llvm.org/D76540
2020-03-22 16:56:28 +01:00
Simon Pilgrim 2d712fb755 [InstCombine] Add ctpop -> cttz combine tests (PR43513) 2020-03-21 19:30:22 +00:00
Huihui Zhang 4f5af9d70d [ValueTracking] Fix usage of DataLayout::getTypeStoreSize()
Summary:
DataLayout::getTypeStoreSize() returns TypeSize.

For cases where it cannot be a scalable vector (e.g., GlobalVariable),
explicitly call TypeSize::getFixedSize().

For cases where the scalable property doesn't matter (e.g., checking for a
zero-sized type), use TypeSize::isNonZero().

Reviewers: sdesmalen, efriedma, apazos, reames

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76454
2020-03-20 16:52:15 -07:00
Huihui Zhang 1993f95f2b [ValueTracking][SVE] Fix getOffsetFromIndex for scalable vector.
Summary:
Return None if the GEP index type is a scalable vector. The size of a scalable
vector is multiplied by a runtime constant.

Avoid transforming:
  %a = bitcast i8* %p to <vscale x 16 x i8>*
  %tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0
  store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp0
  %tmp1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 1
  store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp1

into:
  %a = bitcast i8* %p to <vscale x 16 x i8>*
  %tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0
  %1 = bitcast <vscale x 16 x i8>* %tmp0 to i8*
  call void @llvm.memset.p0i8.i64(i8* align 16 %1, i8 0, i64 32, i1 false)

Reviewers: sdesmalen, efriedma, apazos, reames

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, arphaman, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76464
2020-03-20 14:48:29 -07:00
Nikita Popov 2b52e4e629 [InstCombine] Remove known bits constant folding
If ExpensiveCombines is enabled (which is the case with -O3 on the
legacy PM and always on the new PM), InstCombine tries to compute
the known bits of all instructions in the hope that all bits end up
being known, which is fairly expensive.

How effective is it? If we add some statistics on how often the
constant folding succeeds and how many KnownBits calculations are
performed and run test-suite we get:

    "instcombine.NumConstPropKnownBits": 642,
    "instcombine.NumConstPropKnownBitsComputed": 18744965,

In other words, we get one fold for every 30000 KnownBits calculations.
However, the truth is actually much worse: Currently, known bits are
computed before performing other folds, so there is a high chance
that cases that get folded by known bits would also have been
handled by other folds.

What happens if we compute known bits after all other folds
(hacky implementation: https://gist.github.com/nikic/751f25b3b9d9e0860db5dde934f70f46)?

    "instcombine.NumConstPropKnownBits": 0,
    "instcombine.NumConstPropKnownBitsComputed": 18105547,

So it turns out despite doing 18 million known bits calculations,
the known bits fold does not do anything useful on test-suite.
I was originally planning to move this into AggressiveInstCombine
so it only runs once in the pipeline, but seeing this, I think
we're better off removing it entirely.

As this is the only use of the "expensive combines" mechanism,
it may be removed afterwards, but I'll leave that to a separate patch.

Differential Revision: https://reviews.llvm.org/D75801
2020-03-20 20:54:06 +01:00
Nikita Popov 3205d1a860 [InstCombine] Handle known shl nsw sign bit in SimplifyDemanded
Ideally SimplifyDemanded should compute the same known bits as
computeKnownBits(). This patch addresses one discrepancy, where
ValueTracking is more powerful: If we have a shl nsw shift, we
know that the sign bit of the input and output must be the same.
If this results in a conflict, the result is poison.

This is implemented in
2c4ca6832f/lib/Analysis/ValueTracking.cpp (L1175-L1179)
and
2c4ca6832f/lib/Analysis/ValueTracking.cpp (L904-L908).

This implements the same basic logic in SimplifyDemanded. It's
slightly stronger, because I return undef instead of zero for the
poison case (which is not an option inside ValueTracking).
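A hedged example of the fact being exploited (my own sketch; whether this exact case folds also depends on use counts and visitation order):

```
define i8 @known_negative_sign(i8 %x) {
  %neg = or i8 %x, -128      ; sign bit of %neg is known to be 1
  %s   = shl nsw i8 %neg, 1  ; nsw: the sign bit of %s must equal that of %neg
  %r   = lshr i8 %s, 7       ; only the sign bit is demanded, so %r is known to be 1
  ret i8 %r
}
```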

As mentioned in https://reviews.llvm.org/D75801#inline-698484,
we could detect poison in more cases, this just establishes parity
with the existing logic.

Differential Revision: https://reviews.llvm.org/D76489
2020-03-20 18:16:05 +01:00
Simon Pilgrim 34659de5fd [InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by scalar amounts to generic shifts (PR40391)
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.
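An illustrative src/tgt sketch of my own (not from the tests) for llvm.x86.sse2.psll.d with a count masked into range:

```
declare <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32>, <4 x i32>)

define <4 x i32> @src(<4 x i32> %v, i32 %n) {
  %amt = and i32 %n, 31                ; shift count known to be < 32
  %cnt = insertelement <4 x i32> zeroinitializer, i32 %amt, i32 0
  %res = call <4 x i32> @llvm.x86.sse2.psll.d(<4 x i32> %v, <4 x i32> %cnt)
  ret <4 x i32> %res
}

define <4 x i32> @tgt(<4 x i32> %v, i32 %n) {
  %amt   = and i32 %n, 31
  %ins   = insertelement <4 x i32> undef, i32 %amt, i32 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> undef, <4 x i32> zeroinitializer
  %res   = shl <4 x i32> %v, %splat
  ret <4 x i32> %res
}
```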

This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
2020-03-20 15:48:06 +00:00
Florian Hahn ece6cf0fa5 [DSE,MSSA] Precommit additional tests for D73763. 2020-03-20 13:39:46 +00:00
Simon Pilgrim 7f764fa18f [ValueTracking] Add some initial isKnownNonZero DemandedElts support (PR36319) 2020-03-20 13:29:00 +00:00
Nikita Popov ce6c95aaca [InstCombine] Move test to instcombine; NFC
This test uses -instcombine, so move it into the appropriate
directory. Also fork it for expensive checks enabled/disabled.
2020-03-20 12:41:19 +01:00
Simon Pilgrim c1efdbcbe0 [ValueTracking] Add computeKnownBits DemandedElts support to shift instructions (PR36319) 2020-03-20 11:08:08 +00:00
Nikita Popov a09ff56b5b [Tests] Regenerate some test checks; NFC 2020-03-20 12:06:53 +01:00
Nikita Popov 0372768776 [InstCombine] Simplify calls with "returned" attribute
If a call argument has the "returned" attribute, we can simplify
the call to the value of that argument. This was already partially
handled by InstSimplify/InstCombine for the case where the argument
is an integer constant, and the result is thus known via known bits.
The non-constant (or non-int) argument cases weren't handled though.
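A hypothetical sketch of the non-constant case that is now handled (function names are made up):

```
declare i8* @passthrough(i8* returned, i64)

define i8* @caller(i8* %p, i64 %n) {
  %r = call i8* @passthrough(i8* %p, i64 %n)
  ; "returned" guarantees %r == %p, so uses of %r can be rewritten to use %p
  ; (the call itself is kept for its side effects)
  ret i8* %r
}
```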

This previously landed as an InstSimplify transform, but was reverted
due to assertion failures when compiling the Linux kernel. The reason
is that simplifying a call to another call breaks assumptions in
call graph updating during inlining. As the code is not easy to fix,
and there is no particularly strong motivation for having this in
InstSimplify, the transform is only performed in InstCombine instead.

Differential Revision: https://reviews.llvm.org/D75815
2020-03-20 10:23:39 +01:00
Nikita Popov 5c10967157 [InstCombine] Don't replace musttail result based on known bits
This is the same change as D75824, but for two cases where
InstCombine performs the same optimization: Replacing an instruction
whose bits are fully known with a constant. This is not (generally)
legal for musttail calls.
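A hedged sketch of the kind of case this avoids (names are made up):

```
declare i32 @id(i32 returned)

define i32 @wrapper(i32 %x) {
  ; the result is known to be 1 via the "returned" attribute, but since this is
  ; a musttail call, the ret must return the call's own value, so the result
  ; must not be replaced with the constant
  %r = musttail call i32 @id(i32 1)
  ret i32 %r
}
```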

Differential Revision: https://reviews.llvm.org/D76457
2020-03-20 10:17:09 +01:00
Florian Hahn 3a8372ed02 [DSE] Support traversing MemoryPhis.
For MemoryPhis, we have to avoid cases where the MemoryPhi may be executed
before the access we are currently looking at.

To do this we do a post-order numbering of the basic blocks in the
function and bail out once we reach a MemoryPhi with a larger (or equal)
post-order block number than the current MemoryAccess.
This changes the order in which we visit stores for elimination.

This patch also adds support for exploring multiple paths. We keep a worklist (ToCheck) of memory accesses that might be eliminated by our starting MemoryDef or MemoryPhis for further exploration.  For MemoryPhis, we add the incoming values to the worklist, for MemoryDefs we add the defining access.

Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D72148
2020-03-20 07:51:42 +00:00
Jun Ma 032251e34d [Coroutines] Fix PR45130
For now, when the final suspend can be simplified by simplifySuspendPoint,
handleFinalSuspend is executed as well to remove the last case in the switch
instruction. This patch fixes it.

Differential Revision: https://reviews.llvm.org/D76345
2020-03-20 11:27:08 +08:00
Simon Pilgrim 95b6f62efb [InstSimplify] Add some vector shift tests to show lack of DemandedElts support 2020-03-19 22:09:51 +00:00
Kazu Hirata e23d786526 [JumpThreading] Fix infinite loop (PR44611)
Summary:
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44611 by
preventing an infinite loop in the jump threading pass when
-jump-threading-across-loop-headers is on.  Specifically, without this
patch, jump threading through two basic blocks would trigger on the
same area of the CFG over and over, resulting in an infinite loop.

Consider testcase PR44611-across-header-hang.ll in this patch.  The
first opportunity to thread through two basic blocks is:

  from bb_body2 through bb_header and bb_body1 to bb_body2.

The pass duplicates bb_header and bb_body1 as, say, bb_header.thread1
and bb_body1.thread1.  Since bb_header contains a successor edge back
to itself, bb_header.thread1 also contains a successor edge to
bb_header, immediately giving rise to the next jump threading
opportunity:

  from bb_header.thread1 through bb_header and bb_body1 to bb_body2.

After that, we repeatedly thread an incoming edge into bb_header
through bb_header and bb_body1 to bb_body2.  In other words, we keep
peeling one iteration from bb_header's self loop.

The patch fixes the problem by preventing the pass from duplicating a
basic block containing a self loop.

Reviewers: wmi, junparser, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76390
2020-03-19 12:49:36 -07:00
Simon Pilgrim c2586cab89 [InstCombine][X86] Tests for variable but in-range vector-by-scalar shift amounts (PR40391)
These shifts are masked to be in range, so we should be able to replace them with generic shifts.
2020-03-19 19:24:55 +00:00