Commit Graph

14653 Commits

Author SHA1 Message Date
Serge Pavlov f398739152 [FEnv] Constfold some unary constrained operations
This change implements constant folding to constrained versions of
intrinsics, implementing rounding: floor, ceil, trunc, round, rint and
nearbyint.

Differential Revision: https://reviews.llvm.org/D72930
2020-03-28 12:28:33 +07:00
Sanjay Patel 0f56bbc1a5 [InstCombine] reduce FP-casted and bitcasted signbit check
PR45305:
https://bugs.llvm.org/show_bug.cgi?id=45305

Alive2 proofs:
http://volta.cs.utah.edu:8080/z/bVyrko
http://volta.cs.utah.edu:8080/z/Vxpz9q
2020-03-27 17:33:59 -04:00
Sanjay Patel e72730ee3a [InstCombine] add tests for FP cast+bitcast signbit checks; NFC
PR45305:
https://bugs.llvm.org/show_bug.cgi?id=45305
2020-03-27 17:25:25 -04:00
Simon Pilgrim f4f4a8bfef [InstCombine][X86] Add repeated ops demanded elts tests for SSE intrinsics (PR24523) 2020-03-27 14:51:09 +00:00
Simon Pilgrim ec3bb6c3e7 [InstCombine][X86] Regenerate SSE2 tests 2020-03-27 14:51:09 +00:00
Sanjay Patel 5237262feb [InstCombine] add shuffle-with-bitcast-operand tests; NFC 2020-03-26 14:28:47 -04:00
Jonathan Roelofs 7a89a5d81b [InstCombine] Fix Incorrect fold of ashr+xor -> lshr w/ vectors
Fixes https://bugs.llvm.org/show_bug.cgi?id=43665
2020-03-26 12:09:36 -06:00
Sam Parker db8a3c4206 [NFC] Create X86 subdirectory for indvar tests
Many IndVarSiimplify tests target an x86 triple, so move them into
a target specific folder.
2020-03-26 12:24:45 +00:00
John McCall 9514c048d8 Use optimal layout and preserve alloca alignment in coroutine frames.
Previously, we would ignore alloca alignment when building the frame
and just use the natural alignment of the allocated type.  If an alloca
is over-aligned for its IR type, this could lead to a frame entry with
inadequate alignment for the downstream uses of the alloca.

Since highly-aligned fields also tend to produce poor layouts under a
naive layout algorithm, I've also switched coroutine frames to use the
new optimal struct layout algorithm.

In order to communicate the frame size and alignment to later passes,
I needed to set align+dereferenceable attributes on the frame-pointer
parameter of the resume function.  This is clearly the right thing to
do, but the align attribute currently seems to result in assumptions
being added during inlining that the optimizer cannot easily remove.
2020-03-26 00:51:09 -04:00
Florian Hahn 081efa7dd0 [SCCP] Add a few constantexpr,undef tests for cond propagation 2020-03-25 21:28:35 +00:00
Tyker f1a9efabcb Ignore/Drop droppable uses for code-sinking in InstCombine
Summary:
This patch allows code-sinking in InstCombine to be performed when instruction have uses in llvm.assume.

Use are considered droppable when it is preferable to modify the User such that the use disappears rather than to prevent a transformation because of the use.
for now uses are considered droppable if they are in an llvm.assume.

Reviewers: jdoerfert, nikic, spatel, lebedev.ri, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73832
2020-03-25 20:42:52 +01:00
Alexandre Ganea 934d4feab1 [ThinLTO] Don't rely on debug output for thinlto_samplepgo_icp3 test
Because using -print-imports is not thread-safe, make the test rely on llvm-dis instead.
Also cover the ICALL-PROM part as intended originally.

Differential Revision: https://reviews.llvm.org/D76775
2020-03-25 14:38:20 -04:00
Sanjay Patel f631b9dc36 [VectorCombine] add shuffle tests; NFC
Goes with DD76727.
2020-03-25 10:35:03 -04:00
sstefan1 72b51d6f93 OpenMP] Adding InaccessibleMemOnly and InaccessibleMemOrArgMemOnly for runtime calls.
Summary: Attempt to add more attributes for runtime calls.

Reviewers: jdoerfert

Subscribers: guansong, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75010
2020-03-25 14:08:50 +00:00
Juneyoung Lee d82c1e8c56 Rename test name, add more tests for codegenprepare 2020-03-25 20:31:12 +09:00
Juneyoung Lee e951a48996 Add freeze(and x, const) case to codegenprepare's freeze-cmp.ll 2020-03-25 17:29:01 +09:00
Johannes Doerfert 5699d08b79 [Attributor] Use knowledge retained in llvm.assume (operand bundles)
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value and if there are llvm.assume operand bundle uses with the right
tag we will check if they are in the must-be-executed-context (around
the context instruction). Droppable users, which is currently only
llvm::assume, are handled special in some places now as well.

[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D74888
2020-03-24 15:33:40 -05:00
Sanjay Patel c84446f4e9 [VectorCombine] add tests for bitcast (shuffle); NFC 2020-03-24 15:18:32 -04:00
Juneyoung Lee 49f75132bc [DivRemPairs] Freeze operands if they can be undef values
Summary:
DivRemPairs is unsound with respect to undef values.

```
      // bb1:
      //   %rem = srem %x, %y
      // bb2:
      //   %div = sdiv %x, %y
      // -->
      // bb1:
      //   %div = sdiv %x, %y
      //   %mul = mul %div, %y
      //   %rem = sub %x, %mul
```

If X can be undef, X should be frozen first.
For example, let's assume that Y = 1 & X = undef:
```
   %div = sdiv undef, 1 // %div = undef
   %rem = srem undef, 1 // %rem = 0
 =>
   %div = sdiv undef, 1 // %div = undef
   %mul = mul %div, 1   // %mul = undef
   %rem = sub %x, %mul  // %rem = undef - undef = undef
```
http://volta.cs.utah.edu:8080/z/m7Xrx5

Same for Y. If X = 1 and Y = (undef | 1), %rem in src is either 1 or 0,
but %rem in tgt can be one of many integer values.

This resolves https://bugs.llvm.org/show_bug.cgi?id=42619 .

This miscompilation disappears if undef value is removed, but it may take a while.
DivRemPair happens pretty late during the optimization pipeline, so this optimization seemed as a good candidate to fix without major regression using freeze than other broken optimizations.

Reviewers: spatel, lebedev.ri, george.burgess.iv

Reviewed By: spatel

Subscribers: wuzish, regehr, nlopes, nemanjai, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76483
2020-03-25 03:46:14 +09:00
Sanjay Patel 88b493a838 [ValueTracking] improve undef/poison analysis for constant vectors
Differential Revision: https://reviews.llvm.org/D76702
2020-03-24 13:35:47 -04:00
Sanjay Patel 6c3c7a0dd6 [InstSimplify] add tests for freeze(constexpr); NFC 2020-03-24 11:39:19 -04:00
Sanjay Patel 58ec867a3b [InstSimplify] add more tests for freeze(constant); NFC
These should really be moved over to a ConstantFolding test file,
but since this may overlap with the in-progress D76010 and similar
tests already exist here, we can do that as a later cleanup.
2020-03-24 09:53:49 -04:00
Douglas Yung 18e1a59eed Fix another instance where a variable was renamed in the generated LLVM IR. [NFC] 2020-03-23 22:53:29 -07:00
Jun Ma a44de12ab2 [Coroutines] Also check lifetime intrinsic for local variable when build
coroutine frame

Currently we move all allocas into the frame when build coroutine frame in
CoroSplit pass. However, this can be relaxed.

Since CoroSplit pass run after Inline pass, we can use lifetime intrinsic to
do such analysis: If the scope of lifetime intrinsic is not across any suspend
point, rather than move the allocas to frame, we can just move them to entry bb
of corresponding function. This reduce the frame size.

More importantly, this also avoid data race in multithread environment.
Consider one inline function by coroutine: it starts a thread which access
local variables, while after inline the movement of allocs to frame also access
them. cause data race.

Differential Revision: https://reviews.llvm.org/D75664
2020-03-24 13:41:55 +08:00
Vedant Kumar b7cd291c15 [GlobalOpt] Treat null-check of loaded value as use of global (PR35760)
PR35760 shows an example program which, when compiled with `clang -O0`
or gcc at any optimization level, prints '0'. However, llvm transforms
the program in a way that causes it to print '1'.

Fix the issue by having `AllUsesOfValueWillTrapIfNull` return false when
analyzing a load from a global which is used by an `icmp`. This special
case was untested [0] so this is just deleting dead code.

An alternative fix might be to change the GlobalStatus analysis for the
global to report "Stored" instead of "StoredOnce". However, "StoredOnce"
is appropriate when only one value other than the initializer is stored
to the global.

[0]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/Transforms/IPO/GlobalOpt.cpp.html#L662

Differential Revision: https://reviews.llvm.org/D76645
2020-03-23 22:36:09 -07:00
Douglas Yung e79b1ab65b Make test more flexible for when the variable is renamed in the generated LLVM IR. [NFC] 2020-03-23 22:03:21 -07:00
Matt Arsenault 66073953a5 AMDGPU: Allow vectorization of round intrinsic
There seems to be a small benefit to the legalized sequence for v2f16
round with packed instructions, so allow vectorizing it by reducing
the cost.

An unintended side effect is vectorization of f32 round also
happens. The current FMA logic seems off to me, and isn't checking for
packed instructions.
2020-03-23 17:00:41 -04:00
Matt Arsenault b20a1d840f GVNSink: Allow handling addrspacecast 2020-03-23 16:50:58 -04:00
Matt Arsenault 43d98a0ecf Allow replacing intrinsic operands with variables
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables if they
aren't. This means intrinsics must now be accurately marked with
immarg.
2020-03-23 15:51:57 -04:00
Sanjay Patel a1fe6beb1e [InstCombine] remove one-use check for ctpop -> cttz
Two one-use checks were added with rGfdcb27105537,
but only the first one is necessary to limit an
increase in instruction count. The second transform
only creates one instruction, so it is always a
reasonable canonicalization/optimization.
2020-03-23 13:59:57 -04:00
Johannes Doerfert 9d38f98dc3 [OpenMPOpt] Validate declaration types against the expected types
Validation of the found runtime library functions declarations types
(return and argument types) with the expected types.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D76058
2020-03-23 11:43:36 -05:00
Johannes Doerfert 68fed27067 [Attributor] Handle calls in AAValueConstantRange properly
We did handle calls that were operands of certain instructions but not
standalone calls we visit via indirection, e.g., selects.
2020-03-23 10:45:24 -05:00
Johannes Doerfert 54ec9b54f6 [Attributor] Unify handling of must-tail calls
We special cased must-tail calls all over the place because they cannot
be modified as other calls can be. However, we already centralized the
modification API so we can centralize the handling as well. This
simplifies the code and allows to remove must-tail calls completely.
2020-03-23 10:45:24 -05:00
Simon Pilgrim fdcb271055 [InstCombine] Limit CTPOP -> CTTZ simplifications to one use
Tweak D76568 so we only combine if it will remove the bit-twiddling.

Suggested by @spatel
2020-03-23 14:33:41 +00:00
Florian Hahn 33942d18b1 [SCCP] Precommit additional range propagation test. 2020-03-23 14:15:19 +00:00
Sanjay Patel 5eeea337be [VectorCombine] add more tests for extract-extract patterns; NFC 2020-03-23 09:33:56 -04:00
Simon Pilgrim 16d2065cfc [InstCombine] Add ub-safe negation patterns (PR27817) 2020-03-23 12:47:32 +00:00
Florian Hahn b8a2cf6b5b [SCCP] Extend test coverage in conditions-ranges.ll to false branches. 2020-03-23 12:32:14 +00:00
Simon Pilgrim 72d1419bfb [InstCombine] Add CTPOP -> CTTZ simplifications (PR43513)
As detailed on PR43513, we can simplify:

ctpop(x | -x) -> bitwidth - cttz(x, false)
Alive2: http://volta.cs.utah.edu:8080/z/caw49X

ctpop(~x & (x - 1)) -> cttz(x, false)
Alive2: http://volta.cs.utah.edu:8080/z/5zfVrx

I've tweaked the initial test cases I added at rG2d712fb75584 to increase commutativity testing.

Differential Revision: https://reviews.llvm.org/D76568
2020-03-23 11:04:33 +00:00
Juneyoung Lee 5792c2236d Add test cases that are addressed by D76010 2020-03-23 13:49:29 +09:00
Florian Hahn 006244152d [SCCP] Add a few more tests for conditional propagation,XOR. 2020-03-22 21:43:33 +00:00
Craig Topper f4c67dfa92 [X86] More accurately model the cost of horizontal reductions.
This patch attempts to more accurately model the reduction of
power of 2 vectors of types we natively support. This takes into
account the narrowing of vectors that occur as we go from 512
bits to 256 bits, to 128 bits. It also takes into account the use
of wider elements in the shuffles for the first 2 steps of a
reduction from 128 bits. And uses a v8i16 shift for the final step
of vXi8 reduction.

The default implementation uses the legalized type for the arithmetic
for all levels. And uses the single source permute cost of the
legalized type for all levels. This penalizes things like
lack of v16i8 pshufb on pre-sse3 targets and the splitting and
joining that needs to be done for integer types on AVX1. We never
need v16i8 shuffle for a reduction and we only need split AVX1 ops
when type the type wide and needs to be split. I think we're still
over costing splits and joins for AVX1, but we're closer now.

I've also removed all pairwise special casing because I don't
think we ever want to generate that on X86. I've also adjusted
the add handling to more accurately account for any type splitting
that occurs before we reach a legal type.

Differential Revision: https://reviews.llvm.org/D76478
2020-03-22 14:20:15 -07:00
Nikita Popov dc81923659 [InstCombine] Remove ExpensiveCombines option
D75801 removed the last and only user of this option, so we can
drop it now. The original idea behind this was to only run expensive
transforms under -O3, but apart from the one known bits transform,
this has never really taken off. I believe nowadays the recommendation
is to put expensive transforms in AggressiveInstCombine instead,
though that isn't terribly popular either :)

Differential Revision: https://reviews.llvm.org/D76540
2020-03-22 16:56:28 +01:00
Simon Pilgrim 2d712fb755 [InstCombine] Add ctpop -> cttz combine tests (PR43513) 2020-03-21 19:30:22 +00:00
Huihui Zhang 4f5af9d70d [ValueTracking] Fix usage of DataLayout::getTypeStoreSize()
Summary:
DataLayout::getTypeStoreSize() returns TypeSize.

For cases where it can not be scalable vector (e.g., GlobalVariable),
explicitly call TypeSize::getFixedSize().

For cases where scalable property doesn't matter, (e.g., check for
zero-sized type), use TypeSize::isNonZero().

Reviewers: sdesmalen, efriedma, apazos, reames

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76454
2020-03-20 16:52:15 -07:00
Huihui Zhang 1993f95f2b [ValueTracking][SVE] Fix getOffsetFromIndex for scalable vector.
Summary:
Return None if GEP index type is scalable vector. Size of scalable vectors
are multiplied by a runtime constant.

Avoid transforming:
  %a = bitcast i8* %p to <vscale x 16 x i8>*
  %tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0
  store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp0
  %tmp1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 1
  store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp1

into:
  %a = bitcast i8* %p to <vscale x 16 x i8>*
  %tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0
  %1 = bitcast <vscale x 16 x i8>* %tmp0 to i8*
  call void @llvm.memset.p0i8.i64(i8* align 16 %1, i8 0, i64 32, i1 false)

Reviewers: sdesmalen, efriedma, apazos, reames

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, arphaman, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76464
2020-03-20 14:48:29 -07:00
Nikita Popov 2b52e4e629 [InstCombine] Remove known bits constant folding
If ExpensiveCombines is enabled (which is the case with -O3 on the
legacy PM and always on the new PM), InstCombine tries to compute
the known bits of all instructions in the hope that all bits end up
being known, which is fairly expensive.

How effective is it? If we add some statistics on how often the
constant folding succeeds and how many KnownBits calculations are
performed and run test-suite we get:

    "instcombine.NumConstPropKnownBits": 642,
    "instcombine.NumConstPropKnownBitsComputed": 18744965,

In other words, we get one fold for every 30000 KnownBits calculations.
However, the truth is actually much worse: Currently, known bits are
computed before performing other folds, so there is a high chance
that cases that get folded by known bits would also have been
handled by other folds.

What happens if we compute known bits after all other folds
(hacky implementation: https://gist.github.com/nikic/751f25b3b9d9e0860db5dde934f70f46)?

    "instcombine.NumConstPropKnownBits": 0,
    "instcombine.NumConstPropKnownBitsComputed": 18105547,

So it turns out despite doing 18 million known bits calculations,
the known bits fold does not do anything useful on test-suite.
I was originally planning to move this into AggressiveInstCombine
so it only runs once in the pipeline, but seeing this, I think
we're better off removing it entirely.

As this is the only use of the "expensive combines" mechanism,
it may be removed afterwards, but I'll leave that to a separate patch.

Differential Revision: https://reviews.llvm.org/D75801
2020-03-20 20:54:06 +01:00
Nikita Popov 3205d1a860 [InstCombine] Handle known shl nsw sign bit in SimplifyDemanded
Ideally SimplifyDemanded should compute the same known bits as
computeKnownBits(). This patch addresses one discrepancy, where
ValueTracking is more powerful: If we have a shl nsw shift, we
know that the sign bit of the input and output must be the same.
If this results in a conflict, the result is poison.

This is implemented in
2c4ca6832f/lib/Analysis/ValueTracking.cpp (L1175-L1179)
and
2c4ca6832f/lib/Analysis/ValueTracking.cpp (L904-L908).

This implements the same basic logic in SimplifyDemanded. It's
slightly stronger, because I return undef instead of zero for the
poison case (which is not an option inside ValueTracking).

As mentioned in https://reviews.llvm.org/D75801#inline-698484,
we could detect poison in more cases, this just establishes parity
with the existing logic.

Differential Revision: https://reviews.llvm.org/D76489
2020-03-20 18:16:05 +01:00
Simon Pilgrim 34659de5fd [InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by scalar amounts to generic shifts (PR40391)
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.

This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
2020-03-20 15:48:06 +00:00
Florian Hahn ece6cf0fa5 [DSE,MSSA] Precommit additional tests for D73763. 2020-03-20 13:39:46 +00:00
Simon Pilgrim 7f764fa18f [ValueTracking] Add some initial isKnownNonZero DemandedElts support (PR36319) 2020-03-20 13:29:00 +00:00
Nikita Popov ce6c95aaca [InstCombine] Move test to instcombine; NFC
This test uses -instcombine, so move it into the appropriate
directory. Also fork it for expensive checks enabled/disabled.
2020-03-20 12:41:19 +01:00
Simon Pilgrim c1efdbcbe0 [ValueTracking] Add computeKnownBits DemandedElts support to shift instructions (PR36319) 2020-03-20 11:08:08 +00:00
Nikita Popov a09ff56b5b [Tests] Regenerate some test checks; NFC 2020-03-20 12:06:53 +01:00
Nikita Popov 0372768776 [InstCombine] Simplify calls with "returned" attribute
If a call argument has the "returned" attribute, we can simplify
the call to the value of that argument. This was already partially
handled by InstSimplify/InstCombine for the case where the argument
is an integer constant, and the result is thus known via known bits.
The non-constant (or non-int) argument cases weren't handled though.

This previously landed as an InstSimplify transform, but was reverted
due to assertion failures when compiling the Linux kernel. The reason
is that simplifying a call to another call breaks assumptions in
call graph updating during inlining. As the code is not easy to fix,
and there is no particularly strong motivation for having this in
InstSimplify, the transform is only performed in InstCombine instead.

Differential Revision: https://reviews.llvm.org/D75815
2020-03-20 10:23:39 +01:00
Nikita Popov 5c10967157 [InstCombine] Don't replace musttail result based on known bits
This is the same change as D75824, but for two cases where
InstCombine performs the same optimization: Replacing an instruction
whose bits are fully known with a constant. This is not (generally)
legal for musttail calls.

Differential Revision: https://reviews.llvm.org/D76457
2020-03-20 10:17:09 +01:00
Florian Hahn 3a8372ed02 [DSE] Support traversing MemoryPhis.
For MemoryPhis, we have to avoid that the MemoryPhi may be executed
before before the access we are currently looking at.

To do this we do a post-order numbering of the basic blocks in the
function and bail out once we reach a MemoryPhi with a larger (or equal)
post-order block number than the current MemoryAccess.
This changes the order in which we visit stores for elimination.

This patch also adds support for exploring multiple paths. We keep a worklist (ToCheck) of memory accesses that might be eliminated by our starting MemoryDef or MemoryPhis for further exploration.  For MemoryPhis, we add the incoming values to the worklist, for MemoryDefs we add the defining access.

Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D72148
2020-03-20 07:51:42 +00:00
Jun Ma 032251e34d [Coroutines] Fix PR45130
For now, when final suspend can be simplified by simplifySuspendPoint,
handleFinalSuspend is executed as well to remove last case in switch
instruction. This patch fixes it.

Differential Revision: https://reviews.llvm.org/D76345
2020-03-20 11:27:08 +08:00
Simon Pilgrim 95b6f62efb [InstSimplify] Add some vector shift tests to show lack of DemandedElts support 2020-03-19 22:09:51 +00:00
Kazu Hirata e23d786526 [JumpThreading] Fix infinite loop (PR44611)
Summary:
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44611 by
preventing an infinite loop in the jump threading pass when
-jump-threading-across-loop-headers is on.  Specifically, without this
patch, jump threading through two basic blocks would trigger on the
same area of the CFG over and over, resulting in an infinite loop.

Consider testcase PR44611-across-header-hang.ll in this patch.  The
first opportunity to thread through two basic blocks is:

  from bb_body2 through bb_header and bb_body1 to bb_body2.

The pass duplicates bb_header and bb_body1 as, say, bb_header.thread1
and bb_body1.thread1.  Since bb_header contains a successor edge back
to itself, bb_header.thread1 also contains a successor edge to
bb_header, immediately giving rise to the next jump threading
opportunity:

  from bb_header.thread1 through bb_header and bb_body1 to bb_body2.

After that, we repeatedly thread an incoming edge into bb_header
through bb_header and bb_body1 to bb_body2.  In other words, we keep
peeling one iteration from bb_header's self loop.

The patch fixes the problem by preventing the pass from duplicating a
basic block containing a self loop.

Reviewers: wmi, junparser, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76390
2020-03-19 12:49:36 -07:00
Simon Pilgrim c2586cab89 [InstCombine][X86] Tests for variable but in-range vector-by-scalar shift amounts (PR40391)
These shifts are masked to be inrange so we should be able to replace them with generic shifts.
2020-03-19 19:24:55 +00:00
Simon Pilgrim 433897da4a [InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by immediate amounts to generic shifts (PR40391)
The slli/srli/srai 'immediate' vector shifts (although its not immediate anymore to match gcc) can be replaced with generic shifts if the shift amount is known to be in range.
2020-03-19 15:44:24 +00:00
Simon Pilgrim fb11455038 [InstCombine][X86] Tests for variable but in-range vector-by-scalar shift amounts (PR40391)
These shifts are masked to be inrange so we should be able to replace them with generic shifts.
2020-03-19 13:11:06 +00:00
Florian Hahn 4a58996dd2 [SCCP] Use constant ranges for PHI nodes.
For PHIs with multiple incoming values, we can improve precision by
using constant ranges for integers. We can over-approximate phis
by merging the incoming values.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71933
2020-03-19 12:45:33 +00:00
Simon Pilgrim 0b458d4dca [ValueTracking] Add computeKnownBits DemandedElts support to ADD/SUB/MUL instructions (PR36319) 2020-03-19 12:41:29 +00:00
Simon Pilgrim 7ce7f78963 [InstSimplify] Add missing vector ADD+SUB tests to show lack of DemandedElts support 2020-03-19 11:27:27 +00:00
Simon Pilgrim d259e31a17 [InstSimplify] Add missing vector MUL tests to show lack of DemandedElts support 2020-03-19 11:27:27 +00:00
Florian Hahn 8a36594a7e [SCCP] Use constant ranges for binary operators.
If one of the operands of a binary operator is a constant range, we can
use ConstantRange::binaryOp to approximate the result.

We still handle single element constant ranges as we did previously,
with ConstantExpr::get(), because ConstantRange::binaryOp still gives
worse results in a few cases for single element ranges.

Also note that we bail out early if any of the operands is still unknown.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71936
2020-03-19 09:35:48 +00:00
Chen Zheng d8fcdcdf68 [Reassociate] add testcases for more than 1 pairs - NFC 2020-03-19 05:21:24 -04:00
Chen Zheng 3f85134d71 [PowerPC] implement target hook isProfitableToHoist
On Powerpc fma is faster than fadd + fmul for some types,
(PPCTargetLowering::isFMAFasterThanFMulAndFAdd). we should implement target
hook isProfitableToHoist to prevent simplifyCFGpass from breaking fma
pattern by hoisting fmul to predecessor block.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D76207
2020-03-19 00:17:25 -04:00
Huihui Zhang 2ea5495759 [InstCombine][SVE] Fix InstCombiner::visitAllocaInst for scalable vector.
Summary:
DataLayout::getTypeAllocSize() return TypeSize. For cases where scalable
property doesn't matter (check for zero-sized alloca), we should explicitly
call getKnownMinSize() to avoid implicit type conversion to uint64_t, which is
invalid for scalable vector type.

Reviewers: sdesmalen, efriedma, spatel, apazos

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76386
2020-03-18 20:57:14 -07:00
Simon Pilgrim 99336bf95a [ValueTracking] Add computeKnownBits DemandedElts support to masked add instructions (PR36319) 2020-03-18 21:50:56 +00:00
Simon Pilgrim 49bdfd888d [InstSimplify] Add missing vector masked add tests to show lack of DemandedElts support 2020-03-18 21:04:54 +00:00
Sanjay Patel 22c66c1a28 [JumpThreading] add a miscompile test based on discussion in D76332; NFC 2020-03-18 16:46:18 -04:00
Simon Pilgrim 9d40292a64 [ValueTracking] Add computeKnownBits DemandedElts support to XOR instructions (PR36319) 2020-03-18 20:24:14 +00:00
Simon Pilgrim 47ce1406c8 [InstSimplify] Add missing vector OR test to show lack of DemandedElts support 2020-03-18 20:24:14 +00:00
Simon Pilgrim 6bdb0efa42 [InstSimplify] Regenerate OR tests 2020-03-18 20:24:13 +00:00
Simon Pilgrim 1010c44b4c [ValueTracking] Add computeKnownBits DemandedElts support to EXTRACTELEMENT/OR/BSWAP/BITREVERSE instructions (PR36319)
These are all covered by the bswap/bitreverse vector tests.
2020-03-18 18:49:58 +00:00
Simon Pilgrim 9c6458ecf8 [InstSimplify] Add bitreverse/bswap vector tests
Shows missing DemandedElts support (PR36319)
2020-03-18 18:17:10 +00:00
Sam Parker fc2a5ef9c8 [NFC][PowerPC] Update test
Run the update script on one of the loop unroll tests.
2020-03-18 16:21:37 +00:00
Simon Pilgrim 06150e8356 [ValueTracking] Add computeKnownBits DemandedElts support to AND instructions (PR36319) 2020-03-18 15:38:15 +00:00
Sander de Smalen ef64ba8311 [InstCombine] GEPOperator::accumulateConstantOffset does not support scalable vectors
Avoid transforming:

 %0 = bitcast i8* %base to <vscale x 16 x i8>*
 %1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %0, i64 1

into:

 %0 = getelementptr i8, i8* %base, i64 16
 %1 = bitcast i8* %0 to <vscale x 16 x i8>*

Reviewers: efriedma, ctetreau

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76236
2020-03-18 14:58:46 +00:00
Simon Pilgrim 24c2e61362 [InstCombine][X86] Add additional demandedelts style test for in-range variable per-element shift amounts (PR40391)
If we've shuffled the shift amount some of the (undemanded) elements may have become undef - this should be handled by the missing support in PR36319.
2020-03-18 14:36:34 +00:00
Florian Hahn 0db7244295 [SCCP] Precommit some additional tests for integer ranges. 2020-03-18 11:34:04 +00:00
Simon Pilgrim f4e495a18e [InstCombine][X86] simplifyX86varShift - convert variable in-range per-element shift amounts to generic shifts (PR40391)
AVX2/AVX512 per-element shifts can be replaced with generic shifts if the shift amounts are guaranteed to be in-range (upper bits are known zero).
2020-03-18 11:26:54 +00:00
Simon Pilgrim cda2b0769f [InstCombine][X86] Tests for variable but in-range per-element shift amounts (PR40391)
These shifts are masked to be inrange so we should be able to replace them with generic shifts.
2020-03-18 10:29:47 +00:00
Florian Hahn 5672ae8d86 [SCCP] Use constant ranges for select, if cond is overdefined.
For selects with an unknown condition, we can approximate the result by
merging the state of both options. This automatically takes care of
the case where on operand is undef.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71935
2020-03-18 09:26:02 +00:00
Sanjay Patel be9e3d9416 [InstCombine] reduce demand-limited bool math to logic, part 2
Follow-on suggested in:
D75961
2020-03-17 15:18:18 -04:00
Sanjay Patel 586565c514 [InstCombine] add tests for bool math; NFC 2020-03-17 15:18:18 -04:00
Huihui Zhang 1bf0c99375 [ValueTracking][SVE] Fix isGEPKnownNonNull for scalable vector.
Summary:
DataLayout::getTypeAllocSize() return TypeSize. For cases where the
scalable property doesn't matter, we should explicitly call getKnownMinSize()
to avoid implicit type conversion to uint64_t, which is not valid for scalable
vector type.

Reviewers: sdesmalen, efriedma, apazos, reames

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76260
2020-03-17 11:31:30 -07:00
Tyker e8ac825f5b [AssumeBundles] Detection of Empty bundles
Summary: Prevent InstCombine from removing llvm.assume for which the arguement is true when they have operand bundles with usefull information.

Reviewers: jdoerfert, nikic, lebedev.ri

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76147
2020-03-17 15:50:15 +01:00
Florian Hahn 1d6f919df2 [SCCP] Explicitly mark values as overdefined (NFC).
This was part of D60582 but can be committed separately.
2020-03-17 12:13:30 +00:00
Serguei Katkov 80c351cdb6 [InstCombine] Transform to undef incorrect atomic unordered mem intrinsics
According to LangRef:
If len is not a positive integer multiple of element_size, then the behaviour of the intrinsic is undefined.

Add InstCombine rule to transform intrinsic to undef operation.

This is a follow-up for D76116.

Reviewers: reames
Reviewed By: reames
Subscribers: hiraditya, jfb, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D76215
2020-03-17 10:20:16 +07:00
Chen Zheng fa72b29bec [PowerPC] add test cases for target hook isProfitableToHoist - NFC 2020-03-16 23:07:30 -04:00
Nico Weber 623cb95eb3 Revert "[InstSimplify] Simplify calls with "returned" attribute"
This reverts commit 45555c3819.
Causes clang crashes in some causes, see comments on
https://reviews.llvm.org/D75815 for details (including
repro steps).
2020-03-16 15:21:30 -04:00
Huihui Zhang 0616e9964b [InstSimplify][SVE] Fix SimplifyGEPInst for scalable vector.
Summary:
Skip folds that rely on DataLayout::getTypeAllocSize(). For scalable
vector, only minimal type alloc size is known at compile-time.

Reviewers: sdesmalen, efriedma, spatel, apazos

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75892
2020-03-16 11:46:12 -07:00
Juneyoung Lee 7aecf2323c [ExpandMemCmp] Correctly set alignment of generated loads
Summary:
This is a part of the series of efforts for correcting alignment of memory operations.
(Another related bugs: https://bugs.llvm.org/show_bug.cgi?id=44388 , https://bugs.llvm.org/show_bug.cgi?id=44543 )

This fixes https://bugs.llvm.org/show_bug.cgi?id=43880 by giving default alignment of loads to 1.

The test CodeGen/AArch64/bcmp-inline-small.ll should have been changed; it was introduced by https://reviews.llvm.org/D64805 . I talked with @evandro, and confirmed that the test is okay to be changed.
Other two tests from PowerPC needed changes as well, but fixes were straightforward.

Reviewers: courbet

Reviewed By: courbet

Subscribers: nlopes, gchatelet, wuzish, nemanjai, kristof.beyls, hiraditya, steven.zhang, danielkiss, llvm-commits, evandro

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76113
2020-03-16 22:39:48 +09:00
Juneyoung Lee acdcd23b7b Add tests to ExpandMemCmp/X86/memcmp.ll before submitting D76113 2020-03-16 22:19:37 +09:00
Juneyoung Lee 6ad63606ea [CodeGenPrepare] Freeze condition when transforming select to br
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not frozen, instsimplify or the later pipeline can potentially exploit undefined behavior.

The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).

Reviewers: jdoerfert, spatel, lebedev.ri, efriedma

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76179
2020-03-16 12:46:20 +09:00
Juneyoung Lee 4ffe3ac729 Revert "[CodeGenPrepare] Freeze condition when transforming select to br"
This reverts commit 10aa7ea951.
2020-03-16 12:45:54 +09:00
Florian Hahn 650f363bd7 [ValueLattice] Add singlecrfromundef lattice value.
This patch adds a new singlecrfromundef lattice value, indicating a
single element constant range which was merge with undef at some point.
Merging it with another constant range results in overdefined, as we
won't be able to replace all users with a single value.

This patch uses a ConstantRange instead of a Constant*, because regular
integer constants are represented as single element constant ranges as
well and this allows the existing code working without additional
changes.

Reviewers: efriedma, nikic, reames, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75845
2020-03-15 11:23:46 +00:00
Juneyoung Lee 10aa7ea951 [CodeGenPrepare] Freeze condition when transforming select to br
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not freezed, instsimplify or the later pipeline can potentially exploit undefined behavior.

The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).

Reviewers: jdoerfert, spatel, lebedev.ri, efriedma

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76179
2020-03-15 11:10:46 +09:00
Florian Hahn 4878aa36d4 [ValueLattice] Add new state for undef constants.
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.

The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces  the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.

Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.

Reviewers: efriedma, reames, davide, nikic

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75120
2020-03-14 17:19:59 +00:00
Akira Hatanaka c6f1713c46 [ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't
a tail call

This reapplies the patch in https://reviews.llvm.org/rG1f5b471b8bf4,
which was reverted because it was causing crashes.

https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2

Check that HasSafePathToCall is true before checking the call is a tail
call.

Original commit message:

Previosly ARC optimizer removed the autoreleaseRV/retainRV pair in the
following code, which caused the object returned by @something to be
placed in the autorelease pool because the call to @something isn't a
tail call:

```
  %call = call i8* @something(...)
  %2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call)
  %3 = call i8* @objc_autoreleaseReturnValue(i8* %2)
  ret i8* %3
```

Fix the bug by checking whether @something is a tail call.

rdar://problem/59275894
2020-03-13 13:52:14 -07:00
Alexey Zhikhartsev f71abec661 [LoopInterchange] Fix interchanging contents of preheader BBs
Summary:
Previously LCSSA was getting broken by placing instructions into the
(newly) inner *header* instead of the *pre*header.

Fixes PR43474

Reviewers: fhahn

Reviewed By: fhahn

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75943
2020-03-13 15:59:37 -04:00
Reid Kleckner 478b06e687 Revert "[ObjC][ARC] Check the basic block size before calling DominatorTree::dominate"
This reverts commit 5c3117b0a9

This should not be necessary after
7593a480db, and Florian Hahn has confirmed
that the problem no longer reproduces with this patch.

I happened to notice this code because the FIXME talks about
OrderedBasicBlock.

Reviewed By: fhahn, dexonsmith

Differential Revision: https://reviews.llvm.org/D76075
2020-03-13 11:57:55 -07:00
Sanjay Patel 89b19e8959 [SimplifyCFG] add test for chain of empty block conditional branches; NFC 2020-03-13 14:39:31 -04:00
Huihui Zhang fc1f205745 [SLPVectorizer][SVE] Bail out early for scalable vector.
Summary:
SLPVectorizer try to vectorize list of scalar instructions of the same type,
instructions already vectorized are rejected through isValidElementType().

Without this patch, tryToVectorizeList() will first try to determine vectorization
factor of a list of Instructions before checking whether each instruction has unsupported
type or not. For instructions already vectorized for SVE, it will crash at getVectorElementSize(),
where it try to return a fixed size.

This patch make sure invalid element types are rejected before trying to get vectorization
factor. This make sure we are not trying to vectorize instructions already vectorized.

Reviewers: sdesmalen, efriedma, spatel, RKSimon, ABataev, apazos, rengolin

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76017
2020-03-13 11:23:31 -07:00
Sanjay Patel afc4dcee83 [SimplifyCFG] regenerate complete test checks; NFC 2020-03-13 14:12:28 -04:00
Sanjay Patel 7fe0e70ecc [SimplifyCFG] regenerate test checks; NFC 2020-03-13 14:12:28 -04:00
Florian Hahn e30c257811 [CVP,SCCP] Precommit test for D75055.
Test case for PR44949.
2020-03-13 17:53:39 +00:00
Florian Hahn 0c5b6e2ea5 Recommit "[SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)"
This patch should fix the cause of the stage2 failures and
PR45185.

This reverts the revert commit c52f839e72.
2020-03-13 17:03:22 +00:00
Simon Pilgrim a2db388dce [CostModel][X86] Improve ISD::CTTZ costs accounting for BSF/TZCNT implementations 2020-03-13 16:51:13 +00:00
Tyker 2543567c41 [AssumeBundles] filter usefull attriutes to preserve
Summary:
This patch will filter attributes to only preserve those that are usefull.
In the case of NoAlias it is filtered out not because it isn't usefull
but because it is incorrect to preserve it as it is only valdi for the
duration of the function.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75828
2020-03-13 17:35:47 +01:00
Tyker 69375fd0a3 [AssumeBundles] Preserve Information in the inliner
Summary:
during inling Create and insert an llvm.assume with attributes to preserve them.
to prevent any changes for now generation of llvm.assume is under a flag disabled by default.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75825
2020-03-13 17:35:47 +01:00
omarahmed1111 b285b333dc [Attributor] Detect possibly unbounded cycles in functions
This patch add mayContainUnboundedCycle helper function which checks whether a function has any cycle which we don't know if it is bounded or not.
Loops with maximum trip count are considered bounded, any other cycle not.
It also contains some fixed tests and some added tests contain bounded and
unbounded loops and non-loop cycles.

Reviewed By: jdoerfert, uenoku, baziotis

Differential Revision: https://reviews.llvm.org/D74691
2020-03-13 11:17:33 -05:00
Pankaj Gode bf990530ae [Attributor] Improve noalias preservation using reachability
Resolution for below fixme:
(ii) Check whether the value is captured in the scope using AANoCapture.
FIXME: This is conservative though, it is better to look at CFG and
             check only uses possibly executed before this callsite.

Propagates caller argument's noalias attribute to callee.

Reviewed by: jdoerfert, uenoku

Reviewers: jdoerfert, sstefan1, uenoku

Subscribers: uenoku, sstefan1, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D71617
2020-03-13 21:09:08 +05:30
Nico Weber 86eb2c3991 Revert "[ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't"
This reverts commit 1f5b471b8b.
Causes asserts when building code with arc. See
https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2
for a full repro. Will post a creduced repro once creduce is done
running.
2020-03-13 10:16:02 -04:00
Juneyoung Lee c39cb1c0dd [CodeGenPrepare] Expand freeze conversion to support fcmp and icmp with null
Summary:
This is a simple patch that expands https://reviews.llvm.org/D75859 to pointer comparison and fcmp

Checked with Alive2

Reviewers: reames, jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76048
2020-03-13 17:21:33 +09:00
Juneyoung Lee 48b901b0e1 Add tests to Transforms/CodeGenPrepare/X86/freeze-cmp.ll before commiting D76048 2020-03-13 17:18:42 +09:00
Johannes Doerfert a198adb490 [Attributor] IPO across definition boundary of a function marked alwaysinline
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D75590
2020-03-13 01:06:12 -05:00
Johannes Doerfert 40815a4957 Revert "[Attributor] Enable test with update check lines"
This reverts commit 13def55b3f.

This broke a buildbot, will investigate.
2020-03-13 00:59:47 -05:00
Johannes Doerfert 1c9c23d60e [OpenMP][Opt][NFC] Add test case for known runtime function attributes
This test somehow did not make it in before.
2020-03-13 00:28:14 -05:00
Johannes Doerfert 13def55b3f [Attributor] Enable test with update check lines
The test disabled in 528a6a1d4c is enabled
again with the check lines for 9708279c72.
2020-03-12 23:24:15 -05:00
Huihui Zhang 118abf2017 [SVE] Update API ConstantVector::getSplat() to use ElementCount.
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.

This change is needed for D73753.

Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74386
2020-03-12 13:22:41 -07:00
Florian Hahn c52f839e72 Revert "[SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)"
This commit is likely causing clang-with-lto-ubuntu to fail
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/16052

Also causes PR45185.

This reverts commit f1ac5d2263.
2020-03-12 18:49:11 +00:00
Sanjay Patel a66dc755db [InstSimplify] simplify FP ops harder with FMF (part 2)
This is part of the IR sibling for:
D75576

Related transform committed with:
rG8ec71585719d
2020-03-12 09:53:20 -04:00
Sanjay Patel 8ec7158571 [InstSimplify] simplify FP ops harder with FMF
This is part of the IR sibling for:
D75576

(I'm splitting part of the transform as a separate commit
to reduce risk. I don't know of any bugs that might be
exposed by this improved folding, but it's hard to see
those in advance...)
2020-03-12 09:13:28 -04:00
Sanjay Patel d748e759d5 [InstSimplify] add tests for FP poison; NFC
Adapted from codegen tests seen in D75576.
2020-03-12 08:32:04 -04:00
Florian Hahn f1ac5d2263 [SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)
This patch switches SCCP to use ValueLatticeElement for lattice values,
instead of the local LatticeVal, as first step to enable integer range support.

This patch does not make use of constant ranges for additional operations
and the only difference for now is that integer constants are represented by
single element ranges. To preserve the existing behavior, the following helpers
are used

* isConstant(LV): returns true when LV is either a constant or a constant range with a single element. This should return true in the same cases where LV.isConstant() returned true previously.
* getConstant(LV): returns a constant if LV is either a constant or a constant range with a single element. This should return a constant in the same cases as LV.getConstant() previously.
* getConstantInt(LV): same as getConstant, but additionally casted to ConstantInt.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D60582
2020-03-12 12:03:06 +00:00
Max Kazantsev 3dc6e53c97 [LoopPeel] Turn incorrect assert into a check
Summary:
This patch replaces incorrectt assert with a check. Previously it asserts that
if SCEV cannot prove `isKnownPredicate(A != B)`, then it should be able to prove
`isKnownPredicate(A == B)`.

Both these fact may be not provable. It is shown in the provided test:

Could not prove: `{-294,+,-2}<%bb1> !=  0`
Asserting: `{-294,+,-2}<%bb1> == 0`

Obviously, this SCEV is not equal to zero, but 0 is in its range so we cannot
also prove that it is not zero.

Instead of assert, we should be checking the required conditions explicitly.

Reviewers: lebedev.ri, fhahn, sanjoy, fedor.sergeev
Reviewed By: lebedev.ri
Subscribers: hiraditya, zzheng, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76050
2020-03-12 17:23:07 +07:00
Juneyoung Lee 629cf3c1c5 Apply update_test_check.py to CodeGenPrepare/X86/freeze-icmp.ll test 2020-03-12 16:37:16 +09:00
Huihui Zhang 8f52573962 [InstSimplify][SVE] Fix SimplifyInsert/ExtractElementInst for scalable vector.
Summary:
For scalable vector, index out-of-bound can not be determined at compile-time.
The same apply for VectorUtil findScalarElement().

Add test cases to check the functionality of SimplifyInsert/ExtractElementInst for scalable vector.

Reviewers: sdesmalen, efriedma, spatel, apazos

Reviewed By: efriedma

Subscribers: cameron.mcinally, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75782
2020-03-11 15:09:56 -07:00
Sanjay Patel fae900921b [InstCombine] reduce demand-limited bool math to logic
The cmp math test is inspired by memcmp() patterns seen in D75840.
I know there's at least 1 related fold we can do here if both
values are sext'd, but I'm not seeing a way to generalize further.

We have some other bool math patterns that we want to reduce, but
that might require fixing the bogus transforms noted in D72396.

Alive proof translations of the regression tests:
https://rise4fun.com/Alive/zGWi

  Name: demand add 1
  %xz = zext i1 %x to i32
  %ys = sext i1 %y to i32
  %sub = add i32 %xz, %ys
  %r = lshr i32 %sub, 31
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = zext i1 %and to i32

  Name: demand add 2
  %xz = zext i1 %x to i5
  %ys = sext i1 %y to i5
  %sub = add i5 %xz, %ys
  %r = and i5 %sub, 16
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = select i1 %and, i5 -16, i5 0

  Name: demand add 3
  %xz = zext i1 %x to i8
  %ys = sext i1 %y to i8
  %a = add i8 %ys, %xz
  %r = ashr i8 %a, 7
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = sext i1 %and to i8

  Name: cmp math
  %gt = icmp ugt i32 %x, %y
  %lt = icmp ult i32 %x, %y
  %xz = zext i1 %gt to i32
  %yz = zext i1 %lt to i32
  %s = sub i32 %xz, %yz
  %r = lshr i32 %s, 31
  =>
  %r = zext i1 %lt to i32

Differential Revision: https://reviews.llvm.org/D75961
2020-03-11 15:45:58 -04:00
Sanjay Patel fa8c4c7ffa [InstCombine] add tests for bool math; NFC 2020-03-11 15:45:58 -04:00
Juneyoung Lee 8eb2f865c3 [CodeGenPrepare] Fold br(freeze(icmp x, const)) to br(icmp(freeze x, const))
Summary:
This patch helps CodeGenPrepare move freeze into the icmp when it is used by branch.
It reenables generation of efficient conditional jumps.

This is only done when at least one of icmp's operands is constant to prevent the transformation from increasing # of freeze instructions.

Performance degradation of MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test is resolved with this patch.

Checked with Alive2

Reviewers: reames, fhahn, nlopes

Reviewed By: reames

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75859
2020-03-12 03:16:15 +09:00
Florian Hahn bc6c8c4bbb [Matrix] Add remark propagation along the inlined-at chain.
This patch adds support for propagating matrix expressions along the
inlined-at chain and emitting remarks at the traversed function scopes.

To motivate this new behavior, consider the example below. Without the
remark 'up-leveling', we would only get remarks in load.h and store.h,
but we cannot generate a remark describing the full expression in
toplevel.cpp, which is the place where the user has the best chance of
spotting/fixing potential problems.

With this patch, we generate a remark for the load in load.h, one for
the store in store.h and one for the complete expression in
toplevel.cpp. For a bigger example, please see remarks-inlining.ll.

    load.h:
    template <typename Ty, unsigned R, unsigned C> Matrix<Ty, R, C> load(Ty *Ptr) {
      Matrix<Ty, R, C> Result;
      Result.value = *reinterpret_cast <typename Matrix<Ty, R, C>::matrix_t *>(Ptr);
      return Result;
    }

    store.h:
    template <typename Ty, unsigned R, unsigned C> void store(Matrix<Ty, R, C> M1, Ty *Ptr) {
       *reinterpret_cast<typename decltype(M1)::matrix_t *>(Ptr) = M1.value;
    }

    toplevel.cpp
    void test(double *A, double *B, double *C) {
      store(add(load<double, 3, 5>(A), load<double, 3, 5>(B)), C);
    }

For a given function, we traverse the inlined-at chain for each
matrix instruction (= instructions with shape information). We collect
the matrix instructions in each DISubprogram we visit. This produces a
mapping of DISubprogram -> (List of matrix instructions visible in the
subpogram). We then generate remarks using the list of instructions for
each subprogram in the inlined-at chain. Note that the list of instructions
for a subprogram includes the instructions from its own subprograms
recursively. For example using the example above, for the subprogram
'test' this includes inline functions 'load' and 'store'. This allows
surfacing the remarks at a level useful to users.

Please note that the current approach may create a lot of extra remarks.
Additional heuristics to cut-off the traversal can be implemented in the
future. For example, it might make sense to stop 'up-leveling' once all
matrix instructions are at the same debug location.

Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D73600
2020-03-11 17:40:08 +00:00
Fangrui Song a0c0389ffb [SimplifyLibcalls] Don't replace locked IO (fgetc/fgets/fputc/fputs/fread/fwrite) with unlocked IO (*_unlocked)
This essentially reverts some of the SimplifyLibcalls part changes of D45736 [SimplifyLibcalls] Replace locked IO with unlocked IO.

C11 7.21.5.2 The fflush function

> If stream is a null pointer, the fflush function performs this flushing action on all streams for which the behavior is defined above.

i.e. fopen'ed FILE* is inherently captured.

POSIX.1-2017 getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked - stdio with explicit client locking

> These functions can safely be used in a multi-threaded program if and only if they are called while the invoking thread owns the ( FILE *) object, as is the case after a successful call to the flockfile() or ftrylockfile() functions.

After a thread fopen'ed a FILE*, when it is calling foobar() which is now replaced by foobar_unlocked(),
if another thread is concurrently calling fflush(0), the behavior is undefined.

C11 7.22.4.4 The exit function

> Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed.

The replacement is only feasible if the program is single threaded, or exit or fflush(0) is never called.
See also http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180528/556615.html
for how the replacement makes libc interceptors difficult to implement.

dalias: in a worst case, it's unbounded data corruption because of concurrent access to pointers
without synchronization.  f->wpos or rpos could get outside of the buffer, thread A could do
f->wpos += j after knowing j is in bounds, while thread B also changes it concurrently.

This can produce exploitable conditions depending on libc internals.

Revert the SimplifyLibcalls part change because the cons obviously
overweigh the pros.  Even when the replacement is feasible, the benefit
is indemonstrable, more so in an application instead of an artificial
glibc benchmark.  Theoretically the replacement could be beneficial when
calling getc_unlocked/putc_unlocked in a loop, but then it is better
using a blocked IO operation and the user is likely aware of that.

The function attribute inference is still useful and thus kept.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D75933
2020-03-10 11:11:58 -07:00
Tyker a4cde9ad7b Fixed [AssumeBundles] Move to IR so it can be used by Analysis
This is a recommit of 57c964aaa7
after fixing modules build.
2020-03-10 18:02:39 +01:00
Simon Moll d871ef4e6a [instcombine] remove fsub to fneg hacks; only emit fneg
Summary: Rewrite the fsub-0.0 idiom to fneg and always emit fneg for fp
negation. This also extends the scalarization cost in instcombine for unary
operators to result in the same IR rewrites for fneg as for the idiom.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D75467
2020-03-10 16:57:02 +01:00
Florian Hahn c8c14d979a [InstCombine] Support vectors in SimplifyAddWithRemainder.
SimplifyAddWithRemainder currently also matches for vector types, but
tries to create an integer constant, which causes a crash.

By using Constant::getIntegerValue() we can support both the scalar and
vector cases.

The 2 added test cases crash without the fix.

Reviewers: spatel, lebedev.ri

Reviewed By: spatel, lebedev.ri

Differential Revision: https://reviews.llvm.org/D75906
2020-03-10 14:29:40 +00:00
Jonas Paulsson c2dafe12dc [SimplifyCFG] Skip merging return blocks if it would break a CallBr.
SimplifyCFG should not merge empty return blocks and leave a CallBr behind
with a duplicated destination since the verifier will then trigger an
assert. This patch checks for this case and avoids the transformation.

CodeGenPrepare has a similar check which also has a FIXME comment about why
this is needed. It seems perhaps better if these two passes would eventually
instead update the CallBr instruction instead of just checking and avoiding.

This fixes https://bugs.llvm.org/show_bug.cgi?id=45062.

Review: Craig Topper

Differential Revision: https://reviews.llvm.org/D75620
2020-03-10 14:59:13 +01:00
Sanjay Patel 6e60e1025f [InstCombine] regenerate test checks; NFC
tmp -> t because 'tmp' tends to cause problems for the auto-generation script.
2020-03-10 09:57:41 -04:00
Sanjay Patel 467eec0910 [InstCombine] fold gep-of-select-of-constants (PR45084)
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=45084
...we failed to combine a gep with constant indexes with a
pointer operand that is a select of constants.

Differential Revision: https://reviews.llvm.org/D75807
2020-03-10 09:25:13 -04:00
Sanjay Patel 5b465ad290 [InstCombine] add/adjust tests for select-gep; NFC
Goes with D75807
2020-03-10 09:25:13 -04:00
Florian Hahn 2d6ecf4648 [SLP] Support vectorizing functions provided by vector libs.
It seems like the SLPVectorizer is currently not aware of vector
versions of functions provided by libraries like Accelerate [1].
This patch updates SLPVectorizer to use the same infrastructure
the LoopVectorizer uses to detect vectorizable library functions.

For calls, it computes the cost of an intrinsic call (existing behavior)
and the cost of a vector function library call, if available. Like
LoopVectorizer, it assumes the cost of the vector function is simply the
cost of a call to a vector function.

[1] https://developer.apple.com/documentation/accelerate

Reviewers: ABataev, RKSimon, spatel

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D75878
2020-03-10 13:10:50 +00:00
Simon Pilgrim 5cbddf7cbc [X86][SSE] Add more accurate costs for fmaxnum/fminnum codegen
Based off llvm-mca reports on codegen in llvm\test\CodeGen\X86\fmaxnum.ll + llvm\test\CodeGen\X86\fminnum.ll
2020-03-10 11:59:40 +00:00
Simon Pilgrim 9b05596eff [SLPVectorizer][X86] Add fmaxnum/fminnum tests 2020-03-10 11:18:28 +00:00
Florian Hahn b53907bfed [SLP] Precommit vector library test for D75878. 2020-03-10 10:17:34 +00:00
ahatanak 1f5b471b8b [ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't
a tail call

Previosly ARC optimizer removed the autoreleaseRV/retainRV pair in the
following code, which caused the object returned by @something to be
placed in the autorelease pool because the call to @something isn't a
tail call:

```
  %call = call i8* @something(...)
  %2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call)
  %3 = call i8* @objc_autoreleaseReturnValue(i8* %2)
  ret i8* %3
```

Fix the bug by checking whether @something is a tail call.

rdar://problem/59275894
2020-03-09 13:21:38 -07:00