Commit Graph

18401 Commits

Author SHA1 Message Date
Florian Hahn bdada7546e
[VPlan] Adjust assert in splitBlock to allow splitting at end.
SplitAt should only be dereferenced in the assert if it does not point
to the end of the block. This fixes a crash in the added test case.
2021-05-13 13:36:35 +01:00
Florian Hahn 860b37526a [Passes] Run GlobalsAA before LICM during LTO in new PM.
This patch adjusts the LTO pipeline in the new PM to run GlobalsAA
before LICM to match the legacy PM.

This fixes a regression where the new PM failed to vectorize loops that
require hoisting/sinking by LICM depending on GlobalsAA info.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102345
2021-05-13 13:11:18 +01:00
Jingu Kang 107d19eb01 Revert "[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch"
This reverts commit 88b259c014.

It needs to fix below bugs.

https://bugs.llvm.org/show_bug.cgi?id=50279
https://bugs.llvm.org/show_bug.cgi?id=50302
2021-05-13 08:40:49 +01:00
Chuanqi Xu c1359ef07e [Coroutines] Salvege Debug.values
Summary: The previous implementation of coro-split didn't collect values
used by dbg instructions into the spills which made a log debug info
unavailable with optimization on.
This patch tries to collect these uses which are used by dbg.values. In
this way, the debugbility of coroutine could be as powerful as normal
functions with optimization on.

To avoid enlarging the coroutine frame, this patch only collects
`dbg.value` whose value is already in the coroutine frame. This decision
may make some debug info getting unavailable. But if we are with
optimization on, the performance issue should be considered first. And
this patch would make the debugbility of coroutine to be better only
without changing the layout of the frame.

Test-plan: check-llvm

Reviewed By: aprantl, lxfind

Differential Revision: https://reviews.llvm.org/D97673
2021-05-13 13:06:33 +08:00
Chuanqi Xu 6e5b8f489a [Coroutines] Enable printing coroutine frame when dbg info is available
Summary: This patch tries to build debug info for coroutine frame in the
middle end. Although the coroutine frame is constructed and maintained by
the compiler and the programmer shouldn't care about the coroutine frame
by the design of C++20 coroutine,
a lot of programmers told me that they want to see the layout of the
coroutine frame strongly. Although C++ is designed as an abstract layer
so that the programmers shouldn't care about the actual memory in bits,
many experienced C++ programmers  are familiar with assembler and
debugger to see the memory layout in fact, After I was been told they
want to see the coroutine frame about 3 times, I think it is an actual
and desired demand.

However, the debug information is constructed in the front end and
coroutine frame is constructed in the middle end. This is a natural and
clear gap. So I could only try to construct the debug information in the
middle end after coroutine frame constructed. It is unusual, but we are
in consensus that the approch is the best one.

One hard part is we need construct the name for variables since there
isn't a map from llvm variables to DIVar. Then here is the strategy this
patch uses:
- The name `__resume_fn `, `__destroy_fn` and `__coro_index ` are
  constructed by the patch.
- Then the name `__promise` comes from the dbg.variable of corresponding
  dbg.declare of PromiseAlloca, which shows highest priority to
construct the debug information for the member of coroutine frame.
- Then if the member is struct, we would try to get the name of the llvm
  struct directly. Then replace ':' and '.' with '_' to make it
printable for debugger.
- If the member is a basic type like integer or double, we would try to
  emit the corresponding name.
- Then if the member is a Pointer Type, we would add `Ptr` after
  corresponding pointee type.
- Otherwise, we would name it with 'UnknownType'.

Reviewered by: lxfind, aprantl, rjmcall, dblaikie

Differential Revision: https://reviews.llvm.org/D99179
2021-05-13 12:43:08 +08:00
Anton Afanasyev ab2c499d3a [SLP] Add insertelement instructions to vectorizable tree
Add new type of tree node for `InsertElementInst` chain forming vector.
These instructions could be either removed, or replaced by shuffles during
vectorization and we can add this node to cost model, so naturally estimating
their cost, getting rid of `CompensateCost` tricks and reducing further work
for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this
patch is the first step towards revectorization of partially vectorization
(to fix PR42022 completely). After adding inserts to tree the next step is
to add vector instructions there (for instance, to merge `store <2 x float>`
and `store <2 x float>` to `store <4 x float>`).

Fixes PR40522 and PR35732.

Differential Revision: https://reviews.llvm.org/D98714
2021-05-13 07:41:45 +03:00
Anton Afanasyev cd9090031c [SLP][Test] Fix and precommit tests for D98714 2021-05-13 07:41:45 +03:00
Anton Afanasyev 00a0595b25 [SLP][Test] Fix and precommit tests for D98714 2021-05-13 07:41:06 +03:00
Justin Bogner e7d26aceca Change the context instruction for computeKnownBits in LoadStoreVectorizer pass
This change enables cases for which the index value for the first
load/store instruction in a pair could be a function argument. This
allows using llvm.assume to provide known bits information in such
cases.

Patch by Viacheslav Nikolaev. Thanks!

Differential Revision: https://reviews.llvm.org/D101680
2021-05-12 15:29:29 -07:00
Nikita Popov a8f7dee1df [InstCombine] Support one-hot merge for logical and/or
If a logical and/or is used, we need to be careful not to propagate
a potential poison value from the RHS by inserting a freeze
instruction. Otherwise it works the same way as bitwise and/or.

This is intended to address the regression reported at
https://reviews.llvm.org/D101191#2751002.

Differential Revision: https://reviews.llvm.org/D102279
2021-05-12 21:01:18 +02:00
Florian Hahn ed9e1a7dcc
[PhaseOrdering] Add test for missing vectorization with NewPM. 2021-05-12 19:34:14 +01:00
Stelios Ioannou 1124ad2f5d [LoopFlatten] Simplify loops so that the pass can operate on unsimplified loops.
The loop flattening pass requires loops to be in simplified form. If the
loops are not in simplified form, the pass cannot operate. This patch
simplifies all loops before flattening. As a result, all loops will be
simplified regardless of whether anything ends up being flattened.

This change was inspired by observing a certain loop that was not flatten
because the loops were not in simplified form. This loop is added as a
test to verify that it is now flattened.

Differential Revision: https://reviews.llvm.org/D102249

Change-Id: I45bcabe70fb99b0d89f0effafc82eb9e0585ec30
2021-05-12 19:22:01 +01:00
David Sherwood 61630814b1 [NFC] Use variable GEP index in vec_demanded_elts tests
I've changed a test in each of these files:

  Transforms/InstCombine/vec_demanded_elts.ll
  Transforms/InstCombine/vec_demanded_elts-inseltpoison.ll

to use a variable GEP index instead of a constant value so that
we're testing the more general case.
2021-05-12 14:56:04 +01:00
Roman Lebedev 554b1bced3
[InstCombine] ~(C + X) --> ~C - X (PR50308)
We can not rely on (C+X)-->(X+C) already happening,
because we might not have visited that `add` yet.
The added testcase would get stuck in an endless combine loop.
2021-05-12 16:10:55 +03:00
David Sherwood b7a11274f9 [LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors
In InnerLoopVectorizer::widenPHIInstruction there are cases where we have
to scalarise a pointer induction variable after vectorisation. For scalable
vectors we already deal with the case where the pointer induction variable
is uniform, but we currently crash if not uniform. For fixed width vectors
we calculate every lane of the scalarised pointer induction variable for a
given VF, however this cannot work for scalable vectors. In this case I
have added support for caching the whole vector value for each unrolled
part so that we can always extract an arbitrary element. Additionally, we
still continue to cache the known minimum number of lanes too in order
to improve code quality by avoiding an extractelement operation.

I have adapted an existing test `pointer_iv_mixed` from the file:

  Transforms/LoopVectorize/consecutive-ptr-uniforms.ll

and added it here for scalable vectors instead:

  Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Differential Revision: https://reviews.llvm.org/D101294
2021-05-12 11:02:11 +01:00
Qiu Chaofan 6d2df18163 [VectorComine] Restrict single-element-store index to inbounds constant
Vector single element update optimization is landed in 2db4979. But the
scope needs restriction. This patch restricts the index to inbounds and
vector must be fixed sized. In future, we may use value tracking to
relax constant restrictions.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D102146
2021-05-12 13:18:20 +08:00
Congzhe Cao 3f8be15f29 [LoopInterchange] Handle lcssa PHIs with multiple predecessors
This is a bugfix in the transformation phase.

If the original outer loop header branches to both the inner loop
(header) and the outer loop latch, and if there is an lcssa PHI
node outside the loop nest, then after interchange the new outer latch
will have an lcssa PHI node inserted which has two predecessors, i.e.,
the original outer header and the original outer latch. Currently
the transformation assumes it has only one predecessor (the original
outer latch) and crashes, since the inserted lcssa PHI node does
not take both predecessors as incoming BBs.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D100792
2021-05-11 21:30:54 -04:00
Jordan Rupprecht fec2945998 Revert "[GVN] Clobber partially aliased loads."
This reverts commit 6c57044231.

It causes assertion errors due to widening atomic loads, and potentially causes miscompile elsewhere too. Repro, also posted to D95543:

```
$ cat repro.ll
; ModuleID = 'repro.ll'
source_filename = "repro.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.widget = type { i32 }
%struct.baz = type { i32, %struct.snork }
%struct.snork = type { %struct.spam }
%struct.spam = type { i32, i32 }

@global = external local_unnamed_addr global %struct.widget, align 4
@global.1 = external local_unnamed_addr global i8, align 1
@global.2 = external local_unnamed_addr global i32, align 4

define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 {
bb:
  %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1
  %tmp1 = bitcast %struct.snork* %tmp to i64*
  %tmp2 = load i64, i64* %tmp1, align 4
  %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1
  %tmp4 = icmp ugt i64 %tmp2, 4294967295
  br label %bb5

bb5:                                              ; preds = %bb14, %bb
  %tmp6 = load i32, i32* %tmp3, align 4
  %tmp7 = icmp ne i32 %tmp6, 0
  %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false
  %tmp9 = zext i1 %tmp8 to i8
  store i8 %tmp9, i8* @global.1, align 1
  %tmp10 = load i32, i32* @global.2, align 4
  switch i32 %tmp10, label %bb11 [
    i32 1, label %bb12
    i32 2, label %bb12
  ]

bb11:                                             ; preds = %bb5
  br label %bb14

bb12:                                             ; preds = %bb5, %bb5
  %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4
  br label %bb14

bb14:                                             ; preds = %bb12, %bb11
  br label %bb5
}
$ opt -O2 repro.ll -disable-output
opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value *llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst *, unsigned int, llvm::Type *, llvm::Instruction *, const llvm::DataLayout &): Assertion `SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output
...
```
2021-05-11 16:08:53 -07:00
Congzhe Cao 40e3aa39bd [LoopInterchange] Fix legality for triangular loops
This is a bug fix in legality check.

When we encounter triangular loops such as the following form:
    for (int i = 0; i < m; i++)
      for (int j = 0; j < i; j++), or

    for (int i = 0; i < m; i++)
      for (int j = 0; j*i < n; j++),

we should not perform interchange since the number of executions
of the loop body will be different before and after interchange,
resulting in incorrect results.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D101305
2021-05-11 18:36:53 -04:00
Congzhe Cao d3f89d4d16 Revert "[LoopInterchange] Fix legality for triangular loops"
This reverts commit 29342291d2.

The test case requires an assert build. Will add REQUIRES and re-commit.
2021-05-11 18:10:58 -04:00
Fangrui Song 129f466e22 [GlobalOpt] Remove heap SROA
GlobalOpt implements a heap SROA (SROA for an malloc allocatated struct or array
of structs) which is largely undertested (heap-sra-[1234].ll are basically the
same test with very little difference) and does not trigger at all when
bootstrapping clang (it only supports the case of one single store).

The heap SROA implementation causes PR50027 (GEP is not properly handled; crash or miscompile).
Just drop the implementation. I have deleted some obviously duplicated tests
but kept `heap-sra-[12]{,-no-nullopt}.ll`.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102257
2021-05-11 11:34:37 -07:00
Eli Friedman 61cbbba7a6 [ArgumentPromotion] Fix byval alignment handling.
Make sure the alignment of the generated operations matches the
alignment of the byval argument.  Previously, we were just ignoring
alignment and getting lucky.

While I'm here, also delete the unnecessary "tail" handling.
Passing a pointer to a byval argument to a "tail" call is UB, so
rewriting to an alloca doesn't require any special handling.

Differential Revision: https://reviews.llvm.org/D89819
2021-05-11 11:22:18 -07:00
Congzhe Cao 29342291d2 [LoopInterchange] Fix legality for triangular loops
This is a bug fix in legality check.

When we encounter triangular loops such as the following form:
    for (int i = 0; i < m; i++)
      for (int j = 0; j < i; j++), or

    for (int i = 0; i < m; i++)
      for (int j = 0; j*i < n; j++),

we should not perform interchange since the number of executions of the loop body
will be different before and after interchange, resulting in incorrect results.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D101305
2021-05-11 11:00:46 -04:00
Florian Hahn faebc6bf10
[VPlan] Register recipe for instr if the simplified value is recipe.
If the simplified VPValue is a recipe, we need to register it for Instr,
in case it needs to be recorded. The way this is handled in general may
change soon, following some post-commit comments.

This fixes PR50298.
2021-05-11 14:32:34 +01:00
Sanjay Patel 49950cb1f6 [SLP] restrict matching of load combine candidates
The test example from https://llvm.org/PR50256 (and reduced here)
shows that we can match a load combine candidate even when there
are no "or" instructions. We can avoid that by confirming that we
do see an "or". This doesn't apply when matching an or-reduction
because that match begins from the operands of the reduction.

Differential Revision: https://reviews.llvm.org/D102074
2021-05-11 08:46:40 -04:00
Stanislav Mekhanoshin 22d295f695 [AMDGPU] Constant fold Intrinsic::amdgcn_perm
Differential Revision: https://reviews.llvm.org/D102203
2021-05-10 16:23:11 -07:00
Sanjay Patel 5577e86691 [InstCombine] fold extract subvector of bitcast insertelt
This is visible in the original example from:
https://llvm.org/PR50055
(but this change doesn't solve the bug)

https://alive2.llvm.org/ce/z/vM_Yq-
2021-05-10 17:20:10 -04:00
Sanjay Patel 8a74cc139d [InstCombine] add tests for extract-subvector of insert; NFC 2021-05-10 17:03:28 -04:00
Nikita Popov 463ea28e96 [InstCombine] Fold comparison of integers by parts
Let's say you represent (i32, i32) as an i64 from which the parts
are extracted with lshr/trunc. Then, if you compare two tuples by
parts you get something like A[0] == B[0] && A[1] == B[1], just
that the part extraction happens by lshr/trunc and not a narrow
load or similar.

The fold implemented here reduces such equality comparisons by
converting them into a comparison on a larger part of the integer
(which might be the whole integer). It handles both the "and of eq"
and the conjugated "or of ne" case.

I'm being conservative with one-use for now, though this could be
relaxed if profitable (the base pattern converts 11 instructions
into 5 instructions, but there's quite a few variations on how it
can play out).

Differential Revision: https://reviews.llvm.org/D101232
2021-05-10 22:22:39 +02:00
Florian Hahn 93a9a8a8d9
[VecLib] Add support for vector fns from Darwin's libsystem.
This patch adds support for Darwin's libsystem math vector functions to
TLI. Darwin's libsystem provides a range of vector functions for libm
functions.

This initial patch only adds the 2 x double and 4 x float versions,
which are available on both X86 and ARM64. On X86, wider vector versions
are supported as well.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D101856
2021-05-10 21:19:58 +01:00
Nikita Popov aa9b02ac75 [Inliner] Fix noalias metadata handling for instructions simplified during cloning (PR50270)
Instead of using VMap, which may include instructions from the
caller as a result of simplification, iterate over the
(FirstNewBlock, Caller->end()) range, which will only include new
instructions.

Fixes https://bugs.llvm.org/show_bug.cgi?id=50270.

Differential Revision: https://reviews.llvm.org/D102110
2021-05-10 21:59:59 +02:00
Andy Kaylor 7086025d65 [Dependence Analysis] Enable delinearization of fixed sized arrays
Patch by Artem Radzikhovskyy!

Allow delinearization of fixed sized arrays if we can prove that the GEP indices do not overflow the array dimensions. The checks applied are similar to the ones that are used for delinearization of parametric size arrays. Make sure that the GEP indices are non-negative and that they are smaller than the range of that dimension.

Changes Summary:

- Updated the LIT tests with more exact values, as we are able to delinearize and apply more exact tests
- profitability.ll - now able to delinearize in all cases, no need to use -da-disable-delinearization-checks flag and run the test twice
- loop-interchange-optimization-remarks.ll - in one of the cases we are able to delinearize without using -da-disable-delinearization-checks
- SimpleSIVNoValidityCheckFixedSize.ll - removed unnecessary "-da-disable-delinearization-checks" flag. Now can get the exact answer without it.
- SimpleSIVNoValidityCheckFixedSize.ll and PreliminaryNoValidityCheckFixedSize.ll - made negative tests more explicit, in order to demonstrate the need for "-da-disable-delinearization-checks" flag

Differential Revision: https://reviews.llvm.org/D101486
2021-05-10 10:30:15 -07:00
Alexey Bataev 30463bc3f1 [SLP]Do not count perfect diamond matches for gathers several times.
Need to remove the old code for avoiding double counting of the gather
nodes with perfect diamond matches within the tree after we started
detecting perfect/shuffled matching in the previous patch D100495. We
may skip the cost for such nodes completely.

Differential Revision: https://reviews.llvm.org/D102023
2021-05-10 07:08:07 -07:00
Fraser Cormack 3212a08a8c [Constant] Allow ConstantAggregateZero a scalable element count
A ConstantAggregateZero may be created from a scalable vector type.
However, it still assumed fixed number of elements when queried for
them. This patch changes ConstantAggregateZero to correctly report its
element count.

This change fixes a couple of issues. Firstly, it fixes a crash in
Constant::getUniqueValue when called on a scalable-vector
zeroinitializer constant.

Secondly, it fixes a latent bug in GlobalISel's IRTranslator in which
translating a scalable-vector zeroinitializer would hit the assertion in
ConstantAggregateZero::getNumElements when casting to a FixedVectorType,
rather than reporting an error more gracefully. This is currently
hypothetical as the IRTranslator has deeper issues preventing the use of
scalable vector types.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102082
2021-05-10 13:51:53 +01:00
Teresa Johnson 220f6e5271 [SimplifyCFG] Ignore ephemeral values when counting insts for threading
Ignore ephemeral values (only feeding llvm.assume intrinsics) when
computing the instruction count to decide if a block is small enough for
threading. This is similar to the handling of these values in the
InlineCost computation. These instructions will eventually be removed
and shouldn't count against code size (similar to the existing ignoring
of phis).

Without this change, when enabling -fwhole-program-vtables, which causes
type test / assume sequences to be inserted by clang, we can get
different threading decisions. In particular, when building with
instrumentation FDO it can affect the optimizations decisions before FDO
matching, leading to some mismatches.

Differential Revision: https://reviews.llvm.org/D101494
2021-05-09 19:06:54 -07:00
Nikita Popov 7549399d0e [SROA] Regenerate test checks (NFC) 2021-05-09 18:20:52 +02:00
Roman Lebedev 4aec8f4ce0
[NFC][LoopIdiom] Add some tests for 'lshr until zero' ('count active bits') "on steroids" idiom 2021-05-09 01:07:07 +03:00
Florian Hahn 2bf34c0a93
[VPlan] Add test for sink scalars and merging using VPlan.
Add a couple of tests with scalars that can be sunk to their predicated
users.

This pre-commits tests for D100258.
2021-05-08 16:47:48 +01:00
Roman Lebedev 1acd9a1a29
Revert "[LICM] Hoist loads with invariant.group metadata"
This appears to miscompile google benchmark's GetCacheSizesFromKVFS()
when compiling with -fstrict-vtable-pointers.
Runnable reproducer: https://godbolt.org/z/f9ovKqTzb
The "f.fail()" crashes with BUS error, it is compiled into testb,
and the adress it is testing is non-sensical.

This reverts commit 4c89bcadf6.
2021-05-08 15:44:49 +03:00
Qiu Chaofan 2db4979c0f [VectorCombine] Simplify to scalar store if only one element updated
This patch simplifies load-insertelt-store pattern into
getelementptr-store.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D98240
2021-05-08 18:14:51 +08:00
Arthur Eubanks 34a8a437bf [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797
2021-05-07 21:51:47 -07:00
Arthur Eubanks 6f7131002b [NewPM] Move analysis invalidation/clearing logging to instrumentation
We're trying to move DebugLogging into instrumentation, rather than
being part of PassManagers/AnalysisManagers.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D102093
2021-05-07 15:25:31 -07:00
Florian Hahn 6c99e63120 [SCEV] By more careful when traversing phis in isImpliedViaMerge.
I think currently isImpliedViaMerge can incorrectly return true for phis
in a loop/cycle, if the found condition involves the previous value of

Consider the case in exit_cond_depends_on_inner_loop.

At some point, we call (modulo simplifications)
isImpliedViaMerge(<=, %x.lcssa, -1, %call, -1).

The existing code tries to prove IncV <= -1 for all incoming values
InvV using the found condition (%call <= -1). At the moment this succeeds,
but only because it does not compare the same runtime value. The found
condition checks the value of the last iteration, but the incoming value
is from the *previous* iteration.

Hence we incorrectly determine that the *previous* value was <= -1,
which may not be true.

I think we need to be more careful when looking at the incoming values
here. In particular, we need to rule out that a found condition refers to
any value that may refer to one of the previous iterations. I'm not sure
there's a reliable way to do so (that also works of irreducible control
flow).

So for now this patch adds an additional requirement that the incoming
value must properly dominate the phi block. This should ensure the
values do not change in a cycle. I am not entirely sure if will catch
all cases and I appreciate a through second look in that regard.

Alternatively we could also unconditionally bail out in this case,
instead of checking the incoming values

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101829
2021-05-07 19:52:29 +01:00
Sanjay Patel 0a6f11aabd [AArch64] add test for missed vectorization; NFC
This is a reduction of the example in:
https://llvm.org/PR50256
2021-05-07 10:45:11 -04:00
Simon Pilgrim 2a3f60b5f5 [SLP] Regenerate tests to reduce diff in D98714. NFCI. 2021-05-07 12:33:00 +01:00
Caroline Concatto cf06c8eee3 [LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction
The function fixReduction used to assert/crash for scalable vector when
a vector reduce could be done with a smaller vector.
This patch removes this assertion as it is safe to use scalable vector for
vector reduce and truncate.

Differential Revision: https://reviews.llvm.org/D101260
2021-05-07 09:37:37 +01:00
Peilin Guo 911a541620 [LazyValueInfo] Insert an Overdefined placeholder to prevent infinite recursion
getValueFromCondition() uses a Visited set to record the intermediate value.
However, it uses a postorder way to compute the value first and update the
Visited set later. Thus it will be trapped into an infinite recursion if there
exists IRs that use no dominated by its def as in this example:

  %tmp3 = or i1 undef, %tmp4
  %tmp4 = or i1 undef, %tmp3

To prevent this, we can insert an Overdefined placeholder into the set
before computing the actual value.

Reviewed by: nikic

Differential Revision: https://reviews.llvm.org/D101273
2021-05-07 16:05:50 +08:00
David Green 4979c90458 [LV] Account for tripcount when calculation vectorization profitability
The loop vectorizer will currently assume a large trip count when
calculating which of several vectorization factors are more profitable.
That is often not a terrible assumption to make as small trip count
loops will usually have been fully unrolled. There are cases however
where we will try to vectorize them, and especially when folding the
tail by masking can incorrectly choose to vectorize loops that are not
beneficial, due to the folded tail rounding the iteration count up for
the vectorized loop.

The motivating example here has a trip count of 5, so either performs 5
scalar iterations or 2 vector iterations (with VF=4). At a high enough
trip count the vectorization becomes profitable, but the rounding up to
2 vector iterations vs only 5 scalar makes it unprofitable.

This adds an alternative cost calculation when we know the max trip
count and are folding tail by masking, rounding the iteration count up
to the correct number for the vector width. We still do not account for
anything like setup cost or the mixture of vector and scalar loops, but
this is at least an improvement in a few cases that we have had
reported.

Differential Revision: https://reviews.llvm.org/D101726
2021-05-06 12:36:46 +01:00
Kerry McLaughlin 8c9742bd23 [SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences
Adds support for scalable vectorization of loops containing first-order recurrences, e.g:
```
for(int i = 0; i < n; i++)
  b[i] =  a[i] + a[i - 1]
```
This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into
account when inserting into and extracting from the last lane of a vector.
CreateVectorSplice has been added to construct a vector for the recurrence, which
returns a splice intrinsic for scalable types. For fixed-width the behaviour
remains unchanged as CreateVectorSplice will return a shufflevector instead.

The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D101076
2021-05-06 11:35:39 +01:00
Juneyoung Lee 8a156d1c27 [InstCombine] Fully disable select to and/or i1 folding
This is a patch that disables the poison-unsafe select -> and/or i1 folding.

It has been blocking D72396 and also has been the source of a few miscompilations
described in llvm.org/pr49688 .
D99674 conditionally blocked this folding and successfully fixed the latter one.
The former one was still blocked, and this patch addresses it.

Note that a few test functions that has `_logical` suffix are now deoptimized.
These are created by @nikic to check the impact of disabling this optimization
by copying existing original functions and replacing and/or with select.

I can see that most of these are poison-unsafe; they can be revived by introducing
freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll)
because I think they are good targets for freeze fix.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101191
2021-05-06 09:29:52 +09:00