Commit Graph

28237 Commits

Author SHA1 Message Date
David Sherwood 3fd96e1b2e [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
  are always uniform by definition.
  2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
  loop invariant input operands then these are also uniform too.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we
  are free to treat this as uniform anyway since it's only a performance
  hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
  variant then for scalable vectors we assume these still ultimately come
  from the broadcast of an alloca. We do not support scalable vectorisation
  of loops containing alloca instructions, hence the alloca itself would
  be invariant. If the pointer does not come from an alloca then the
  intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

  Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intriniscs:

  Transforms/LoopVectorize/scalable-assume.ll
  Transforms/LoopVectorize/scalable-lifetime.ll
  Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential Revision: https://reviews.llvm.org/D107284
2021-08-06 10:13:15 +01:00
Chuanqi Xu 0fd03feb4b [FuncSpec] Return changed if function is changed by tryToReplaceWithConstant
The may get changed before specialization by RunSCCPSolver. In other
words, the pass may change the function without specialization happens.
Add test and comment to reveal this.
And it may return No Changed if the function get changed by
RunSCCPSolver before the specialization. It looks like a potential bug.

Test Plan: check-all

Reviewed By: https://reviews.llvm.org/D107622

Differential Revision: https://reviews.llvm.org/D107622
2021-08-06 17:00:17 +08:00
David Sherwood 43a5c750d1 Revert "[LoopVectorize] Add support for replication of more intrinsics with scalable vectors"
This reverts commit 95800da914.
2021-08-06 09:48:16 +01:00
Chuanqi Xu 62fc3e0ad6 [NFC] [FuncSpec] Remove unused variables in isArgumentInteresting 2021-08-06 16:38:20 +08:00
Chuanqi Xu cc3f40bb41 [FuncSpec] Move invariant computation for spec cost out of loop (NFC-ish)
Noticed that the computation for function specialization cost of a
function wouldn't change during the traversal of the arguments for the
function. We could hoist the computation out of the traversal. I
observed about ~1% improvement on compile time for spec2017. But I guess
it may not be precise. This should be NFC and fine.

Reviewed By: Sjoerd Meijer

Differential Revision: https://reviews.llvm.org/D107621
2021-08-06 15:43:05 +08:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Florian Hahn 3e58dd19df
[LV] Move reduction PHI node fixup to VPlan::execute (NFC).
All information to fix-up the reduction phi nodes in the vectorized loop
is available in VPlan now. This patch moves the code to do so, to make
this clearer. Fixing up the loop exit value still relies on other
information and remains outside of VPlan for now.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100113
2021-08-06 08:29:20 +01:00
Chuanqi Xu 82ca845b47 [NFC] [FuncSpec] Update the Todo list for recursive functions
Now the recursive functions may get specialized many times when
`func-specialization-max-iters` increases. See discussion in
https://reviews.llvm.org/D106426 for details.
2021-08-06 14:43:17 +08:00
Arthur Eubanks a1b21ed3fb [GCov] Emit memset instead of stores in __llvm_gcov_reset
For a very large module, __llvm_gcov_reset can become very large.
__llvm_gcov_reset previously emitted stores to a bunch of globals in one
huge basic block. MemCpyOpt would turn many of these stores into
memsets, and updating MemorySSA would be extremely slow.

Verified that this makes the compile time of certain files go down
drastically (20min -> 5min).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107538
2021-08-05 22:40:15 -07:00
Chris Jackson 113a06f7a5 {DebugInfo][LSR] Don't cache dbg.value that are already undef
The SCEV-based salvaging method caches dbg.value information pre-LSR so
that salvaging may be attempted post-LSR. If the dbg.value are already
undef pre-LSR then a salvage attempt would be fruitless, so avoid
caching them.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D107448
2021-08-05 19:16:43 +01:00
Kazu Hirata 72661f337a [Transforms] Drop unnecessary const from return types (NFC)
Identified with readability-const-return-type.
2021-08-05 08:53:17 -07:00
Alexey Bataev e7c3eaa8ae [SLP]Do not emit extra shuffle for insertelements vectorization.
If the vectorized insertelements instructions form indentity subvector
(the subvector at the beginning of the long vector), it is just enough
to extend the vector itself, no need to generate inserting subvector
shuffle.

Differential Revision: https://reviews.llvm.org/D107494
2021-08-05 08:41:24 -07:00
David Sherwood e9177b0958 Fix build issues caused by 95800da914 2021-08-05 16:26:34 +01:00
Sander de Smalen 3e47f009ff [LV] Consider ExtractValue as uniform.
Since all operands to ExtractValue must be loop-invariant when we deem
the loop vectorizable, we can consider ExtractValue to be uniform.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D107286
2021-08-05 16:20:50 +01:00
eopXD fd7f6a3c81 [NFC][LoopIdiom] rename boolean variable NegStride to IsNegStride
Rename variable for better code readability.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107570
2021-08-05 23:11:42 +08:00
Momchil Velikov f171149e0d [SimpifyCFG] Speculate a store preceded by a local non-escaping load
In SimplifyCFG we may simplify the CFG by speculatively executing
certain stores, when they are preceded by a store to the same
location.  This patch allows such speculation also when the stores are
similarly preceded by a load.

In order for this transformation to be correct we need to ensure that
the memory location is writable and the store in the new location does
not introduce a data race.

Local objects (created by an `alloca` instruction) are always
writable, so once we are past a read from a location it is valid to
also write to that same location.

Seeing just a load does not guarantee absence of a data race (unlike
if we see a store) - the load may still be part of a race, just not
causing undefined behaviour
(cf. https://llvm.org/docs/Atomics.html#optimization-outside-atomic).

In the original program, a data race might have been prevented by the
condition, but once we move the store outside the condition, we must
be sure a data race wasn't possible anyway, no matter what the
condition evaluates to.

One way to be sure that a local object is never concurrently
read/written is check that its address never escapes the function.

Hence this transformation is restricted to local, non-escaping
objects.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D107281
2021-08-05 15:54:42 +01:00
Florian Hahn 38b098be66
[VectorCombine] Limit scalarization known non-poison indices.
We can only trust the range of the index if it is guaranteed
non-poison.

Fixes PR50949.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107364
2021-08-05 15:36:31 +01:00
Dawid Jurczak 06206a8cd1 [BuildLibCalls][NFC] Remove redundant attribute list from emitCalloc
Additionally with this patch aligned DSE which is the only user of emitCalloc.

Differential Revision: https://reviews.llvm.org/D103523
2021-08-05 16:18:38 +02:00
David Sherwood 95800da914 [LoopVectorize] Add support for replication of more intrinsics with scalable vectors
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

  1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
  are always uniform by definition.
  2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
  loop invariant input operands then these are also uniform too.

Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

  1. If the 'assume' intrinsic's operand is not loop invariant, then we
  are free to treat this as uniform anyway since it's only a performance
  hint. We will get the benefit for the first lane.
  2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
  variant then for scalable vectors we assume these still ultimately come
  from the broadcast of an alloca. We do not support scalable vectorisation
  of loops containing alloca instructions, hence the alloca itself would
  be invariant. If the pointer does not come from an alloca then the
  intrinsic itself has no effect.

I have updated the assume test for fixed width, since we now treat it
as uniform:

  Transforms/LoopVectorize/assume.ll

I've also added new scalable vectorisation tests for other intriniscs:

  Transforms/LoopVectorize/scalable-assume.ll
  Transforms/LoopVectorize/scalable-lifetime.ll
  Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential Revision: https://reviews.llvm.org/D107284
2021-08-05 15:17:27 +01:00
Dawid Jurczak f8cdde7195 [SimplifyLibCalls][NFC] Clean up LibCallSimplifier from 'memset + malloc into calloc' transformation
FoldMallocMemset can be safely removed because since https://reviews.llvm.org/D103009
such transformation is already performed in DSE.

Differential Revision: https://reviews.llvm.org/D103451
2021-08-05 16:08:32 +02:00
Sander de Smalen 8d08a84745 [LV] Remove a change that was added in D106164.
This change wasn't strictly necessary for D106164 and could be removed.
This patch addresses the post-commit comments from @fhahn on D106164, and
also changes sve-widen-gep.ll to use the same IR test as shown in
pointer-induction.ll.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D106878
2021-08-05 14:44:53 +01:00
eopXD 26aa1bbe97 [NFCI] [LoopIdiom] Let processLoopStridedStore take StoreSize as SCEV instead of unsigned
Letting it take SCEV allows further modification on the function to optimize
if the StoreSize / Stride is runtime determined.

This is a preceeding of D107353.
The big picture is to let LoopIdiom deal with runtime-determined sizes.

Reviewed By: Whitney, lebedev.ri

Differential Revision: https://reviews.llvm.org/D104595
2021-08-05 13:21:48 +08:00
Nikita Popov bb15861e14 [MemCpyOpt] Relax libcall checks
Rather than blocking the whole MemCpyOpt pass if the libcalls are
not available, only disable creation of new memset/memcpy intrinsics
where only load/stores were used previously. This only affects the
store merging and load-store conversion optimization. Other
optimizations are derived from existing intrinsics, which are
well-defined in the absence of libcalls -- not having the libcalls
just means that call simplification won't convert them to intrinsics.

This is a weaker variation of D104801, which dropped these checks
entirely. Ideally we would not couple emission of intrinsics to
libcall availability at all, but as the intrinsics may be legalized
to libcalls we need to be a bit careful right now.

Differential Revision: https://reviews.llvm.org/D106769
2021-08-04 21:17:51 +02:00
Giorgis Georgakoudis 29a3e3dd7b [OpenMPOpt] Expand SPMDization with guarding for target parallel regions
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region, thus must be not be guarded. This check is implemented using the ParallelLevels AA.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106892
2021-08-04 11:49:24 -07:00
Alexey Bataev 214f99b27c Revert "[SLP]Do not emit extra shuffle for insertelements vectorization."
This reverts commit 871ea69803 to fix the
problem if the first vector is not just undef.
2021-08-04 11:28:59 -07:00
Dawid Jurczak 238139be09 [DSE][NFC] Clean up DeadStoreElimination from unused variables
Differential Revision: https://reviews.llvm.org/D106446
2021-08-04 19:44:40 +02:00
Petr Hosek 6660cec568 [InstrProfiling] Emit bias variable eagerly
Rather than emitting the bias variable lazily as needed, emit it
eagerly. This allows profile runtime to refer to this variable
unconditionally without having to use the weak reference. The bias
variable is in a COMDAT so there'll never be more than one instance,
and if it's not needed, linker should be able to GC it, so the overhead
should be minimal.

Differential Revision: https://reviews.llvm.org/D107377
2021-08-04 10:17:08 -07:00
Sander de Smalen fe6ae81ef3 [InstCombine] Fix vscale zext/sext optimization when vscale_range is unbounded.
According to the LangRef, a (vscale_range) value of 0 means unbounded.

This patch additionally cleans up the test file vscale_sext_and_zext.ll.
2021-08-04 17:17:37 +01:00
Chris Jackson 21ee38e24f [DebugInfo][LSR] Avoid crashes on large integer inputs
SCEV-based salvaging in LSR translates SCEVs to DIExpressions. SCEVs may
contain very large integers but the translation does not support
integers greater than 64 bits. This patch adds checks to ensure
conversions of these large integers is not attempted. A regression test
is added to ensure no such translation is attempted.

Reviewed by: StephenTozer

PR: https://bugs.llvm.org/show_bug.cgi?id=51329

Differential Revision: https://reviews.llvm.org/D107438
2021-08-04 15:51:22 +01:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Sjoerd Meijer 30fbb06979 [FuncSpec] Support specialising recursive functions
This adds support for specialising recursive functions. For example:

    int Global = 1;
    void recursiveFunc(int *arg) {
      if (*arg < 4) {
        print(*arg);
        recursiveFunc(*arg + 1);
      }
    }
    void main() {
      recursiveFunc(&Global);
    }

After 3 iterations of function specialisation, followed by inlining of the
specialised versions of recursiveFunc, the main function looks like this:

    void main() {
      print(1);
      print(2);
      print(3);
    }

To support this, the following has been added:
- Update the solver and state of the new specialised functions,
- An optimisation to propagate constant stack values after each iteration of
  function specialisation, which is necessary for the next iteration to
  recognise the constant values and trigger.

Specialising recursive functions is (at the moment) controlled by option
-func-specialization-max-iters and is opt-in for compile-time reasons. I.e.,
the default is -func-specialization-max-iters=1, but for the example above we
would need to use -func-specialization-max-iters=3. Future work is to see if we
can increase the default, or improve the cost-model/heuristics to control
compile-times.

Differential Revision: https://reviews.llvm.org/D106426
2021-08-04 08:07:04 +01:00
Shimin Cui 2d9759c790 [GlobalOpt] Fix the load types when OptimizeGlobalAddressOfMalloc
Currently, in OptimizeGlobalAddressOfMalloc, the transformation for global loads assumes that they have the same Type. With the support of ConstantExpr (https://reviews.llvm.org/D106589), this may not be true any more (as seen in the test case), and we miss the code to handle this, This is to fix that.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107397
2021-08-03 19:22:53 -04:00
Craig Topper b818da27ab [SimplifyCFG] Enable switch to lookup table for more types.
This transform has been restricted to legal types since
https://reviews.llvm.org/rG65df808f6254617b9eee931d00e95d900610b660
in 2012.

This is particularly restrictive on RISCV64 which only has i64
as a legal integer type. i32 is a very common type in code
generated from C, but we won't form a lookup table with it.
This also effects other common types like i8/i16 types on ARM,
AArch64, RISCV, etc.

This patch proposes to allow power of 2 types larger than 8 bit, if
they will fit in the largest legal integer type in DataLayout.
These types are common in C code so generally well handled in
the backends.

We could probably do this for other types like i24 and rely on
alignment and padding to allow the backend to use a single wider
load. This isn't my main concern right now and it will need more
tests.

We could also allow larger types up to some limit and let the
backend split into multiple loads, but we need to define that
limit. It's also not my main concern right now.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107233
2021-08-03 15:35:16 -07:00
Alexey Bataev 871ea69803 [SLP]Do not emit extra shuffle for insertelements vectorization.
If the vectorized insertelements instructions form indentity subvector
(the subvector at the beginning of the long vector), it is just enough
to extend the vector itself, no need to generate inserting subvector
shuffle.

Differential Revision: https://reviews.llvm.org/D107344
2021-08-03 13:18:41 -07:00
Alexey Bataev 7d9d926a18 Revert "[SLP]Improve graph reordering."
This reverts commit e408d1dfab and
2 other (4b25c11321 and
c2deb2afaf) related to fix the problem with the
reordering shuffles.
2021-08-03 12:13:43 -07:00
Sami Tolvanen 7ce1c4da77 ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Relands 700d07f8ce with -msvc targets
fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-08-03 11:35:30 -07:00
Dylan Fleming 3943a74666 [InstCombine] Fixed select + masked load fold failure
Fixed type assertion failure caused by trying to fold a masked load with a
select where the select condition is a scalar value

Reviewed By: sdesmalen, lebedev.ri

Differential Revision: https://reviews.llvm.org/D107372
2021-08-03 19:06:12 +01:00
Philip Reames 223835f08b [runtimeunroll] A bit of style cleanup to simplify a following change [NFC]
Use for-range, use the idiomatic pattern for non-loop values, etc..
2021-08-03 10:28:46 -07:00
Krishna 946fd4ea65 Revert "[InstCombine] Remove nnan requirement for transformation to fabs from select"
This reverts commit 6180ce2e2a.
2021-08-03 18:08:11 +05:30
Krishna d99260641b [InstCombine] Fold phi ( inttoptr/ptrtoint x ) to phi (x)
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support to specific cases where this optimization works.

In this patch, we focus on phi-node operands with inttoptr casts.
We know that ptrtoint( inttoptr( ptrtoint x) ) is same as ptrtoint (x).
So, we want to remove this roundtrip cast which goes through phi-node.

Reviewed By: aqjune

Differential Revision: https://reviews.llvm.org/D106289
2021-08-03 17:52:59 +05:30
Krishna 6180ce2e2a [InstCombine] Remove nnan requirement for transformation to fabs from select
In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs.
(i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub).
(ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg).

Differential Revision: https://reviews.llvm.org/D106872
2021-08-03 17:52:58 +05:30
David Sherwood 0156f91f3b [NFC] Rename enable-strict-reductions to force-ordered-reductions
I'm renaming the flag because a future patch will add a new
enableOrderedReductions() TTI interface and so the meaning of this
flag will change to be one of forcing the target to enable/disable
them. Also, since other places in LoopVectorize.cpp use the word
'Ordered' instead of 'strict' I changed the flag to match.

Differential Revision: https://reviews.llvm.org/D107264
2021-08-03 09:33:01 +01:00
Shimin Cui 7ce98cf56e [GlobalOpt] Fix the assert for stored once non-pointer to global address
This is to fix the assert @bjope reported due to the code change of https://reviews.llvm.org/D106589. The test case from @bjope is also included.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107302
2021-08-02 19:23:29 -04:00
Roman Lebedev 6f6e9a867f
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Roman Lebedev 4ba3326f17
[InstCombine] `vector_reduce_{or,and}(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259)
This allows the expansion logic to actually trigger if the argument
was extended from i1 element type, like the rest of the reductions expect.

Alive2 agrees:
https://alive2.llvm.org/ce/z/wcfews (or zext)
https://alive2.llvm.org/ce/z/FCXNFx (or sext)
https://alive2.llvm.org/ce/z/f26zUY (and zext)
https://alive2.llvm.org/ce/z/jprViN (and sext)
2021-08-03 00:54:35 +03:00
Roman Lebedev 554fc9ad0a
[InstCombine] `vector_reduce_smax(?ext(<n x i1>))` --> `?ext(vector_reduce_{and,or}(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/3oqir9 (self)
https://alive2.llvm.org/ce/z/6cuI5m (zext)
https://alive2.llvm.org/ce/z/4FL8rD (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Roman Lebedev f47b7b6d10
[InstCombine] `vector_reduce_smin(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/noXtZ8 (self)
https://alive2.llvm.org/ce/z/JNrN6C (zext)
https://alive2.llvm.org/ce/z/58snuN (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Nikita Popov c7770574f9 Revert "[unroll] Move multiple exit costing into consumer pass [NFC]"
This reverts commit 76940577e4.

This causes Transforms/LoopUnroll/ARM/multi-blocks.ll to fail.
2021-08-02 22:23:34 +02:00
Roman Lebedev b9b7162b8b
[InstCombine] `vector_reduce_umax(?ext(<n x i1>))` --> `?ext(vector_reduce_or(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/NbBaeT (self)
https://alive2.llvm.org/ce/z/iEaig4 (zext)
https://alive2.llvm.org/ce/z/meGb3y (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:23 +03:00
Roman Lebedev 0c13798056
[InstCombine] `vector_reduce_umin(?ext(<n x i1>))` --> `?ext(vector_reduce_and(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/XxUScW (self)
https://alive2.llvm.org/ce/z/3usTF- (zext)
https://alive2.llvm.org/ce/z/GVxwQz (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:22 +03:00
Philip Reames 76940577e4 [unroll] Move multiple exit costing into consumer pass [NFC]
This aligns the multiple exit costing with all the other cost decisions.  Note that UnrollAndJam, which is the only other caller of the original home of this code, unconditionally bails out of multiple exit loops.
2021-08-02 12:46:23 -07:00
Nikita Popov 380b8a603c [DFAJumpThreading] Use SmallPtrSet for Visited (NFC)
This set is only used for contains checks, so there is no need to
use std::set.
2021-08-02 21:30:25 +02:00
Nikita Popov 3f7aea1a37 [DFAJumpThreading] Use insert return value (NFC)
Rather than find + insert. Also use range based for loop.
2021-08-02 21:21:21 +02:00
Nikita Popov 84602f98c6 [DFAJumpThreading] Remove unnecessary includes (NFC)
This file uses neither unordered_map nor unordered_set.
2021-08-02 21:13:30 +02:00
Nikita Popov e97524cba2 [DFAJumpThreading] Mark DT as preserved in LegacyPM
It is marked as preserved in NewPM, but not LegacyPM.
2021-08-02 21:13:30 +02:00
Roman Lebedev 469793efa7
[InstCombine] `vector_reduce_mul(?ext(<n x i1>))` --> `zext(vector_reduce_and(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/PDansB (self)
https://alive2.llvm.org/ce/z/55D-Xc (zext)
https://alive2.llvm.org/ce/z/LxG3-r (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 21:57:51 +03:00
Philip Reames 9016beaa24 [unrollruntime] Pull out a helper function for readability and eventual reuse [nfc] 2021-08-02 11:47:27 -07:00
Philip Reames ebc4c4e3b0 [unroll] Add clarifying comment
The option to not preserve LCSSA is in fact not tested at all in upstream.  I was tempted to just remove the code entirely, but realized I didn't need to for my actual goal.
2021-08-02 10:44:56 -07:00
Florian Hahn bb725c9803
[VPlan] Use defined and ops VPValues to print VPInterleaveRecipe.
This patch updates VPInterleaveRecipe::print to print the actual defined
VPValues for load groups and the store VPValue operands for store
groups.

The IR references may become outdated while transforming the VPlan and
the defined and stored VPValues always are up-to-date.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D107223
2021-08-02 18:36:36 +01:00
Roman Lebedev 1e801439be
[InstCombine] `xor` reduction w/ i1 elt type is a parity check
For i1 element type, `xor` and `add` are interchangeable
(https://alive2.llvm.org/ce/z/e77hhQ), so we should treat it just like
an `add` reduction and consistently transform them both:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)

Though, let's emit the IR that is similar to the one we produce for
`vector_reduce_add(<n x i1>)`.

See https://bugs.llvm.org/show_bug.cgi?id=51259
2021-08-02 20:21:37 +03:00
Florian Mayer 66b4aafa2e [hwasan] Detect use after scope within function.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D105201
2021-08-02 11:34:12 +01:00
Rosie Sumpter f117ed542f [LoopFlatten] Fix missed LoopFlatten opportunity
When the limit of the inner loop is a known integer, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.

Differential Revision: https://reviews.llvm.org/D105802
2021-08-02 11:09:54 +01:00
Shimin Cui 732b05555c [GlobalOpt] support ConstantExpr use of global address for OptimizeGlobalAddressOfMalloc
I'm working on extending the OptimizeGlobalAddressOfMalloc to handle some more general cases. This is to add support of the ConstantExpr use of the global variables. The function allUsesOfLoadedValueWillTrapIfNull is now iterative with the added CE use of GV. Also, the recursive function valueIsOnlyUsedLocallyOrStoredToOneGlobal is changed to iterative using a worklist with the GEP case added.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D106589
2021-07-31 18:42:02 -04:00
Sanjay Patel f2a322bfcf [SROA] prevent crash on large memset length (PR50910)
I don't know much about this pass, but we need a stronger
check on the memset length arg to avoid an assert. The
current code was added with D59000.
The test is reduced from:
https://llvm.org/PR50910

Differential Revision: https://reviews.llvm.org/D106462
2021-07-31 14:07:30 -04:00
Sanjay Patel a22c99c3c1 [InstCombine] canonicalize cmp-of-bitcast-of-vector-cmp to use zero constant
We can invert a compare constant and preserve the logic
as shown in this sampling:
https://alive2.llvm.org/ce/z/YAXbfs
(In theory, we could deal with non-all-ones/zero as well,
but it doesn't seem worthwhile.)

I noticed this as a part of the x86 codegen difference in
https://llvm.org/PR51259 - it ends up using "test"
instead of "not + cmp" in that example.

This pattern also shows up in https://llvm.org/PR41312
and https://llvm.org/PR50798 .

Differential Revision: https://reviews.llvm.org/D107170
2021-07-31 13:31:12 -04:00
Florian Mayer b5b023638a Revert "[hwasan] Detect use after scope within function."
This reverts commit 84705ed913.
2021-07-30 22:32:04 +01:00
Brendon Cahoon c4c379d633 [LoopStrengthReduction] Fix pointer extend asserts
Additional asserts were added to ScalarEvolution to enforce
pointer/int type rules. An assert is triggered when the LSR pass
attempts to extend a pointer SCEV in GenerateTruncates.

This patch changes GenerateTruncates to exit early if the Formaula
contains a ScaledReg or BaseReg with a pointer type.

Differential Revision: https://reviews.llvm.org/D107185
2021-07-30 17:24:08 -04:00
Fangrui Song a1532ed275 [InstrProfiling] Make CountersPtr in __profd_ relative
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.

```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo

# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo

# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```

(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059  # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```

The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.

Differential Revision: https://reviews.llvm.org/D104556
2021-07-30 11:52:18 -07:00
Simon Pilgrim afc6b09dee [InstCombine] getMaskedTypeForICmpPair - remove dead code. NFCI.
Ok should be true at this point, so the early-out is dead - replace with an assert.
2021-07-30 19:23:05 +01:00
Alexey Bataev 95e5d401ae [SLP]Improve splats vectorization.
Replace insertelement instructions for splats with just single
insertelement + broadcast shuffle. Also, try to merge these instructions
if they come from the same/shuffled gather node.

Differential Revision: https://reviews.llvm.org/D107104
2021-07-30 10:17:45 -07:00
Kazu Hirata e76ddfa9ef [Transforms] Remove HasValueForBlock (NFC)
The function seems to be unused for at least one year.
2021-07-30 08:56:49 -07:00
Dylan Fleming a7a39ec886 [SVE] Add folds for sign and zero extends of vscale
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105994
2021-07-30 16:02:50 +01:00
Florian Mayer 84705ed913 [hwasan] Detect use after scope within function.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D105201
2021-07-30 13:59:36 +01:00
Alexey Bataev 4b25c11321 [SLP]Fix an assertion for the size of user nodes.
For the nodes with reused scalars the user may be not only of the size
of the final shuffle but also of the size of the scalars themselves,
need to check for this. It is safe to just modify the check here, since
the order of the scalars themselves is preserved, only indeces of the
reused scalars are changed. So, the users with the same size as the
number of scalars in the node, will not be affected, they still will get
the operands in the required order.

Reported by @mstorsjo in D105020.

Differential Revision: https://reviews.llvm.org/D107080
2021-07-30 05:46:44 -07:00
Alexey Bataev f4fb854811 [SLP]Do not consider deleted instruction as external users.
If the instruction was previously deleted, it should not be treated as
an external user. This fixes cost estimation and removes dead
extractelement instructions.

Differential Revision: https://reviews.llvm.org/D107106
2021-07-30 05:37:43 -07:00
Alexey Bataev c2deb2afaf [SLP]Fix a crash in gathered loads analysis.
Need to check that the minimum acceptable vector factor is at least 2,
not 0, to avoid compiler crash during gathered loads analysis.

Differential Revision: https://reviews.llvm.org/D107058
2021-07-30 05:19:17 -07:00
Joseph Huber cd0dd8ece8 [OpenMP] Adding flags for disabling the following optimizations: Deglobalization SPMDization State machine rewrites Folding
This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following:
 - openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared.
 - openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode.
 - openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall.
 - openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106802
2021-07-29 19:28:31 -04:00
Andy Kaylor b4d945bacd Fixing an infinite loop problem in InstCombine
Patch by Mohammad Fawaz

This issues started happening after
b373b5990d
Basically, if the memcpy is volatile, the collectUsers() function should
return false, just like we do for volatile loads.

Differential Revision: https://reviews.llvm.org/D106950
2021-07-29 12:57:17 -07:00
Dawid Jurczak 5c315bee8c [DSE] Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It's https://reviews.llvm.org/D101440 follow-up.

Differential Revision: https://reviews.llvm.org/D103009
2021-07-29 18:34:10 +02:00
Rosie Sumpter fab5659c79 Revert "[LoopFlatten] Fix missed LoopFlatten opportunity"
This reverts commit 2df8bf9339.

Reverting because it causes an assertion failure.
2021-07-29 15:52:45 +01:00
Jeremy Morse 2537120c87 Follow-up to D105207, only salvage affine SCEVs to avoid a crash
SCEVToIterCountExpr only expects to be fed affine expressions, but
DbgRewriteSalvageableDVIs is feeding it non-affine induction variables.
Following this up with an obvious fix, will add test coverage too if
this avoids D105207 being reverted.
2021-07-29 11:48:08 +01:00
Rosie Sumpter 2df8bf9339 [LoopFlatten] Fix missed LoopFlatten opportunity
When the trip count of the inner loop is a constant, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.

Differential Revision: https://reviews.llvm.org/D105802
2021-07-29 09:47:41 +01:00
Joseph Huber adbaa39dfc [Attributor] Change function internalization to not replace uses in internalized callers
The current implementation of function internalization creats a copy of each
function and replaces every use. This has the downside that the external
versions of the functions will call into the internalized versions of the
functions. This prevents them from being fully independent of eachother. This
patch replaces the current internalization scheme with a method that creates
all the copies of the functions intended to be internalized first and then
replaces the uses as long as their caller is not already internalized.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106931
2021-07-28 18:57:28 -04:00
Chris Jackson 0ba8595287 [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
Reapply commit d675b594f4 that was
reverted due to buildbot failures. A simple fix has been applied to
remove an assertion.

Differential Revision: https://reviews.llvm.org/D105207
2021-07-28 23:04:59 +01:00
Sjoerd Meijer bc43078fe8 [LoopFlatten] Fix bug where SCEVCouldNotCompute object is used
The SCEV method getBackedgeTakenCount() returns a SCEVCouldNotCompute
object if the backedge-taken count is unpredictable. This fix ensures
there is no longer an attempt to use such an object to find the trip
count.

Patch by: Rosie Sumpter.

Differential Revision: https://reviews.llvm.org/D106970
2021-07-28 18:35:08 +01:00
Jeroen Dobbelaere 03b8c69d06 [PredicateInfo] Use Intrinsic::getDeclaration now that it handles unnamed types.
This is a second attempt to fix the EXPENSIVE_CHECKS issue that was mentioned  In D91661#2875179 by @jroelofs.

(The first attempt was in D105983)

D91661 more or less completely reverted D49126 and by doing so also removed the cleanup logic of the created declarations and calls.
This patch is a replacement for D91661 (which must itself be reverted first). It replaces the custom declaration creation with the
generic version and shows the test impact. It also tracks the number of NamedValues to detect if a new prototype was added instead
of looking at the available users of a prototype.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D106147
2021-07-28 19:30:29 +02:00
Jeroen Dobbelaere dc5570d149 Revert "Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types."
This reverts commit 77080a1eb6.

This change introduced issues detected with EXPENSIVE_CHECKS. Reverting to restore the
needed function cleanup. A next patch will then just improve on the name mangling.
2021-07-28 19:30:29 +02:00
Fangrui Song 6da3d8b19c [llvm] Replace LLVM_ATTRIBUTE_NORETURN with C++11 [[noreturn]]
[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.

Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.
2021-07-28 09:31:14 -07:00
Chris Jackson 3992896043 Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR"
Reverted due to buildbot failures.
This reverts commit d675b594f4.
2021-07-28 16:44:54 +01:00
Chris Jackson d675b594f4 [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
Reapply commit 796b84d26f that was
reverted due to reports of crashes. A minor change now guards against
getVariableLocationOperand() returning a nullptr.

Differential Revision: https://reviews.llvm.org/D106659
2021-07-28 16:28:46 +01:00
Sanjay Patel 5b83261c15 [DivRemPairs] make sure we have a valid CFG for hoisting division
This transform was added with e38b7e8948
and as shown in:
https://llvm.org/PR51241
...it could crash without an extra check of the blocks.

There might be a more compact way to write this constraint,
but we can't just count the successors/predecessors without
affecting a test that includes a switch instruction.
2021-07-28 11:09:12 -04:00
Alexey Bataev 3ad6437fcc [SLP]Fix build on MacOS, NFC. 2021-07-28 06:33:13 -07:00
Alexey Bataev e408d1dfab [SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.

Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-07-28 05:49:06 -07:00
Florian Hahn c07dd2b885
[LV] Move recurrence backedge fixup code to VPlan::execute (NFC).
As suggested in D105008, move the code that fixes up the backedge value
for first order recurrences to VPlan::execute.

Now all that remains in fixFirstOrderRecurrences is the code responsible
for creating the exit values in the middle block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D106244
2021-07-28 13:32:40 +01:00
David Green 41cedb1c9a [LV][ARM] Tighten up MLA reduction costing
This makes a couple of changes to the costing of MLA reduction patterns,
to more accurately cost various patterns that can come up from
vectorization.

 - The Arm implementation of getExtendedAddReductionCost is altered to
   only provide costs for legal or smaller types. Larger than legal types
   need to be split, which currently does not work very well, especially
   for predicated reductions where the predicate may be legal but needs to
   be split. Currently we limit it to legal or smaller input types.
 - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext))
   is a pattern that can come up, and can be treated the same as
   reduce(mul(ext, ext)) providing the extension types match.
 - And it has been adjusted to not count the ext in reduce(mul(ext, ext))
   as part of a reduce(mul) pattern.

Together these changes help to more accurately cost the mla reductions
in cases such as where the extend types don't match or the extend
opcodes are different, picking better vector factors that don't result
in expanded reductions.

Differential Revision: https://reviews.llvm.org/D106166
2021-07-28 12:50:58 +01:00
Chris Jackson 04b94c7cae Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR"
Crashes were reported on the upstreamm revision:
https://reviews.llvm.org/D105207

This reverts commit 796b84d26f.
2021-07-28 10:05:54 +01:00
Jose M Monsalve Diaz 5ab6aedda9 [OpenMP] Folding threadLimit and numThreads when single value in kernels
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D106033
2021-07-27 21:47:12 -04:00
Johannes Doerfert 3dca83961c Reapply "[Attributor] Disable simplification AAs if a callback is present""
This reapplies commit cbb709e251 and
includes the use of the lookup method instead of operator[] to avoid
accidentally setting (empty) simplification callbacks.

This reverts commit aa27430a62.
2021-07-27 19:14:50 -05:00
Johannes Doerfert aa27430a62 Revert "[Attributor] Disable simplification AAs if a callback is present"
This reverts commit cbb709e251 as it
breaks the tests, which was not supposed to happen. Investigating now.
2021-07-27 18:09:42 -05:00