Commit Graph

19764 Commits

Author SHA1 Message Date
Sam Parker c98a8a09b5 [HardwareLoops] Loop guard intrinsic to recognise zext
If a loop count was initially represented by a 32b unsigned int in C
then the hardware-loop pass can recognise the loop guard and insert
the llvm.test.set.loop.iterations intrinsic. If this was instead a
unsigned short/char then clang inserts a zext instruction to expand
the loop count to an i32. This patch adds the necessary pattern
matching to enable the use of lvm.test.set.loop.iterations in those
cases.

Patch by: sherwin-dc

Differential Revision: https://reviews.llvm.org/D109631
2021-09-16 08:33:16 +01:00
Owen Anderson 68079ef0eb Teach SimplifyCFG to fold switches into lookup tables in more cases.
In particular, it couldn't handle cases where lookup table constant
expressions involved bitcasts. This does not seem to come up
frequently in C++, but comes up reasonably often in Rust via
`#[derive(Debug)]`.

Originally reported by pcwalton.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109565
2021-09-15 22:07:08 +00:00
Anna Thomas f9e4aebe4a Revert "[InstCombine] Improve TryToSinkInstruction with multiple uses"
This reverts commit 4ac4e52189.
There are couple of test failures, which needs update of the test cases.

Doing a clean revert and will recommit the change along with fixed
testcases.
2021-09-15 18:03:11 -04:00
Anna Thomas 4ac4e52189 [InstCombine] Improve TryToSinkInstruction with multiple uses
This patch allows sinking an instruction which can have multiple uses in a
single user. We were previously over-restrictive by looking for exactly one use,
rather than one user.

Also, the API for retrieving undroppable user has been updated accordingly since
in both usecases (Attributor and InstCombine), we seem to care about the user,
rather than the use.

Reviewed-By: nikic

Differential Revision: https://reviews.llvm.org/D109700
2021-09-15 20:39:38 +00:00
Sanjay Patel e5a32d720e [InstCombine] move extend after insertelement if both operands are extended
I was wondering how instcombine does on the examples in D109236,
and we're missing a basic transform:

inselt (ext X), (ext Y), Index --> ext (inselt X, Y, Index)

https://alive2.llvm.org/ce/z/z2aBu9

Note that there are several possible extensions of this fold
(see TODO comments).

Differential Revision: https://reviews.llvm.org/D109537
2021-09-15 14:38:03 -04:00
Anna Thomas 36ef65adc3 [InstCombine] Update test checks through autogeneration, add more tests. NFC
Updated check lines.
Tests precommitted from D109700.
2021-09-15 16:20:30 +00:00
Max Kazantsev c78ed20784 [Test] Add a test showing missing opportunities in branch deletion by indvars 2021-09-15 22:17:10 +07:00
Alexey Bataev 446e11fa29 [SLP][NFC]Add a test for tiny tree with stores and with not
same/alternate instructions.
2021-09-15 08:07:01 -07:00
Filipp Zhinkin f5d8952356 [InstCombine] Transform X == 0 ? 0 : X * Y --> X * freeze(Y)
Enabled mul folding optimization that was previously disabled
by being incorrect.
To preserve correctness, mul's operand that is not compared
with zero in select's condition is now frozen.

Related bug: https://bugs.llvm.org/show_bug.cgi?id=51286

Correctness:
https://alive2.llvm.org/ce/z/bHef7J
https://alive2.llvm.org/ce/z/QcR7sf
https://alive2.llvm.org/ce/z/vvBLzt
https://alive2.llvm.org/ce/z/jGDXgq
https://alive2.llvm.org/ce/z/3Pe8Z4
https://alive2.llvm.org/ce/z/LGga8M
https://alive2.llvm.org/ce/z/CTG5fs

Differential Revision: https://reviews.llvm.org/D108408
2021-09-15 09:04:06 -04:00
Sanjay Patel be1028053e [PhaseOrdering] add tests for PR47023; NFC 2021-09-15 08:44:04 -04:00
Simon Pilgrim 0767e43d87 [CostModel][X86] Adjust bitreverse/ctpop/ctlz/cttz AVX2+ costs based on llvm-mca reports
Based off the worse case numbers generated by D103695, the AVX2/512 bit reversing/counting costs were higher than necessary (based off instruction counts instead of actual throughput).
2021-09-15 13:04:40 +01:00
Florian Hahn 05c120823b
[DSE] Add capture-before test cases with loads.
Add a set of test cases where redundant stores may be removable,
depending on whether a local allocation gets captured before performing
a load.
2021-09-15 11:13:35 +01:00
David Green 61cc873a8e [LV] Recognize intrinsic min/max reductions
This extends the reduction logic in the vectorizer to handle intrinsic
versions of min and max, both the floating point variants already
created by instcombine under fastmath and the integer variants from
D98152.

As a bonus this allows us to match a chain of min or max operations into
a single reduction, similar to how add/mul/etc work.

Differential Revision: https://reviews.llvm.org/D109645
2021-09-15 10:45:50 +01:00
David Green bddfbf91ed [LV] Min/max intrinsic reduction test cases. 2021-09-15 09:56:19 +01:00
Florian Hahn e90d55e1c9
[VPlan] Support sinking recipes with uniform users outside sink target.
This is a first step towards addressing the last remaining limitation of
the VPlan version of sinkScalarOperands: the legacy version can
partially sink operands. For example, if a GEP has uniform users outside
the sink target block, then the legacy version will sink all scalar
GEPs, other than the one for lane 0.

This patch works towards addressing this case in the VPlan version by
detecting such cases and duplicating the sink candidate. All users
outside of the sink target will be updated to use the uniform clone.

Note that this highlights an issue with VPValue naming. If we duplicate
a replicate recipe, they will share the same underlying IR value and
both VPValues will have the same name ir<%gep>.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104254
2021-09-15 09:21:39 +01:00
Philip Reames d4e03bccd4 regen an autogened test which is stale 2021-09-14 18:42:23 -07:00
Matt Arsenault 88146230e1 SeparateConstOffsetFromGEP: Fix stack overflow in unreachable code
ConstantOffsetExtractor::Find was infinitely recursing on the add
referencing itself.
2021-09-14 19:49:38 -04:00
Matt Arsenault fdd9761dd1 Attributor: Fix crash on undef in !callees 2021-09-14 19:49:34 -04:00
Florian Hahn e248d69036
Recommit "[LAA] Support pointer phis in loop by analyzing each incoming pointer."
SCEV does not look through non-header PHIs inside the loop. Such phis
can be analyzed by adding separate accesses for each incoming pointer
value.

This results in 2 more loops vectorized in SPEC2000/186.crafty and
avoids regressions when sinking instructions before vectorizing.

Fixes PR50296, PR50288.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D102266
2021-09-14 11:19:12 +01:00
David Green 5a6dfbb8cd [ARM] Teach DemandedVectorElts about VMOVN lanes
The class of instructions that write to narrow top/bottom lanes only
demand the even or odd elements of the input lanes. Which means that a
pair of VMOVNT; VMOVNB demands no lanes from the original input. This
teaches that to instcombine from the target hooks available through
ARMTTIImpl.

Differential Revision: https://reviews.llvm.org/D109325
2021-09-14 11:05:31 +01:00
Arthur Eubanks 096d9814aa [opt] Remove some legacy PM flags
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109664
2021-09-13 15:50:03 -07:00
Kuba Mracek e80ee4cbd9 [GlobalDCE] In VFE support for relative pointers, allow GEP references to the base symbol
This is for Swift VFE support. In some vtable forms that Swift emits, the "base" of a relative pointer is not the global symbol itself directly, but a GEP into it -- so the pointer is relative to a particular field in the global. So getPointerAtOffset() needs to be able to see through the GEP and allow it in a SUB expression, to correctly recognize the offset as a vtable slot.

Differential Revision: https://reviews.llvm.org/D109169
2021-09-13 15:22:11 -07:00
Jon Chesterfield 6775ad2025 [openmp] Apply test change from D109500 2021-09-13 18:36:29 +01:00
Jon Chesterfield bfcf979978 Revert "[openmp] Fix 51647, corrupt bitcode on amdgpu"
This reverts commit d5c049a3f6.
Going to re-commit it in pieces for easier application to 13
2021-09-13 18:25:07 +01:00
Philip Reames 6fec6552f5 Revert "[IndVars] Replace PHIs if loop exits on 1st iteration"
This reverts commit 5a6dfb27ca.  See original review for why.
2021-09-13 10:11:18 -07:00
Philip Reames 5746c76f3f Revert "[IndVars] Break backedge and replace PHIs if loop exits on 1st iteration"
This reverts commit d9ca444835.  See review for why.
2021-09-13 10:10:49 -07:00
dpalermo d5c049a3f6 [openmp] Fix 51647, corrupt bitcode on amdgpu
Patch by @dpalermo

The corrupt bitcode reported in https://bugs.llvm.org/show_bug.cgi?id=51647 seems to be a result of a later pass changing the workfn variable to addrspace(5) (thread private, on the stack). That seems reasonable for an alloca without an address space so it's an open question why that can crash the bitcode reader.

This change puts it in the thread private address space to begin with which means whatever misfired further down the pipeline does not break it. That matches the codegen from clang where stack variables are always annotated (5) and then addrspace cast prior to following use.

This therefore patches around whatever unsuccessfully moved the alloca variable to addrspace(5). That solves the problem of openmp opt producing code that crashes the bitcode reader. It should be possible to create a minimal repro for the underlying bug based on some handwritten IR that uses an alloca in a generic address space.

Reviewed By: ronlieb, jdoerfert, dpalermo-phab

Differential Revision: https://reviews.llvm.org/D109500
2021-09-13 15:24:48 +01:00
Florian Hahn 4b342268c0
[VPlan] Add test that requires duplicating recipe for sinking. 2021-09-13 14:21:20 +01:00
Florian Hahn c24fc37e47
[VectorCombine] Support AND/UREM indices that require freezing.
38b098be66 limited scalarization to indices that are known non-poison.
For certain patterns that restrict the range of an index, we can insert
a freeze of the original value, to prevent propagation of poison.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107580
2021-09-13 11:21:45 +01:00
Jingu Kang 2a26d47a2d [LoopBoundSplit] Check the start value of split cond AddRec
After transformation, we assume the split condition of the pre-loop is always
true. In order to guarantee it, we need to check the start value of the split
cond AddRec satisfies the split condition.

Differential Revision: https://reviews.llvm.org/D109354
2021-09-13 10:32:35 +01:00
Max Kazantsev 7e337d8ba2 [Test] Add more sophisticated tests for switch UB opt
Optimizer is being too smart with existing tests, and the transform
gets concealed by following transforms.
2021-09-13 15:28:23 +07:00
Simon Pilgrim 6d970e83fa [InstCombine] Add PR51784 test cases 2021-09-13 08:36:43 +01:00
Max Kazantsev d9ca444835 [IndVars] Break backedge and replace PHIs if loop exits on 1st iteration
Implement TODO in optimizeLoopExits. Now if we have proved that some loop exit
is taken on 1st iteration, we make all branches in the following exiting blocks
always branch out of the loop and their conditions simplified away.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D108910
Reviewed By: lebedev.ri
2021-09-13 11:30:55 +07:00
Max Kazantsev 5a6dfb27ca [IndVars] Replace PHIs if loop exits on 1st iteration
This is a part of D108910.
We replace all loop PHIs with values coming from the loop preheader if
we proved that backedge is never taken.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D109596
Reviewed By: lebedev.ri
2021-09-13 10:50:33 +07:00
Florian Hahn 368af7558e
[VPlan] Fix crash caused by not updating all users properly.
Users of VPValues are managed in a vector, so we need to be more
careful when iterating over users while updating them. For now, just
copy them.

Fixes 51798.
2021-09-12 18:10:53 +01:00
Sanjay Patel 3a126134d3 [InstCombine] remove casts from splat-a-bit pattern
https://alive2.llvm.org/ce/z/_AivbM

This case seems clear since we can reduce instruction count
and avoid an intermediate type change, but we might want to
use mask-and-compare for other sequences.

Currently, we can generate more instructions on some related
patterns by trying to use bit-hacks instead of mask+cmp, so
something is not behaving as expected.
2021-09-12 09:18:14 -04:00
Sanjay Patel 75e8eb2b10 [InstCombine] update code/test comments; NFC
Follow-up for post-commit suggestion on:
28afaed691

The comments were partly copied from the original
code, but not updated to match the new code.
2021-09-11 10:53:53 -04:00
Sanjay Patel 28afaed691 [InstCombine] fold sub of min/max intrinsics with invertible ops
This is a translation of the existing code to handle the intrinsics
and another step towards D98152.

https://alive2.llvm.org/ce/z/jA7eBC

This pattern is already handled by underlying folds if there are
less uses, so the minimal tests in this case have extra uses.

The larger cmyk tests show the motivation - when combined with
other folds, we invert a larger sequence and eliminate 'not' ops.
2021-09-11 09:18:46 -04:00
Usman Nadeem ab111e982f Revert "Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation""
This reverts commit eee7d225de.
Effectively relanding 98c37247d8
after fixing the failing tests.

Change-Id: I5d7461aeb820a2d5f1895457d824a8de4d316ee5
2021-09-10 18:11:24 -07:00
Johannes Doerfert 99ea8ac9f1 Reapply "[OpenMP] Group side-effects to improve guarding efficiency"
This reapplies ca134c3963, effectively
reverting commit d2f206e0af.

Minor test changes to make the test pass.
2021-09-10 15:22:57 -05:00
Johannes Doerfert c09fbbdcfb Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals""
This reapplies commit 7dbba3376f, or, put
differently, this reverts commit d9a8d20827.

The test now requires the amdgpu and nvptx backend explicitly as it
won't work without properly.
2021-09-10 15:22:56 -05:00
Usman Nadeem eee7d225de Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation"
This reverts commit 98c37247d8.
2021-09-10 13:01:48 -07:00
Usman Nadeem 98c37247d8 [AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation
Differential Revision: https://reviews.llvm.org/D109118

Change-Id: I47adc1984a54bea02bf5a0a767b765afe7e16aa3
2021-09-10 12:52:14 -07:00
Sanjay Patel 188375f478 [InstCombine] add tests for sub of min/max intrinsics; NFC 2021-09-10 14:53:05 -04:00
Joseph Huber 9e2fc0ba37 [OpenMP] Check OpenMP assumptions on call-sites as well
This patch adds functionality to check assumption attributes on call
sites as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109376
2021-09-10 14:52:47 -04:00
Anton Afanasyev 54d8ebbbfd [AggressiveInstCombine] Add `udiv` and `urem` instrs to TruncInstCombine DAG
Add `udiv` and `urem` instructions to the DAG post-dominated by `trunc`,
allowing TruncInstCombine to reduce bitwidth of expressions containing these
instructions. It is sufficient to require that all truncated bits of both
operands are zeros: https://alive2.llvm.org/ce/z/yiithn
(`urem` case is identical).

Differential Revision: https://reviews.llvm.org/D109515
2021-09-10 20:29:08 +03:00
Anton Afanasyev ea7b2c147f [Test][AggressiveInstCombine] Add test for `udiv` and `urem`
Precommit test for D109515
2021-09-10 20:29:08 +03:00
Johannes Doerfert d2f206e0af Revert "[OpenMP] Group side-effects to improve guarding efficiency"
This reverts commit ca134c3963.

There seems to be a problem with the tests, investigating now:
  https://lab.llvm.org/buildbot/#/builders/61/builds/14574
2021-09-10 12:24:00 -05:00
Johannes Doerfert d9a8d20827 Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals"
This reverts commit 7dbba3376f.

There seems to be a problem with the tests, investigating now:
  https://lab.llvm.org/buildbot/#/builders/61/builds/14574
2021-09-10 12:23:08 -05:00
Johannes Doerfert 7dbba3376f [GlobalOpt][FIX] Do not embed initializers into AS!=0 globals
Not all address spaces support initializers for globals and we can
therefore not set them without checking if they are allowed. This
patch adds a hook into TTI to check if an AS allows non-undef
initializers. We disable it for all but address space 0 by default,
NVPTX and AMDGPU targets allow all but address space 3.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D109337
2021-09-10 12:08:50 -05:00
Johannes Doerfert ca134c3963 [OpenMP] Group side-effects to improve guarding efficiency
When we guard side-effects as part of SPMDzation we do it for
consecutive instructions that need guarding. This patch will try to
reorder guarded side-effects in a block to decrease the number of
guarded regions we need. It does not use any smarts, e.g., alias
analysis, to move side-effects over non-interfering reads. Instead,
it only moves side-effects downwards to the next guarded side-effect
if there was nothing in between that could have possibly be affected.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D109070
2021-09-10 12:08:48 -05:00
Nikita Popov 90ec6dff86 [OpaquePtr] Forbid mixing typed and opaque pointers
Currently, opaque pointers are supported in two forms: The
-force-opaque-pointers mode, where all pointers are opaque and
typed pointers do not exist. And as a simple ptr type that can
coexist with typed pointers.

This patch removes support for the mixed mode. You either get
typed pointers, or you get opaque pointers, but not both. In the
(current) default mode, using ptr is forbidden. In -opaque-pointers
mode, all pointers are opaque.

The motivation here is that the mixed mode introduces additional
issues that don't exist in fully opaque mode. D105155 is an example
of a design problem. Looking at D109259, it would probably need
additional work to support mixed mode (e.g. to generate GEPs for
typed base but opaque result). Mixed mode will also end up
inserting many casts between i8* and ptr, which would require
significant additional work to consistently avoid.

I don't think the mixed mode is particularly valuable, as it
doesn't align with our end goal. The only thing I've found it to
be moderately useful for is adding some opaque pointer tests in
between typed pointer tests, but I think we can live without that.

Differential Revision: https://reviews.llvm.org/D109290
2021-09-10 15:18:23 +02:00
Filipp Zhinkin 745f82b8d9 [InstCombine] add tests for X == 0 ? 0 : X * Y ; NFC
These are the tests for D108408 with current baseline results.
2021-09-10 09:06:48 -04:00
Max Kazantsev 6b69cc09b7 [Test][NFC] Regenerate checks in test 2021-09-10 18:46:10 +07:00
Sjoerd Meijer 6a076fa953 [LoopFlatten] Make the analysis more robust after IV widening
LoopFlatten wasn't triggering on this motivating case after IV widening:

  void foo(int *A, int N, int M) {
    for (int i = 0; i < N; ++i)
      for (int j = 0; j < M; ++j)
        f(A[i*M+j]);
  }

The reason was that the old induction phi nodes were getting in the way. These
narrow and dead induction phis are not always trivially dead, and having both
the narrow and wide IVs confused the analysis and caused it to bail. This adds
some extra bookkeeping for these old phis, so we can filter them out when
checks on phi nodes are performed. Other clean up passes will get rid of these
old phis and increment instructions.

As this was one of the motivating examples from the beginning, it was
surprising this wasn't triggering from C/C++ code. It looks like the IR and CFG
is just slightly different.

Differential Revision: https://reviews.llvm.org/D109309
2021-09-10 12:34:04 +01:00
Rosie Sumpter 9d1bea9c88 [SVE][LoopVectorize] Optimise code generated by widenPHIInstruction
For SVE, when scalarising the PHI instruction the whole vector part is
generated as opposed to creating instructions for each lane for fixed-
width vectors. However, in some cases the lane values may be needed
later (e.g for a load instruction) so we still need to calculate
these values to avoid extractelement being called on the vector part.

Differential Revision: https://reviews.llvm.org/D109445
2021-09-10 11:58:04 +01:00
Sjoerd Meijer 4f9217c519 [FuncSpec] Don't specialise call sites that have the MinSize attribute set
The MinSize attribute can be attached to both the callee and the caller
in the callsite. Function specialisation was already skipped for function
declarations (callees) with MinSize. This also skips specialisations for
the callsite when it has MinSize set.

Differential Revision: https://reviews.llvm.org/D109441
2021-09-10 09:01:45 +01:00
Max Kazantsev 09d0fa3bbe [Test] Add tests showing missed opportunity for SimplifyCFG for switches
Patch by Dmitry Bakunevich!
2021-09-10 11:16:13 +07:00
Nikita Popov af382b9383 [IR] Handle constant expressions in containsUndefinedElement()
If the constant is a constant expression, then getAggregateElement()
will return null. Guard against this before calling HasFn().
2021-09-09 22:04:12 +02:00
Sanjay Patel 3cb5aa8622 [InstCombine] add tests for insertelement with cast ops; NFC 2021-09-09 14:58:39 -04:00
Sanjay Patel 97a4e7b7ff [InstCombine] remove a buggy set of zext-icmp transforms
The motivating case is an infinite loop shown with a reduced test from:
https://llvm.org/PR51762

To solve this, I'm proposing we delete the most obviously broken part of this code.

The bug example shows a fundamental problem: we ask computeKnownBits if a transform
will be profitable, alter the code by creating new instructions, then rely on
computeKnownBits to return the same answer to actually eliminate instructions.

But there's no guarantee that the results will be the same between the 1st and 2nd
calls. In the infinite loop example, we get different answers, so we add
instructions that conflict with some other transform, and we're stuck.

There's at least one other problem visible in the test diff for
`@zext_or_masked_bit_test_uses`: the code doesn't check uses properly, so we can
end up with extra instructions created.

Last, it's not clear if this set of transforms actually improves analysis or
codegen. I spot-checked a few targets and don't see a clear win:
https://godbolt.org/z/x87EWovso

If we do see a regression from this change, codegen seems like the right place to
add a cmp -> bit-hack fold.

If this is too big of a step, we could limit the computeKnownBits calls by not
passing a context instruction and/or limiting the recursion. I checked that those
would stop the infinite loop for PR51762, but that won't guarantee that some other
example does not fall into the same loop.

Differential Revision: https://reviews.llvm.org/D109440
2021-09-09 08:49:39 -04:00
Roman Lebedev 909cba9699
[SimplifyCFG] performBranchToCommonDestFolding(): require block-closed SSA form for bonus instructions (PR51125)
I can't seem to wrap my head around the proper fix here,
we should be fine without this requirement, iff we can form this form,
but the naive attempt (https://reviews.llvm.org/D106317) has failed.
So just to unblock the release, put up a restriction.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51125
2021-09-09 12:28:09 +03:00
Jun Ma 8ba2adcf9e Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""
Differential Revision: https://reviews.llvm.org/D106056
2021-09-09 16:53:33 +08:00
Nikita Popov 6dfdc6bfd2 [SROA] Support opaque pointers
Make the following changes in order to support opaque pointers in SROA:

 * Generate i8 GEPs for opaque pointers.
 * Explicitly enforce that promotable allocas only have stores of
   the alloca type -- previously this was implicitly enforced.
 * Replace a check for pointer element type with load/store type.

Differential Revision: https://reviews.llvm.org/D109259
2021-09-08 22:25:44 +02:00
Arthur Eubanks b493124ae2 [MemorySSA] Support invariant.group metadata
The implementation is mostly copied from MemDepAnalysis. We want to look
at all loads and stores to the same pointer operand. Bitcasts and zero
GEPs of a pointer are considered the same pointer value. We choose the
most dominating instruction.

Since updating MemorySSA with invariant.group is non-trivial, for now
handling of invariant.group is not cached in any way, so it's part of
the walker. The number of loads/stores with invariant.group is small for
now anyway. We can revisit if this actually noticeably affects compile
times.

To avoid invariant.group affecting optimized uses, we need to have
optimizeUsesInBlock() not use invariant.group in any way.

Co-authored-by: Piotr Padlewski <prazek@google.com>

Reviewed By: asbirlea, nikic, Prazek

Differential Revision: https://reviews.llvm.org/D109134
2021-09-08 13:06:12 -07:00
Akira Hatanaka dea6f71af0 [ObjC][ARC] Use the addresses of the ARC runtime functions instead of
integer 0/1 for the operand of bundle "clang.arc.attachedcall"

https://reviews.llvm.org/D102996 changes the operand of bundle
"clang.arc.attachedcall". This patch makes changes to llvm that are
needed to handle the new IR.

This should make it easier to understand what the IR is doing and also
simplify some of the passes as they no longer have to translate the
integer values to the runtime functions.

Differential Revision: https://reviews.llvm.org/D103000
2021-09-08 11:58:03 -07:00
Joseph Huber 6b9a3ec3a2 [OpenMP] Do not SPMDize generic regions with no parallel
This patch changes SPMDization to not trigger for regions with no
parallelism. Otherwise, this will introduce unnecessary barriers that
will slow the single-threaded region down.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109438
2021-09-08 14:33:15 -04:00
Andrew Litteken c172f1ad39 [IROutliner] Adding supports for multiple exits
When we start outlining across branches, there is the possibility that we will have two different blocks with different output locations, or a single branch that goes to two blocks outside of the region that is being outlined. While the CodeExtractor provides most of the mechanisms by using the return value of the extracted function as the input to a switch statement to correctly branch to the correct location, we need special handling for different output schemas to each location.

This is done by repeating the existing storing scheme for each different exit block. We have a map from the return values used, to the basic block that is used to store the outputs for that particular exit block within the outlined function. Then if needed, we create a switch statement for each return block to branch to the correct set of stored outputs.

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106993
2021-09-08 08:58:07 -07:00
Sanjay Patel b041b613e6 [InstCombine] add test for zext with 'or' op; NFC 2021-09-08 09:58:06 -04:00
Sanjay Patel 5639946d89 [InstCombine] remove unnecessary instructions from test; NFC 2021-09-08 09:58:06 -04:00
Sjoerd Meijer a1e8b754eb [FuncSpec] Fix test case: only run funcspec and not any other passes. NFC. 2021-09-08 12:40:58 +01:00
Fraser Cormack 7fb66d4035 [MemCpyOpt] Fix a variety of scalable-type crashes
This patch fixes a variety of crashes resulting from the `MemCpyOptPass`
casting `TypeSize` to a constant integer, whether implicitly or
explicitly.

Since the `MemsetRanges` requires a constant size to work, all but one
of the fixes in this patch simply involve skipping the various
optimizations for scalable types as cleanly as possible.

The optimization of `byval` parameters, however, has been updated to
work on scalable types in theory. In practice, this optimization is only
valid when the length of the `memcpy` is known to be larger than the
scalable type size, which is currently never the case. This could
perhaps be done in the future using the `vscale_range` attribute.

Some implicit casts have been left as they were, under the knowledge
they are only called on aggregate types. These should never be
scalably-sized.

Reviewed By: nikic, tra

Differential Revision: https://reviews.llvm.org/D109329
2021-09-08 11:21:36 +01:00
Max Kazantsev 29d054bf12 [SimplifyCFG] Preserve knowledge about guarding condition by adding assume
This improvement adds "assume" after removal of branch basing on UB in successor block.

Consider the following example:

```
pred:
  x = ...
  cond = x > 10
  br cond, bb, other.succ

bb:
  phi [nullptr, pred], ... // other possible preds
  load(phi) // UB if we came from pred

other.succ:
  // here we know that x <= 10, but this knowledge is lost
  // after the branch is turned to unconditional unless we
  // preserve it with assume.
```

If we remove the branch basing on knowledge about UB in a successor block,
then the fact that x <= 10 is other.succ might be lost if this condition is
not inferrable from any dominating condition. To preserve this knowledge, we
can add assume intrinsic with (possibly inverted) branch condition.

Patch by Dmitry Bakunevich!

Differential Revision: https://reviews.llvm.org/D109054
Reviewed By: lebedev.ri
2021-09-08 14:05:17 +07:00
Yuanfang Chen 79c00d3f54 [NPM] Make AddDiscriminators pass required
This is to make sure the pass is not skipped at O0 where optnone is
applied to functions by default.
2021-09-07 17:02:24 -07:00
Sanjay Patel a3c1669b17 [InstCombine] fold icmp equality with 'or' mask ops
This could go either direction since the instruction
count is the same either way, but there are a few
reasons to prefer this:
1. We already do the related transform with 'and'
   (see just above the new code).
2. We try (too hard) to compensate for not having this
   and possibly other folds in transformZExtICmp(),
   and that leads to bugs like https://llvm.org/PR51762 .
3. Codegen looks better across a variety of targets.

https://alive2.llvm.org/ce/z/uEgn4P
2021-09-07 16:34:00 -04:00
Sanjay Patel 9565457aad [InstCombine] add tests for icmp with 'or' ops; NFC 2021-09-07 16:34:00 -04:00
Nikita Popov 58db5f6e95 [ConstFold] Support opaque pointers in constexpr GEPs
Support opaque pointers in SymbolicallyEvaluateGEP() by using the
value type of a GlobalValue base or falling back to i8 if there
isn't one. We don't unconditionally generate i8 GEPs here because
that would lose inrange attribues, and because some optimizations
on globals currently rely on GEP types (e.g. the globals SROA
mentioned in the comment).

Differential Revision: https://reviews.llvm.org/D109297
2021-09-07 20:50:29 +02:00
Roman Lebedev 35fa7b8ad8
Reland "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)"
This reverts commit 91f7a4fff7,
relanding commit 13ec913bdf.

The original commit was reverted because of (essentially)
https://bugs.llvm.org/show_bug.cgi?id=35922
which has now been addressed by d0eeb64be5.
2021-09-07 21:03:52 +03:00
Dávid Bolvanský 3b5f318f5d [InstCombine] ror/rol(X, RotAmt) == C --> X == rol/ror(C, RotAmt) (PR51567)
```
----------------------------------------
define i1 @src(i32 %0) {
%1:
  %2 = fshl i32 %0, i32 %0, i32 25
  %3 = icmp eq i32 %2, 5
  ret i1 %3
}
=>
define i1 @tgt(i32 %0) {
%1:
  %2 = icmp eq i32 %0, 640
  ret i1 %2
}
Transformation seems to be correct!
```

https://alive2.llvm.org/ce/z/GdY8Jm

Solves PR51567

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109283
2021-09-07 18:04:58 +02:00
Andrew Litteken 81d3ac0cf2 [IROutliner] Adding outlining for single entry/single exit multiblock regions
Using the similarity found from the IRSimilarity Identifier, we take regions with structural similarity, and deduplicate them into a separate function. The Code Extractor is able to provide most of this functionality.

For simplicity, we start by only outlining regions with a single entry and single exit branch, this reduces the complexity in handling phi nodes outside the region, and handling many sets of outputs for each of the different exit blocks.

Reviewer: paquette

Differential Revision: https://reviews.llvm.org/D106990
2021-09-07 08:51:54 -07:00
Sanjay Patel 761835521c [InstCombine] add tests for smear-a-set-bit; NFC
Possible follow-ups from patterns discussed in D109155.
2021-09-07 11:42:33 -04:00
Jingu Kang 61d8e27193 [test] precommit a test for D109354 2021-09-07 16:02:53 +01:00
Anton Afanasyev d1f9b21677 [AggressiveInstCombine] Add `AssumptionCache` to aggressive instcombine
Add support for @llvm.assume() to TruncInstCombine allowing
optimizations based on these intrinsics while computing known bits.
2021-09-07 16:45:00 +03:00
Anton Afanasyev 388b7a1502 [AggressiveInstCombine][Test] Add test for assumptions 2021-09-07 16:45:00 +03:00
Dávid Bolvanský 08144b8318 [NFC] Added test for stpcpy -> strcpy transformation with AS != 0 2021-09-07 14:30:14 +02:00
David Green ac5a5af19d [ARM] Add tests for MVE narrowing intrinsic demand bits. 2021-09-06 22:03:32 +01:00
Andrew Litteken bd4b1b5f6d [IRSim] Adding support for recognizing branch similarity
The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them.

This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether if the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar.

We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter.

Reviewers: paquette, yroux

Differential Revision: https://reviews.llvm.org/D106989
2021-09-06 11:55:38 -07:00
Dávid Bolvanský d4da4b8025 [NFC] Added tests for D109283 2021-09-06 19:40:52 +02:00
Sanjay Patel 0d83e72034 [InstCombine] fix infinite loop from shift transform
I'm not sure if there is a better way or another bug
still here, but this is enough to avoid the loop from:
https://llvm.org/PR51657

The test requires multiple blocks and datalayout to
trigger the problem path.
2021-09-06 11:13:39 -04:00
Sanjay Patel fbb78668f2 [InstCombine] fix one-use condition for shift transform
This transform is written in a confusing style,
and I suspect it is at fault for a more serious
bug noted in PR51567.

But it's been around forever, so I'm making the
minimal change to fix another bug - it could
increase instructions because it was not checking
uses.
2021-09-06 11:13:39 -04:00
Sanjay Patel a73973c9d4 [InstCombine] add test for shift-trunc-shift with extra uses; NFC
The transform doesn't check for extra uses, so we
have more instructions than we started with.
2021-09-06 11:13:38 -04:00
Sander de Smalen 96f6785bc9 [VectorUtils] Teach findScalarElement to return splat value.
If the vector is a splat of some scalar value, findScalarElement()
can simply return the scalar value if it knows the requested lane
is in the vector.

This is only needed for scalable vectors, because the InsertElement/ShuffleVector
case is already handled explicitly for the fixed-width case.

This helps to recognize an InstCombine fold like:
  extractelt(bitcast(splat(%v))) -> bitcast(%v)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D107254
2021-09-06 10:56:06 +01:00
Arthur Eubanks a43853aecd [test] Remove -loop-guard-widening legacy PM tests 2021-09-05 11:36:21 -07:00
Simon Pilgrim f114ef3731 [CostModel][X86] Add generic costs for vXi32 MUL -> v2Xi16 PMADDDW folds
Based off the improved fold in D108522

This should eventually allow us to replace the SLM only cost patterns with generic versions.
2021-09-05 16:08:11 +01:00
Dávid Bolvanský 9c476172b9 [InstCombine] stpcpy(d,s) -> strcpy(d,s) if the result is not used 2021-09-05 12:12:07 +02:00
Arthur Eubanks 37e6a27da7 [test] Fixup tests with -analyze in llvm/test/Transforms 2021-09-04 16:45:51 -07:00
Anton Afanasyev dd028c359e [SLP][Test] Add tests for PR47624 and PR49933
Add tests monitoring issues fix. They should be fixed when
https://reviews.llvm.org/D57059 ("Initial support for the vectorization
of the non-power-of-2 vectors") is landed.
2021-09-05 01:16:59 +03:00
Dávid Bolvanský 2572c76ec9 [NFC] Added testcases for new binop with select transformation 2021-09-04 20:15:50 +02:00
Dávid Bolvanský 3a696f6092 [InstCombine] rotate(X,Z) eq/ne rotate(Y,Z) ---> X eq/ne Y (PR51565)
```

----------------------------------------
define i1 @src(i8 %x, i8 %y, i8 %z) {
%0:
  %f = fshl i8 %x, i8 %x, i8 %z
  %f2 = fshl i8 %y, i8 %y, i8 %z
  %r = icmp eq i8 %f, %f2
  ret i1 %r
}
=>
define i1 @tgt(i8 %x, i8 %y, i8 %z) {
%0:
  %r = icmp eq i8 %x, %y
  ret i1 %r
}
Transformation seems to be correct!

```

https://alive2.llvm.org/ce/z/qAZp8f

Solves PR51565

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D109271
2021-09-04 18:58:44 +02:00
Bjorn Pettersson 0f0344dd1e [SimpleLoopUnswitch] Inform pass manager when child loops are deleted
As part of the nontrivial unswitching we could end up removing child
loops. This patch add a notification to the pass manager when
that happens (using the markLoopAsDeleted callback).

Without this there could be stale LoopAccessAnalysis results cached
in the analysis manager. Those analysis results are cached based on
a Loop* as key. Since the BumpPtrAllocator used to allocate
Loop objects could be resetted between different runs of for
example the loop-distribute pass (running on different functions),
a new Loop object could be created using the same Loop pointer.
And then when requiring the LoopAccessAnalysis for the loop we
got the stale (corrupt) result from the destroyed loop.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D109257
2021-09-04 17:54:39 +02:00
Dávid Bolvanský 73e1ba6215 [NFC] Added tests for PR51565 2021-09-04 14:38:43 +02:00
Dávid Bolvanský 9e06c767a4 [NFC] Added testcase for PR39116 2021-09-04 10:52:46 +02:00
Dávid Bolvanský 2aea581004 [NFC] Added testcase for PR48641 2021-09-04 10:44:39 +02:00
Sanjay Patel fd807601a7 [InstCombine] fold (rotate X) eq/ne (0/-1)
This generalizes the examples shown in:
https://llvm.org/PR51566

https://alive2.llvm.org/ce/z/V-sEy9
2021-09-03 14:51:35 -04:00
Sanjay Patel 8f4042ee40 [InstCombine] add tests for icmp of rotate (PR51566); NFC 2021-09-03 14:51:35 -04:00
Jingu Kang 562521e2d1 [LoopBoundSplit] Update phi node in exit block
It fixes https://bugs.llvm.org/show_bug.cgi?id=51700

Differential Revision:
2021-09-03 09:10:50 +01:00
Max Kazantsev 0f80961e8c [Test] Missed opt test for D108910
We can fold loop phis after we've proved that some exit has EC=0
in IndVars.

Patch by Dmitry Makogon!
2021-09-03 12:44:52 +07:00
Anna Thomas f661ce209f [LoopPredication] Fix MemorySSA crash in predicateLoopExits
The attached testcase crashes without the patch (Not the same accesses
in the same order).

When we move instructions before another instruction, we also need to
update the memory accesses corresponding to it.

Reviewed-By: asbirlea
Differential Revision: https://reviews.llvm.org/D109197
2021-09-02 21:26:07 -04:00
Wenlei He 054487c5b2 [CSSPGO] Honor preinliner decision for ThinLTO importing
When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context.

Differential Revision: https://reviews.llvm.org/D109088
2021-09-02 17:29:26 -07:00
Philip Reames fa82a3d016 [runtimeunroll] Support epilogue unrolling with a parent loop
This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop.  When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop.

Differential Revision: https://reviews.llvm.org/D108476
2021-09-02 16:29:20 -07:00
Kevin Athey 04ed6e7afc Revert "[CSSPGO] Honor preinliner decision for ThinLTO importing"
This reverts commit a2768b4732.

Breaks sanitizer-x86_64-linux-fast buildbot:
https://lab.llvm.org/buildbot/#/builders/5/builds/11334

Log snippet:
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/early-inline.ll (65549 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/early-inline.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll -instcombine -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/einline.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fbf4de4009a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/inline-cold.ll (65643 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/inline-cold.ll' FAILED ********************
Script:
--
: 'RUN: at line 4';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 5';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 8';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 11';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=9999999 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 14';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=-500 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fcd534a209a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (2):
  LLVM :: Transforms/SampleProfile/early-inline.ll
  LLVM :: Transforms/SampleProfile/inline-cold.ll
2021-09-02 14:48:31 -07:00
Dávid Bolvanský 00f8aecf6e [NFC] Added testcase for PR40750 2021-09-02 22:44:03 +02:00
Nikita Popov c86e1ce73b [SCEVExpander] Simplify pointer overflow check
This is a followup to D104662 to generate slightly nicer code for
pointer overflow checks. Bypass expandAddToGEP and instead
explicitly generate i8 GEPs. This saves some bitcasts and negates
the value in a more obvious way. In particular, this prevents SCEV
from looking through the umul.with.overflow, same as in the integer
case.

The wrapping-pointer-ni.ll test deserves a comment: Previously,
this generated a typed GEP which used the umulo argument rather
than the multiplication result. This results in more compact IR in
that case, but effectively does the multiplication twice, the
second one is just hidden in the GEP. Reusing the umulo result
seems pretty reasonable to me.

Differential Revision: https://reviews.llvm.org/D109093
2021-09-02 20:15:59 +02:00
Daniil Suchkov 5c97507e2b [InlineCost] Introduce attributes to override InlineCost for inliner testing
This patch introduces four new string attributes: function-inline-cost,
function-inline-threshold, call-inline-cost and call-threshold-bonus.
These attributes allow you to selectively override some aspects of
InlineCost analysis. That would allow us to test inliner separately from
the InlineCost analysis.

That could be useful when you're trying to write tests for inliner and
you need to test some very specific situation, like "the inline cost has
to be this high", or "the threshold has to be this low". Right now every
time someone does that, they have get creative to come up with a way to
make the InlineCost give them the number they need (like adding ~30
load/add pairs for a trivial test). This process can be somewhat tedious
which can discourage some people from writing enough tests for their
changes. Also, that results in tests that are fragile and can be easily
broken without anyone noticing it because the test writer can't
explicitly control what input the inliner will get from the inline cost
analysis.

These new attributes will alleviate those problems to an extent.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D109033
2021-09-02 17:35:06 +00:00
Wenlei He a2768b4732 [CSSPGO] Honor preinliner decision for ThinLTO importing
When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context.

Differential Revision: https://reviews.llvm.org/D109088
2021-09-02 08:24:06 -07:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Florian Hahn 3e60d216a4
[LoopDistribute] Add tests inspired by PR50296, PR50288. 2021-09-02 09:17:32 +02:00
Wenlei He c000b8bd5c [CSSPGO] Use preinliner decision by default when available
For CSSPGO, turn on `sample-profile-use-preinliner` by default. This simplifies the use of llvm-profgen preinliner as it's now simply driven by ContextShouldBeInlined flag for each context profile without needing extra compiler switch.

Note that llvm-profgen's preinliner is still off by default, under switch `csspgo-preinliner`.

Differential Revision: https://reviews.llvm.org/D109111
2021-09-01 23:45:38 -07:00
Philip Reames c3b3aa277a Fix a missing MemorySSA update in breakLoopBackedge
This is a case I'd missed in 6a8237. The odd bit here is that missing the edge removal update seems to produce MemorySSA which verifies, but is still corrupt in a way which bothers following passes. I wasn't able to reduce a single pass test case, which is why the reported test case is taken as is.

Differential Revision: https://reviews.llvm.org/D109068
2021-09-01 16:59:01 -07:00
Philip Reames e735f2bf37 [SCEVExpander] Prefer pointer expansion for overflow checks
We'd special cased this logic to use pointer types for non-integral pointers, but there's no reason we can't do that for all pointer types.   Doing it this was has a few advantages:
a) The code itself becomes more straight forward, and easier to test.
b) We avoid introducing ptrtoint into programs which didn't have them in the source.
c) The resulting codegen is easier to analyze and simplify (mostly due to lack of ptrtoint).

Note that there are some test diffs, but a) running them through instcombine helps a ton, and b) there's enough missing obvious transforms on both before and after IR that it's clear this isn't performance sensitive.

This is mostly motivated by cleaning up mentions of non-integrals to have a clearer idea of what we actually need to support.

Differential Revision: https://reviews.llvm.org/D104662
2021-09-01 13:11:25 -07:00
Hongtao Yu f4711e0d00 [CSSPGO] Sort function offset table to speed up profile loading.
With the context split work, the context-based (an array of strings) sorting performed at profile load time is way more expansive than single-string-based sorting. This is likely due to auxiliary operations done on each array element, such as indirect references, std::min operations, also likely cache misses. In this change I'm presorting profiles during profile generation time to avoid sorting at compile time.

Compared to the previous context-split work, this effectively cuts down compile time by 20% for one of our large services and brings us closer to non-CS build, with still a small gap in build time.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D109036
2021-09-01 12:17:48 -07:00
Nikita Popov 02f74eadbe [IVDescriptors] Make pointer inductions compatible with opaque pointers
Store the used element type in the InductionDescriptor. For typed
pointers, it remains the pointer element type. For opaque pointers,
we always use an i8 element type, such that the step is a simple
offset.

A previous version of this patch instead tried to guess the element
type from an induction GEP, but this is not reliable, as the GEP
may be hidden (see @both in iv_outside_user.ll).

Differential Revision: https://reviews.llvm.org/D104795
2021-09-01 21:02:05 +02:00
Sanjay Patel 8c7a7e1f67 [InstCombine] allow more min/max with 'not' folds for intrinsics
isFreeToInvert allows min/max with 'not' on both operands,
so easing the argument restriction catches the case where
that operand has one use.

We already handle the sub-patterns when there are less uses:
https://alive2.llvm.org/ce/z/8Jatm_

...but this is another step towards parity with the
equivalent icmp+select idioms ( D98152 ).

Differential Revision: https://reviews.llvm.org/D109059
2021-09-01 14:40:00 -04:00
Sanjay Patel 8a10f4a0f6 [InstCombine] use isFreeToInvert to generalize min/max with 'not'
This mimics the code for the corresponding cmp-select idiom.

This also prevents an infinite loop because isFreeToInvert
does not match constant expressions.

So this patch solves the same problem as D108814 and obsoletes
it, but my main motivation is to enhance the pattern matching
to allow more invertible ops. That change will be a follow-up
patch on top of this one.

Differential Revision: https://reviews.llvm.org/D109058
2021-09-01 14:34:22 -04:00
Adrian Prantl 12de296d84 Tighten heuristic for coroutine debug info workaround.
The OutermostLoad condition is supposed to strip the outermost
DW_OP_deref operation because dbg.declares are implicitly
indirect. This patch makes sure the heuristic is only applied to
dbg.declare intrinsics and only if the outermost instruction is a
load.

This was found while qualifying the latest Swift compiler rebranch.

rdar://82037764
2021-09-01 11:15:36 -07:00
Hongtao Yu 7ca8030030 [CSSPGO] Enable loading MD5 CS profile.
Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster.

There are a few conversion from real names to md5 names that have been made on the sample loader and context tracker side to get it work.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D108342
2021-09-01 09:19:47 -07:00
Nikita Popov 9d720dcb89 [LoadStoreVectorizer] Make aliasing check more precise
The load store vectorizer currently uses isNoAlias() to determine
whether memory-accessing instructions should prevent vectorization.
However, this only works for loads and stores. Additionally, a
couple of intrinsics like assume are special-cased to be ignored.

Instead use getModRefInfo() to generically determine whether the
instruction accesses/modifies the relevant location. This will
automatically handle all inaccessiblememonly intrinsics correctly
(as well as other calls that don't modref for other reasons).
This requires generalizing the code a bit, as it was previously
only considering loads and stored in particular.

Differential Revision: https://reviews.llvm.org/D109020
2021-09-01 18:10:09 +02:00
Sanjay Patel c2162e4d89 [InstCombine] add tests for min/max intrinsics with not ops; NFC 2021-08-31 20:08:38 -04:00
Joseph Huber 29a74a3915 [OpenMP] Add an option to always inline OpenMP device functions.
Performance on GPU targets can be highly variable, sometimes inlining
everything hurts performance and sometimes it greatly improves it. Add
an option to toggle this behaviour to better investigate it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109014
2021-08-31 18:48:30 -04:00
Nikita Popov 48ebe427c9 [SLPVectorizer] Make aliasing check more precise
SLPVectorizer currently uses AA::isNoAlias() to determine whether
two locations alias. This does not work if one of the instructions
is a call. Instead, we should check getModRefInfo(), which
determines whether an arbitrary instruction modifies or references
a given location.

Among other things, this prevents @llvm.experimental.noalias.scope.decl()
and other inaccessiblmemonly intrinsics from interfering with SLP
vectorization.

Differential Revision: https://reviews.llvm.org/D109012
2021-08-31 22:35:30 +02:00
Nikita Popov dc37f5374c [LoadStoreVectorizer] Add test for inaccessiblememonly call (NFC) 2021-08-31 22:12:45 +02:00
Nikita Popov bf8b69bb3a [SLPVectorizer] Add test for inaccessiblememonly call (NFC) 2021-08-31 20:23:26 +02:00
Philip Reames b604fcb7bc [runtime] Move prolog/epilog block to a post-simplify strategy
The runtime unroller will try to produce a non-loop if the unroll count is 2 and thus the prolog/epilog loop would only run at most one iteration. The old implementation did this by avoiding loop construction entirely. This patches instead constructs the trivial loop and then explicitly breaks the backedge and simplifies. This does result in some additional code churn when triggered, but a) results in better quality code and b) removes a codepath which didn't work properly for multiple exit epilogs.

One oddity that I want to draw to reviewer attention is that this somehow changes revisit order. The new order looks equivalent to me, but I don't understand how creating and erasing an extra loop here creates this effect.

Differential Revision: https://reviews.llvm.org/D108521
2021-08-31 09:29:36 -07:00
Philip Reames 9b45fd909f [AlignFromAssume] Bailout w/non-constant alignments (pr51680)
This is a bailout for pr51680.  This pass appears to assume that the alignment operand to an align tag on an assume bundle is constant.  This doesn't appear to be required anywhere, and clang happily generates non-constant alignments for cases such as this case taken from the bug report:

// clang -cc1 -triple powerpc64-- -S -O1 opal_pci-min.c
extern int a[];
long *b;
long c;
void *d(long, int *, int, long, long, long) __attribute__((__alloc_align__(6)));
void e() {
  b = d(c, a, 0, 0, 5, c);
  b[0] = 0;
}

This was exposed by a SCEV change which allowed a non-constant alignment to reach further into the pass' code.  We could generalize the pass, but for now, let's fix the crash.
2021-08-31 09:20:52 -07:00
Kuba Mracek 4c066bd08b [GlobalDCE] Handle relative pointers in VFE (for Swift vtables)
To support Virtual Function Elimination to Swift, this PR adds support for Swift
vtables which contain "relative pointers" instead of direct pointer references.
These are in the form of:

@symbol = ... {
  i32 trunc (i64 sub (i64 ptrtoint (<type> @target to i64), i64 ptrtoint (... @symbol to i64)) to i32)
}

The PR extends GlobalDCE's way of looking up a vtable offset into a dependency
to be able to see through this expression and find the target symbol.

Differential Revision: https://reviews.llvm.org/D107645
2021-08-31 07:07:22 -07:00
Sanjay Patel 5d7d689edf [InstCombine] fix propagation of FMF through select-of-fnegs
The existing code was unquestionably wrong - it looked at one
fneg and ignored the other 2 instructions.

It was also untested, so it didn't make the list of bugs
flagged by Alive2.

This is an unusual propagation, but Alive2 agress that we
can intersect the fnegs and union that with the select,
then apply the results to both new instructions:
https://alive2.llvm.org/ce/z/SF8_dt
2021-08-31 09:52:17 -04:00
Anton Afanasyev aaae726afb [SLPVectorizer][Test] Add test for extractelements with (non)const indices (NFC)
Add test for an issue discussed here: https://reviews.llvm.org/D108703#2974289
2021-08-31 16:14:26 +03:00
Sanjay Patel 027de5c7d4 [InstCombine] add tests for FMF propagation for select-of-fneg; NFC 2021-08-31 09:12:03 -04:00
Anton Afanasyev 077d4cb3ab Revert "[SLP]No need to schedule/check parent for extract{element/value} instruction."
Revert since introduced issure reported here:
https://lists.llvm.org/pipermail/llvm-dev/2021-August/152411.html
Discussed starting from here: https://reviews.llvm.org/D108703#2974289

This reverts commit a36bc873a2.
2021-08-31 15:29:06 +03:00
Hongtao Yu b9db70369b [CSSPGO] Split context string to deduplicate function name used in the context.
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context.

A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is  time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing.

When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`.

Testing
This is no-diff change in terms of code quality and profile content (for text profile).

For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated.

The compile time of ads is dropped by 25%.

Differential Revision: https://reviews.llvm.org/D107299
2021-08-30 20:09:29 -07:00
Andrew Litteken c58d4c4bd3 [IROutliner] Changing outliner to prioritize reductions on assembly rather than IR instruction
Currently, the IROutliner uses a simple metric to outline the largest amount
of IR possible to outline first if it fits the cost model. This is model
loses out on smaller blocks of code that have higher reductions in cost that
are contained within larger blocks of IR.

This reverses the order, where we calculate all of the costs first, and then
reorder and extract items based on the calculated results.

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106440
2021-08-30 13:43:08 -07:00
David Green efa340fbd2 [ARM] Workaround tailpredication min/max costmodel
The min/max intrinsics are not yet canonical, but when they are the tail
predications analysis will change from treating them like icmp to
treating them like intrinsics. Unfortunately, they can currently produce
better code by not being tail predicated thanks to the vectorizer picking
higher VF's and the backend folding to better instructions (especially
for saturate patterns). In the long run we will need to improve the
vectorizers cost modelling, recognizing the instruction directly, but in
the meantime this treats min/max as before to prevent performance
regressions.
2021-08-30 19:19:51 +01:00
Mikhail Goncharov 5097b6e352 Revert "[SLP]Improve graph reordering."
This reverts commit 84cbd71c95.

This commit breaks one of the internal tests. As agreed with Alexey I
will provide the reproducer later.
2021-08-30 19:16:44 +02:00
Andrew Litteken f564299fe9 [IROutliner] Ensure instructions at end of candidate are excluded
Occasionally instructions are between the last instruction in a region,
and the following instruction as identified by the Candidate.  This
adds an extra check right before splitting a candidate that excludes the region from being split/checked for outlining to remove errors.

Tests Added:
Tranforms/IROuutliner/outlining-extra-bitcasts.ll

Reviewer: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D104142
2021-08-30 09:30:26 -07:00
Danila Malyutin 668b045b8d [LSR][NFC] Add test case for pr42770 2021-08-30 18:46:22 +03:00
Roman Lebedev 7b0d59da9a
[IndVars] Drop check for the validity of rewrite
`isValidRewrite()` checks that the both the original SCEV,
and the rewrite SCEV have the same base pointer.
I //believe//, after all the recent SCEV improvements,
this invariant is already enforced by SCEV itself.

I originally tried changing it into an assert in D108043,
but that showed that it triggers on e.g. https://reviews.llvm.org/D108043#2946621,
where SCEV manages to forward the store to load,
test added.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D108655
2021-08-30 12:06:58 +03:00
Roman Lebedev ada219b13a
[NFC][IndVars] Add test that caused D108043 to be reverted
We currently don't simplify anything here, but we can.
2021-08-30 12:06:58 +03:00
Arthur Eubanks 099e4bcd5d [InstCombine] Remove invariant group intrinsincs when comparing against null
We cannot leak any equivalency information by comparing against null
since null never has virtual metadata associated with it (when null is
not a valid dereferenceable pointer).

Instcombine seems to make sure that a null will be on the RHS, so we
don't have to check both operands.

This fixes a missed optimization in llvm-test-suite's MultiSource lambda
benchmark under -fstrict-vtable-pointers.

Reviewed By: Prazek

Differential Revision: https://reviews.llvm.org/D108734
2021-08-29 15:45:25 -07:00
Andrew Litteken 063af63b96 [IRSim][IROutliner] Canonicalizing commutative value numbering between similarity sections.
When the initial relationship between two pairs of values between
similar sections is ambiguous to commutativity, arguments to the
outlined functions can be passed in such that the order is incorrect,
causing miscompilations.  This adds a canonical mapping to each
similarity section, so that we can maintain the relationship of global
value numbering from one section to another.

Added Tests:
Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll
unittests/Analysis/IRSimilarityIdentifierTest.cpp - IRSimilarityCandidate:CanonicalNumbering

Reviewers: jroelofs, jpaquette, yroux

Differential Revision: https://reviews.llvm.org/D104143
2021-08-27 15:02:56 -07:00
Nikita Popov 757409da7a [MergeICmps] Ignore clobbering instructions before the loads
This is another followup to D106591. Even if there is an
instruction that clobbers one of the loads, this doesn't matter if
it happens before the loads. Those instructions aren't affected by
the transform at all.

The gep-references-bb.ll is modified to preserve the spirit of the
test, as the store to @g no longer impacts the transform.

Differential Revision: https://reviews.llvm.org/D108782
2021-08-27 23:31:35 +02:00