Commit Graph

15680 Commits

Author SHA1 Message Date
Johannes Doerfert fa5d22a045 [OpenMP][NFC] Reuse OMPIRBuilder `struct ident_t` handling in Clang
Replace the `ident_t` handling in Clang with the methods offered by the
OMPIRBuilder. This cuts down on the clang code as well as the
differences between the two, making further transitions easier. Tests
have changed but there should not be a real functional change. The most
interesting difference is probably that we stop generating local ident_t
allocations for now and just use globals. Given that this happens only
with debug info, the location part of the `ident_t` is probably bigger
than the test anyway. As the location part is already a global, we can
avoid the allocation, memcpy, and store in favor of a constant global
that is slightly bigger. This can be revisited if there are
complications.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D80735
2020-08-10 17:13:26 -05:00
Arthur Eubanks aae349e276 [InstSimplify][test] Remove unused parameter in vscale.ll
Reviewed By: huihuiz

Differential Revision: https://reviews.llvm.org/D85688
2020-08-10 14:48:32 -07:00
Nikita Popov 566a66703f [InstSimplify] Add test for expand binop undef issue (NFC)
Add test case from https://reviews.llvm.org/D83360#2146539.
2020-08-10 22:39:59 +02:00
Wei Mi 4cd8e9b169 [SampleFDO] Stop letting findCalleeFunctionSamples return unrelated profiles
for invoke instructions.

We see a warning of "No debug information found in function foo: Function
profile not used" in a case. The function foo is called by an invoke
instruction. It has no debug information because it has attribute((nodebug))
in the definition. It shouldn't have profile instance in the sample profile
but compiler thinks it does, that turns out to be a compiler bug in
findCalleeFunctionSamples. The bug is exposed when sample-profile-merge-inlinee
is enabled recently.

Currently in findCalleeFunctionSamples, CalleeName is unset and is empty for
invoke instruction. For empty CalleeName, findFunctionSamplesAt will treat
the call as an indirect call and will return any inline instance profile at
the same location as the instruction. That leads to a wrong profile being
returned to function foo.

The patch set CalleeName when the instruction is an invoke.

Differential Revision: https://reviews.llvm.org/D85664
2020-08-10 12:41:09 -07:00
Fangrui Song 3b21a07fd7 [PGO] Delete dead comdat renaming code related to GlobalAlias. NFC
A GlobalAlias is an address-taken user of its aliased function.
canRenameComdatFunc has excluded such cases.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D85597
2020-08-10 09:02:04 -07:00
Simon Pilgrim 90f721404f [SLP] Regenerate load-merge.ll tests
Noticed this NFC change in D57779
2020-08-10 16:09:26 +01:00
Sanjay Patel 3d5118b75c [InstCombine] auto-generate test checks; NFC 2020-08-10 08:27:38 -04:00
Florian Hahn 8393b9fd1f [LoopInterchange] Move instructions from preheader to outer loop header.
Instructions defined in the original inner loop preheader may depend on
values defined in the outer loop header, but the inner loop header will
become the entry block in the loop nest. Move the instructions from the
preheader to the outer loop header, so we do not break dominance. We
also have to check for unsafe instructions in the preheader. If there
are no unsafe instructions, all instructions should be movable.

Currently we move all instructions except the terminator and rely on
LICM to hoist out invariant instructions later.

Fixes PR45743
2020-08-10 12:41:33 +01:00
Florian Hahn 54cb552b96 [LoopInterchange] Form LCSSA phis for values in orig outer loop header.
Values defined in the outer loop header could be used in the inner loop
latch. In that case, we need to create LCSSA phis for them, because after
interchanging they will be defined in the new inner loop and used in the
new outer loop.
2020-08-10 11:33:19 +01:00
Simon Pilgrim 0b26c9eddc [ScalarizeMaskedMemIntrin][X86] Refresh missed transform test cases from rGc0c3b9a25fee
Differential Revision: https://reviews.llvm.org/D85416
2020-08-10 11:14:01 +01:00
Juneyoung Lee ef018cb65c [BuildLibCalls] Add noundef to standard I/O functions
This patch adds noundef to return value and arguments of standard I/O functions.
With this patch, passing undef or poison to the functions becomes undefined
behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++,
passing undef to them was already UB in source.

With this patch, the functions cannot return undef or poison anymore as well.
According to C17 standard, ungetc/ungetwc/fgetpos/ftell can generate unspecified
value; 3.19.3 says unspecified value is a valid value of the relevant type,
and using unspecified value is unspecified behavior, which is not UB, so it
cannot be undef (using undef is UB when e.g. it is used at branch condition).

— The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10).
— The details of the value stored by the fgetpos function (7.21.9.1).
— The details of the value returned by the ftell function for a text stream (7.21.9.4).

In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85345
2020-08-10 10:58:25 +09:00
Florian Hahn d236e1c7b6 [InstSimplify/NewGVN] Add option to control the use of undef.
Making use of undef is not safe if the simplification result is not used
to replace all uses of the result. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.

This patch adds an option to SimplifyQuery to disable the use of undef.

Note that I've only guarded uses if isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84792
2020-08-09 19:16:56 +01:00
Florian Hahn 23817cbd0b [SCEVExpander] Make sure cast properly dominates Builder's IP.
The selected cast must properly dominate the Builder's IP, so we cannot
re-use the cast, if it matches the builder's IP.
2020-08-09 16:51:19 +01:00
Aditya Kumar 53ac144848 [HotColdSplit] Add options for splitting cold functions in separate section
Add support for (if enabled) splitting cold functions into a separate section
in order to further boost locality of hot code.

Authored by: rjf (Ruijie Fang)
Reviewed by: hiraditya,rcorcs,vsk

Differential Revision: https://reviews.llvm.org/D85331
2020-08-09 08:48:12 -07:00
Dávid Bolvanský dee938e5cc [Tests] Precommit tests for D85593 2020-08-09 16:07:36 +02:00
Sanjay Patel 43bdac2906 [VectorCombine] try to create vector loads from scalar loads
This patch was adjusted to match the most basic pattern that starts with an insertelement
(so there's no extract created here). Hopefully, that removes any concern about
interfering with other passes. Ie, the transform should almost always be profitable.

We could make an argument that this could be part of canonicalization, but we
conservatively try not to create vector ops from scalar ops in passes like instcombine.

If the transform is not profitable, the backend should be able to re-scalarize the load.

Differential Revision: https://reviews.llvm.org/D81766
2020-08-09 09:05:06 -04:00
Florian Hahn c70f0b9d4a [SCEVExpander] Avoid re-using existing casts if it means updating users.
Currently the SCEVExpander tries to re-use existing casts, even if they
are not exactly at the insertion point it was asked to create the cast.
To do so in some case, it creates a new cast at the insertion point and
updates all users to use the new cast.

This behavior is problematic, because it changes the IR outside of the
instructions created during the expansion. Therefore we cannot
completely undo all changes made during expansion.

This re-use should be only an extra optimization, so only using the new
cast in the expanded instructions should not be a correctness issue.
There are many cases equivalent instructions are created during
expansion.

This patch also adjusts findInsertPointAfter to skip instructions
inserted during expansion. This enables re-using existing casts without
the renaming any uses, by picking a better insertion point.

Reviewed By: efriedma, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84399
2020-08-09 13:25:17 +01:00
Roman Lebedev e492f0e03b
[SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics
SimplifyCFG has two main folds for resumes - one when resume is directly
using the landingpad, and the other one where resume is using a PHI node.

While for the first case, we were already correctly ignoring all the
PHI nodes, and both the debug info intrinsics and lifetime intrinsics,
in the PHI-based-one, we weren't ignoring PHI's in the resume block,
and weren't ignoring lifetime intrinsics. That is clearly a bug.

On RawSpeed library, this results in +9.34% (+81) more invoke->call folds,
-0.19% (-39) landing pads, -0.24% (-81) invoke instructions
but +51 call instructions and -132 basic blocks.

Though, the run-time performance impact appears to be within the noise.
2020-08-08 20:00:28 +03:00
Roman Lebedev c2ebb32465
[NFC][SimplifyCFG] Add a test showing invoke->call simplification failure 2020-08-08 20:00:28 +03:00
Juneyoung Lee b6d9add71b [InstCombine] Optimize select(freeze(icmp eq/ne x, y), x, y)
This patch adds an optimization that folds select(freeze(icmp eq/ne x, y), x, y)
to x or y.
This was needed to resolve slowdown after D84940 is applied.

I tried to bake this logic into foldSelectInstWithICmp, but it wasn't clear.
This patch conservatively writes the pattern in a separate function,
foldSelectWithFrozenICmp.

The output does not need freeze; https://alive2.llvm.org/ce/z/X49hNE (from @nikic)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85533
2020-08-08 15:22:29 +09:00
Juneyoung Lee 595d3b5ecc [InstCombine] Add tests for select(freeze(icmp x, y), x, y); NFC 2020-08-08 15:09:08 +09:00
Simon Pilgrim f35992b75b [SLP][X86] Add smax intrinsic reduction tests
SLP currently only matches the ICMP+SELECT patterns for min/max reductions
2020-08-07 11:48:08 +01:00
Simon Pilgrim aa38e97ad5 [SLP][X86] Add abs/smax/smin/umax/umin intrinsic vectorization tests 2020-08-07 11:23:43 +01:00
Shinji Okumura c575ba28de [Attributor] AAPotentialValues Interface
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83283
2020-08-07 17:35:12 +09:00
Max Kazantsev 9b49a4d301 [Test] Add one more test on IndVars that was failing on one of older builds 2020-08-07 14:23:55 +07:00
Shinji Okumura f13f2e16f0 [Attributor] Check violation of returned position nonnull and noundef attribute in AAUndefinedBehavior
This patch is a follow up of D84733.
If a function has noundef attribute in returned position, instructions that return undef or poison value cause UB.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85178
2020-08-07 12:02:42 +09:00
Arthur Eubanks 039fb7f68a [NewPM][GuardWidening] Fix loop guard widening tests under NPM
Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D85394
2020-08-06 15:32:59 -07:00
Roman Lebedev be02adfad7
[InstCombine] Fold (x + C1) * (-1<<C2) --> (-C1 - x) * (1<<C2)
Negator knows how to do this, but the one-use reasoning is getting
a bit muddy here, we don't really want to increase instruction count,
so we need to both lie that "IsNegation" and have an one-use check
on the outermost LHS value.
2020-08-06 23:40:16 +03:00
Roman Lebedev 0c1c756a31
[InstCombine] Generalize %x * (-1<<C) --> (-%x) * (1<<C) fold
Multiplication is commutative, and either of operands can be negative,
so if the RHS is a negated power-of-two, we should try to make it
true power-of-two (which will allow us to turn it into a left-shift),
by trying to sink the negation down into LHS op.

But, we shouldn't re-invent the logic for sinking negation,
let's just use Negator for that.

Tests and original patch by: Simon Pilgrim @RKSimon!

Differential Revision: https://reviews.llvm.org/D85446
2020-08-06 23:39:53 +03:00
Roman Lebedev a404acb86a
[NFC][InstCombine] Add some more tests for negation sinking into mul 2020-08-06 23:37:17 +03:00
Roman Lebedev 7ce76b06ec
[InstCombine] Fold sdiv exact X, -1<<C --> -(ashr exact X, C)
While that does increases instruction count,
shift is obviously better than a division.

Name: base
Pre: (1<<C1) >= 0
%o0 = shl i8 1, C1
%r = sdiv exact i8 C0, %o0
  =>
%r = ashr exact i8 C0, C1

Name: neg
%o0 = shl i8 -1, C1
%r = sdiv exact i8 C0, %o0
  =>
%t0 = ashr exact i8 C0, C1
%r = sub i8 0, %t0

Name: reverse
Pre: C1 != 0 && C1 u< 8
%t0 = ashr exact i8 C0, C1
%r = sub i8 0, %t0
  =>
%o0 = shl i8 -1, C1
%r = sdiv exact i8 C0, %o0

https://rise4fun.com/Alive/MRplf
2020-08-06 23:37:16 +03:00
Roman Lebedev 442cb88f53
[InstCombine] Generalize sdiv exact X, 1<<C --> ashr exact X, C fold to handle non-splat vectors 2020-08-06 23:37:15 +03:00
Roman Lebedev 8633a0d985
[NFC][InstCombine] Better tests for x s/EXACT (1 << y) pattern 2020-08-06 23:37:15 +03:00
Roman Lebedev 1c21635c94
[NFC][InstCombine] Tests for x s/EXACT (-1 << y) pattern 2020-08-06 23:37:15 +03:00
Sanjay Patel 250a167c41 [InstSimplify] avoid crashing by trying to rem-by-zero
Bug was noted in the post-commit comments for:
rGe8760bb9a8a3
2020-08-06 16:06:31 -04:00
Sanjay Patel c9bcc237a2 [VectorCombine] add tests for load+insert; NFC 2020-08-06 15:45:02 -04:00
Anton Afanasyev a7478fab6c [SLP] Fix order of `insertelement`/`insertvalue` seed operands
Summary:
This patch takes the indices operands of `insertelement`/`insertvalue`
into account while generation of seed elements for `findBuildAggregate()`.
This function has kept the original order of `insert`s before.
Also this patch optimizes `findBuildAggregate()` preventing it from
redundant temporary vector allocations and its multiple reversing.

Fixes llvm.org/pr44067

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83779
2020-08-06 22:09:24 +03:00
Arthur Eubanks d0acd97c68 [NewPM][LoopUnswitch] Pin loop-unswitch to legacy PM or use simple-loop-unswitch
As mentioned in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143395.html,
loop-unswitch has not been ported to the NPM. Instead people are using
simple-loop-unswitch.

Pin all tests in Transforms/LoopUnswitch to legacy PM and replace all
other uses of loop-unswitch with simple-loop-unswitch.

One test that didn't fit into the above was
2014-06-21-congruent-constant.ll which seems to only pass with
loop-unswitch. That is also pinned to legacy PM.

Now all tests containing "-loop-unswitch" anywhere in the test succeed with
NPM turned on by default.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D85360
2020-08-06 10:56:00 -07:00
Arthur Eubanks 5bb6b8250a [NewPM] Pin -assumption-cache-tracker tests to legacy PM
All tests have corresponding NPM RUN lines.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D85395
2020-08-06 10:38:03 -07:00
Simon Pilgrim 3b93464dcf [SLP][X86] Regenerate sdiv test noticed in D83779. NFC. 2020-08-06 18:00:21 +01:00
Simon Pilgrim 8f5b2cb828 [InstCombine] Add tests for mul(add(x,c),negpow2) -> mul(sub(-c,x),pow2) fold
Also fix some undef vector elements in the similar vector tests that I missed.
2020-08-06 17:13:28 +01:00
Mircea Trofin 87fb7aa137 [llvm][MLInliner] Don't log 'mandatory' events
We don't want mandatory events in the training log. We do want to handle
them, to keep the native size accounting accurate, but that's all.

Fixed the code, also expanded the test to capture this.

Differential Revision: https://reviews.llvm.org/D85373
2020-08-06 09:04:15 -07:00
Simon Pilgrim d1a91d947f [InstCombine] Add tests for mul(sub(x,y),negpow2) -> mul(sub(y,x),pow2) fold
Add full vector coverage (that currently are not folded).
2020-08-06 16:31:57 +01:00
Sanjay Patel 60f2c6a94c [PatternMatch] allow intrinsic form of min/max with existing matchers
I skimmed the existing users of these matchers and don't see any problems
(eg, the caller assumes the matched value was a select instruction without checking).

So I think we can generalize the matching to allow the new intrinsics or the cmp+select idioms.

I did not find any unit tests for the matchers, so added some basics there. The instsimplify
tests are adapted from existing tests for the cmp+select pattern and cover the folds in
simplifyICmpWithMinMax().

Differential Revision: https://reviews.llvm.org/D85230
2020-08-06 10:50:24 -04:00
Juneyoung Lee c771087161 [InstCombine] Fold freeze(undef) into a proper constant
This is a simple patch that folds freeze(undef) into a proper constant after inspecting its uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84948
2020-08-06 18:40:04 +09:00
Juneyoung Lee 54a1097b83 [InstCombine] Add tests for D84948; NFC 2020-08-06 18:23:45 +09:00
David Green 745bf6cf44 [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
2020-08-06 10:10:50 +01:00
Roman Lebedev 141357663e
[InstCombine] (-NSW x) u<= x --> x s<=0 (PR39480)
Name: (-x) u<= x  -->  x s<= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ule i8 %neg_x, %x
  =>
%r = icmp sle i8 %x, 0

https://rise4fun.com/Alive/V22

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:36 +03:00
Roman Lebedev 132be1f502
[InstCombine] (-NSW x) u< x --> x s< 0 (PR39480)
Name: (-x) u< x  -->  x s< 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ult i8 %neg_x, %x
  =>
%r = icmp slt i8 %x, 0

https://rise4fun.com/Alive/zSuf

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:36 +03:00
Roman Lebedev 0e1241a3c9
[InstCombine] (-NSW x) u>= x --> x s>= 0 (PR39480)
Name: (-x) u>= x  -->  x s>= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp uge i8 %neg_x, %x
  =>
%r = icmp sge i8 %x, 0

https://rise4fun.com/Alive/LLHd

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00