Commit Graph

15430 Commits

Author SHA1 Message Date
Nikita Popov a0b5496026 [PredicateInfo] Add test for multiple branches on same condition (NFC)
This illustrates a case where RenamedOp does not correspond to the
value used in the condition, which it ideally should.
2020-07-10 20:59:03 +02:00
Craig Topper 1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
This matches the recent change to InstSimplify from D83440.

Differential Revision: https://reviews.llvm.org/D83535
2020-07-10 10:42:25 -07:00
Roman Lebedev 1d542f0ca8
Revert "[OpenMPOpt] ICV Tracking"
There appears to be some kind of memory corruption/use-after-free/etc
going on here. In particular, in `OpenMPOpt::deleteParallelRegions()`,
in `DeleteCallCB()`, `CI` is garbage.

WIll post reproducer in the original review.

This reverts commit 6c4a5e9257.
2020-07-10 19:00:15 +03:00
Johannes Doerfert 43d8d59d6d [Attributor][NFC] Update tests after recent changes
Attributor tests are mostly updated using the auto upgrade scripts but
sometimes we forget. If we do it manually or continue using old check
lines that still match we see unrelated changes down the line. This is
just a cleanup.
2020-07-10 10:39:32 -05:00
Roman Lebedev 2655a70a04
[InstCombine] After merging store into successor, queue prev. store to be visited (PR46661)
We can happen to have a situation with many stores eligible for transform,
but due to our visitation order (top to bottom), when we have processed
the first eligible instruction, we would not try to reprocess the previous
instructions that are now also eligible.

So after we've successfully merged a store that was second-to-last instruction
into successor, if the now-second-to-last instruction is also a such store
that is eligible, add it to worklist to be revisited.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46661
2020-07-10 17:49:16 +03:00
Roman Lebedev ef0ecb7b03
[NFCI][InstCombine] PR46661: multiple stores eligible for merging into successor - worklist issue
The testcase should pass with a single instcombine iteration.
2020-07-10 17:49:16 +03:00
Florian Hahn 264ab1e2c8 [LV] Pick vector loop body as insert point for SCEV expansion.
Currently the DomTree is not kept up to date for additional blocks
generated in the vector loop, for example when vectorizing with
predication. SCEVExpander relies on dominance checks when looking for
existing instructions to re-use and in some cases that can lead to the
expander picking instructions that do not actually dominate their insert
point (e.g. as in PR46525).

Unfortunately keeping the DT up-to-date is a bit tricky, because the CFG
is only patched up after generating code for a block. For now, we can
just use the vector loop header, as this ensures the inserted
instructions dominate all uses in the vector loop. There should be no
noticeable impact on the generated code, as other passes should sink
those instructions, if profitable.

Fixes PR46525.

Reviewers: Ayal, gilr, mkazantsev, dmgreen

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D83288
2020-07-10 10:37:12 +01:00
Diogo Sampaio 7bf168390f [BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses
Summary: This allows to convert any SExt to a ZExt when we know none of the extended bits are used, specially in cases where there are multiple uses of the value.

Reviewers: dmgreen, eli.friedman, spatel, lebedev.ri, nikic

Reviewed By: lebedev.ri, nikic

Subscribers: hiraditya, dmgreen, craig.topper, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60413
2020-07-10 08:34:53 +01:00
Chen Zheng f1efb8bb4b [SCEV][IndVarSimplify] insert point should not be block front.
The block front may be a PHI node, inserting a cast instructions like
BitCast, PtrToInt, IntToPtr among PHIs is not right.

Reviewed By: lebedev.ri

Differential Revision:  https://reviews.llvm.org/D80975
2020-07-09 21:56:57 -04:00
Nikita Popov c0308fd154 [PredicateInfo] Print RenamedOp (NFC)
Make it easier to debug renaming issues.
2020-07-09 23:14:24 +02:00
Craig Topper 469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
Follow up from the transform being removed in D83360. If X is probably not poison, then the transform is safe.

Still plan to remove or adjust the code from ConstantFolding after this.

Differential Revision: https://reviews.llvm.org/D83440
2020-07-09 12:21:03 -07:00
Craig Topper 122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
We can't fold to the non-undef value unless we know it isn't poison. So check each element with isGuaranteedNotToBeUndefOrPoison. This currently rules out all constant expressions.

Differential Revision: https://reviews.llvm.org/D83442
2020-07-09 11:01:12 -07:00
Florian Hahn 9477d39e61 [SCCP] Move tests using only ipsccp from IPConstantProp to SCCP (NFC).
Some of the tests in the llvm/test/Transforms/IPConstantProp directory
actually only use -ipsccp. Those tests belong to the other (IP)SCCP
tests in llvm/test/Transforms/SCCP/ and this commits moves them there to
avoid confusion with IPConstantProp.
2020-07-09 17:16:15 +01:00
Diogo Sampaio a0e981c190 [NFC] Add SExt multiuses test 2020-07-09 15:31:16 +01:00
dfukalov 167767a775 SpeculativeExecution: Fix for logic change introduced in D81730.
Summary:
The test case started to hoist bitcasts to upper BB after D81730.
Reverted unintentional logic change. Some instructions may have zero cost but
will not be hoisted by different limitation so should be counted for threshold.

Reviewers: aprantl, arsenm, nhaehnle

Reviewed By: aprantl

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82761
2020-07-09 15:45:23 +03:00
Florian Hahn a86ce06faf [SCCP] Use conditional info with AND/OR branch conditions.
Currently SCCP does not combine the information of conditions joined by
AND in the true branch or OR in the false branch.

For branches on AND, 2 copies will be inserted for the true branch, with
one being the operand of the other as in the code below. We can combine
the information using intersection. Note that for the OR case, the
copies are inserted in the false branch, where using intersection is
safe as well.

    define void @foo(i32 %a) {
    entry:
      %lt = icmp ult i32 %a, 100
      %gt = icmp ugt i32 %a, 20
      %and = and i1 %lt, %gt
    ; Has predicate info
    ; branch predicate info { TrueEdge: 1 Comparison:  %lt = icmp ult i32 %a, 100 Edge: [label %entry,label %true] }
      %a.0 = call i32 @llvm.ssa.copy.140247425954880(i32 %a)
    ; Has predicate info
    ; branch predicate info { TrueEdge: 1 Comparison:  %gt = icmp ugt i32 %a, 20 Edge: [label %entry,label %false] }
      %a.1 = call i32 @llvm.ssa.copy.140247425954880(i32 %a.0)
      br i1 %and, label %true, label %false

    true:                                             ; preds = %entry
      call void @use(i32 %a.1)
      %true.1 = icmp ne i32 %a.1, 20
      call void @use.i1(i1 %true.1)
      ret void

    false:                                            ; preds = %entry
      call void @use(i32 %a.1)
      ret void
    }

Reviewers: efriedma, davide, mssimpso, nikic

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D77808
2020-07-09 12:59:24 +01:00
Paul Walker 6b403319f8 [SVE] Scalarize fixed length masked loads and stores.
When adding support for scalable vector masked loads and stores we
accidently opened up likewise for fixed length vectors. This patch
restricts support to scalable vectors only, thus ensuring fixed
length vectors are treated the same regardless of SVE support.

Differential Revision: https://reviews.llvm.org/D83341
2020-07-09 10:47:04 +00:00
Jun Ma f0bfad2ed9 [Coroutines] Refactor sinkLifetimeStartMarkers
Differential Revision: https://reviews.llvm.org/D83379
2020-07-09 18:23:28 +08:00
Dmitry Polukhin 9e7fddbd36 [yaml][clang-tidy] Fix multiline YAML serialization
Summary:
New line duplication logic introduced in https://reviews.llvm.org/D63482
has two issues: (1) there is no logic that removes duplicate newlines
when clang-apply-replacment reads YAML and (2) in general such logic
should be applied to all strings and should happen on string
serialization level instead in YAML parser.

This diff changes multiline strings quotation from single quote `'` to
double `"`. It solves problems with internal newlines because now they are
escaped. Also double quotation solves the problem with leading whitespace after
newline. In case of single quotation YAML parsers should remove leading
whitespace according to specification. In case of double quotation these
leading are internal space and they are preserved. There is no way to
instruct YAML parsers to preserve leading whitespaces after newline so
double quotation is the only viable option that solves all problems at
once.

Test Plan: check-all

Reviewers: gribozavr, mgehre, yvvan

Subscribers: xazax.hun, hiraditya, cfe-commits, llvm-commits

Tags: #clang-tools-extra, #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80301
2020-07-09 02:41:58 -07:00
Craig Topper ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
Part of addressing post-commit feedback from D83360
2020-07-08 15:24:55 -07:00
Craig Topper 9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
As noted here https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html and by alive2, this transform isn't valid. If X is poison this potentially propagates poison when it shouldn't.

This same transform still exists in DAGCombiner.

Differential Revision: https://reviews.llvm.org/D83360
2020-07-08 12:53:05 -07:00
Nikita Popov a48cf72238 [InstSimplify] Handle not inserted instruction gracefully (PR46638)
When simplifying comparisons using a dominating assume, bail out
if the context instruction is not inserted.
2020-07-08 21:43:32 +02:00
Wei Mi e32469a140 [SampleFDO] Enable sample-profile-top-down-load and sample-profile-merge-inlinee
by default.

sample-profile-top-down-load is an internal option which can enable top-down
order of inlining and profile annotation in sample profile load pass. It was
found to be beneficial for better profile annotation.

Recently we found it could also solve some build time issue. Suppose function
A has many callsites in function B. In the last release binary where sample
profile was collected, the outline copy of A is large because there are many
other functions inlined into A. However although all the callsites calling A
in B are inlined, but every inlined body is small (A was inlined into B
before other functions are inlined into A), there is no build time issue in
last release.

In an optimized build using the sample profile collected from last release,
without top-down inlining, we saw a case that A got very large because of
inlining, and then multiple callsites of A got inlined into B, and that led
to a huge B which caused significant build time issue besides profile
annotation issue.

To solve that problem, the patch enables the flag
sample-profile-top-down-load by default. sample-profile-top-down-load can
have better performance when it is enabled together with
sample-profile-merge-inlinee so in this patch we also enable
sample-profile-merge-inlinee by default.

Differential Revision: https://reviews.llvm.org/D82919
2020-07-08 09:23:18 -07:00
Arthur Eubanks 481709e831 [NewPM][opt] Share -disable-loop-unrolling between pass managers
There's no reason to introduce a new option for the NPM.
The various PGO options are shared in this manner.

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D83368
2020-07-08 08:50:56 -07:00
Stanislav Mekhanoshin 64030099c3 SLP: honor requested max vector size merging PHIs
At the moment this place does not check maximum size set
by TTI and just creates a maximum possible vectors.

Differential Revision: https://reviews.llvm.org/D82227
2020-07-08 08:06:15 -07:00
Florian Hahn 80970ac875 [DSE,MSSA] Eliminate stores by terminators (free,lifetime.end).
This patch adds support for eliminating stores by free & lifetime.end
calls. We can remove stores that are not read before calling a memory
terminator and we can eliminate all stores after a memory terminator
until we see a new lifetime.start. The second case seems to not really
trigger much in practice though.

Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D72410
2020-07-08 08:59:46 +01:00
Florian Hahn 04b85e2bcb Revert "[SLP] Make sure instructions are ordered when computing spill cost."
This seems to break http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/24371

This reverts commit eb46137daa.
2020-07-07 23:15:01 +01:00
Christopher Tetreault 021d56abb9 [SVE] Make Constant::getSplatValue work for scalable vector splats
Summary:
Make Constant::getSplatValue recognize scalable vector splats of the
form created by ConstantVector::getSplat. Add unit test to verify that
C == ConstantVector::getSplat(C)->getSplatValue() for fixed width and
scalable vector splats

Reviewers: efriedma, spatel, fpetrogalli, c-rhodes

Reviewed By: efriedma

Subscribers: sdesmalen, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82416
2020-07-07 13:45:51 -07:00
Arthur Eubanks 2279380eab [Inliner] Don't skip inlining alwaysinline in optnone functions
Previously the NPM inliner would skip all potential inlines in an
optnone function, but alwaysinline callees should be inlined regardless
of optnone.

Fixes inline-optnone.ll under NPM.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D83021
2020-07-07 12:54:55 -07:00
Nikita Popov 8691544a27 [SCCP] Use range metadata for loads and calls
When all else fails, use range metadata to constrain the result
of loads and calls. It should also be possible to use !nonnull,
but that would require some general support for inequalities in
SCCP first.

Differential Revision: https://reviews.llvm.org/D83179
2020-07-07 21:09:21 +02:00
Nikita Popov 9dfea03517 [SCCP] Handle assume predicates
Take assume predicates into account when visiting ssa.copy. The
handling is the same as for branch predicates, with the difference
that we're always on the true edge.

Differential Revision: https://reviews.llvm.org/D83257
2020-07-07 20:22:52 +02:00
Hans Wennborg 7fc279ca3d [GlobalOpt] Don't remove inalloca from musttail-called functions
Otherwise the verifier complains about the mismatching function ABIs.

Differential revision: https://reviews.llvm.org/D83300
2020-07-07 19:02:46 +02:00
Roman Lebedev 16266e6396
[Scalarizer] When gathering scattered scalar, don't replace it with itself
The (previously-crashing) test-case would cause us to seemingly-harmlessly
replace some use with something else, but we can't replace it with itself,
so we would crash.
2020-07-07 17:03:53 +03:00
Ayal Zaks 7bf299c8d8 [LV] Vectorize without versioning-for-unit-stride under -Os/-Oz
If a loop is in a function marked OptSize, Loop Access Analysis should refrain
from generating runtime checks for unit strides that will version the loop.

If a loop is in a function marked OptSize and its vectorization is enabled, it
should be vectorized w/o any versioning.

Fixes PR46228.

Differential Revision: https://reviews.llvm.org/D81345
2020-07-07 15:04:21 +03:00
Max Kazantsev 094e99d264 [Test] Add one more missing optimization opportunity test 2020-07-07 13:04:15 +07:00
Jordan Rupprecht 10c82eecbc Revert "[LV] Enable the LoopVectorizer to create pointer inductions"
This reverts commit a8fe12065e.

It causes a crash when building gzip. Will post the detailed reduced test case to D81267.
2020-07-06 17:50:38 -07:00
Roman Lebedev db05f2e34a
[Scalarizer] Centralize instruction DCE
As reported in https://reviews.llvm.org/D83101#2133062
the new visitInsertElementInst()/visitExtractElementInst() functionality
is causing miscompiles (previously-crashing test added)

It is due to the fact how the infra of Scalarizer is dealing with DCE,
it was not updated or was it ready for such scalar value forwarding.
It always assumed that the moment we "scalarized" something,
it can go away, and did so with prejudice.

But that is no longer safe/okay to do.

Instead, let's prevent it from ever shooting itself into foot,
and let's just accumulate the instructions-to-be-deleted
in a vector, and collectively cleanup (those that are *actually* dead)
them all at the end.

All existing tests are not reporting any new garbage leftovers,
but maybe it's test coverage issue.
2020-07-07 01:12:51 +03:00
David Green 146dad0077 [ARM] MVE FP16 cost adjustments
This adjusts the MVE fp16 cost model, similar to how we already do for
integer casts. It uses the base cost of 1 per cvt for most fp extend /
truncates, but adjusts it for loads and stores where we know that a
extending load has been used to get the load into the correct lane, and
only an MVE VCVTB is then needed.

Differential Revision: https://reviews.llvm.org/D81813
2020-07-06 15:57:51 +01:00
Roman Lebedev 51f9310ff2
[Scalarizer] ExtractElement handling w/ variable insert index (PR46524)
Summary:
Similar to D82961.

Reviewers: bjope, cameron.mcinally, arsenm, jdoerfert

Reviewed By: jdoerfert

Subscribers: arphaman, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82970
2020-07-06 13:19:33 +03:00
Roman Lebedev 6e50474581
[Scalarizer] InsertElement handling w/ variable insert index (PR46524)
Summary:
I'm interested in taking the original C++ input,
for which we currently are stuck with an alloca
and producing roughly the lower IR,
with neither an alloca nor a vector ops:
https://godbolt.org/z/cRRWaJ

For that, as intermediate step, i'd to somehow perform scalarization.
As per @arsenmn suggestion, i'm trying to see if scalarizer can help me
avoid writing a bicycle.

I'm not sure if it's really intentional that variable insert is not handled currently.
If it really is, and is supposed to stay that way (?), i guess i could guard it..

See [[ https://bugs.llvm.org/show_bug.cgi?id=46524 | PR46524 ]].

Reviewers: bjope, cameron.mcinally, arsenm, jdoerfert

Reviewed By: jdoerfert

Subscribers: arphaman, uabelho, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82961
2020-07-06 13:19:32 +03:00
Roman Lebedev 28b7816b78
[Scalarizer] ExtractElement handling w/ constant extract index
Summary:
It appears to be better IR-wise to aggressively scalarize it,
rather than relying on gathering it, and leaving it as-is.

Reviewers: jdoerfert, bjope, arsenm, cameron.mcinally

Reviewed By: jdoerfert

Subscribers: arphaman, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83101
2020-07-06 13:19:32 +03:00
Roman Lebedev f62c8dbc99
[Scalarizer] InsertElement handling w/ constant insert index
Summary: As it can be clearly seen from the diff, this results in nicer IR.

Reviewers: jdoerfert, arsenm, bjope, cameron.mcinally

Reviewed By: jdoerfert

Subscribers: arphaman, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83102
2020-07-06 13:19:32 +03:00
David Green 55227f85d0 [ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost
This alters getMemoryOpCost to use the Base TargetTransformInfo version
that includes some additional checks for whether extending loads are
legal. This will generally have the effect of making <2 x ..> and some
<4 x ..> loads/stores more expensive, which in turn should help favour
larger vector factors.

Notably it alters the cost of a <4 x half>, which with the current
codegen will be expensive if it is not extended.

Differential Revision: https://reviews.llvm.org/D82456
2020-07-06 10:58:40 +01:00
Nikita Popov 516ff1d4ba [SCCP] Add test for range metadata (NFC) 2020-07-05 21:41:04 +02:00
sstefan1 6c4a5e9257 [OpenMPOpt] ICV Tracking
This is the first and most basic ICV Tracking implementation. For this
first version, we only support deduplication within the same BB.

Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6, uenoku,
baziotis

Differential Revision: https://reviews.llvm.org/D81788
2020-07-04 23:31:50 +02:00
Roman Lebedev 7ea46aee36
Revert "[AssumeBundles] Use operand bundles to encode alignment assumptions"
Assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that it isn't the
case, and happily crashes otherwise.

Minimal reduced reproducer: run `opt -alignment-from-assumptions` on

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }

; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
  call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
  ret i32 0
}

; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1

attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }


This is what we'd have with -mllvm -enable-knowledge-retention

This reverts commit c95ffadb24.
2020-07-04 23:49:23 +03:00
Roman Lebedev 11a3f040c7
[Utils] Make -assume-builder/-assume-simplify actually work on Old-PM
clang w/ old-pm currently would simply crash
when -mllvm  -enable-knowledge-retention=true is specified.

Clearly, these two passes had no Old-PM test coverage,
which would have shown the problem - not requiring AssumptionCacheTracker,
but then trying to always get it.

Also, why try to get domtree only if it's cached,
but at the same time marking it as required?
2020-07-04 21:06:36 +03:00
Sanjay Patel 3b8ae1001f [InstCombine] fix miscompile from umul_with_overflow matching
As noted in PR46561:
https://bugs.llvm.org/show_bug.cgi?id=46561
...it takes something beyond a minimal IR example to trigger
this bug because it relies on matching non-canonical IR.

There are no tests that show the need for matching this
pattern, so I'm just deleting it to fix the miscompile.
2020-07-04 11:16:23 -04:00
Roman Lebedev c3b8bd1eea
[InstCombine] Always try to invert non-canonical predicate of an icmp
Summary:
The actual transform i was going after was:
https://rise4fun.com/Alive/Tp9H
```
Name: zz
Pre: isPowerOf2(C0) && isPowerOf2(C1) && C1 == C0
%t0 = and i8 %x, C0
%r = icmp eq i8 %t0, C1
  =>
%t = icmp eq i8 %t0, 0
%r = xor i1 %t, -1

Name: zz
Pre: isPowerOf2(C0)
%t0 = and i8 %x, C0
%r = icmp ne i8 %t0, 0
  =>
%t = icmp eq i8 %t0, 0
%r = xor i1 %t, -1
```
but as it can be seen from the current tests, we already canonicalize most of it,
and we are only missing handling multi-use non-canonical icmp predicates.

If we have both `!=0` and `==0`, even though we can CSE them,
we end up being stuck with them. We should canonicalize to the `==0`.

I believe this is one of the cleanup steps i'll need after `-scalarizer`
if i end up proceeding with my WIP alloca promotion helper pass.

Reviewers: spatel, jdoerfert, nikic

Reviewed By: nikic

Subscribers: zzheng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83139
2020-07-04 18:12:04 +03:00
Sanjay Patel ef70cc9d1a [InstCombine] improve debug value names; NFC
The use of 'tmp' can trigger warnings from the update_test_checks.py
script. That's evidence of a flaw in the script's logic, but we
can always do better than naming variables 'tmp' in LLVM too.

The phi test file should be updated with auto-generated regex CHECK
lines, so it isn't affected by cosmetic diffs, but I don't have
time to do that right now.
2020-07-04 11:06:30 -04:00
Sanjay Patel 14936e01e2 [InstCombine] add test for miscompile (PR46561); NFC 2020-07-04 11:06:30 -04:00
Nikita Popov 3b671022e4 [InstSimplify] Simplify comparison between zext(x) and sext(x)
This is picking up a loose thread from D69006: We can simplify
(zext x) ule (sext x) and (zext x) sge (sext x) to true, with
various permutations. Oddly, SCEV knows about this identity,
but nothing on the IR level does.

Differential Revision: https://reviews.llvm.org/D83081
2020-07-04 11:03:00 +02:00
Nikita Popov 93ccb8eb52 [InstSimplify] Add additional zext/sext comparison tests (NFC)
Add vector variants, and negative tests where the operand does
not match.
2020-07-04 11:03:00 +02:00
Francis Visoiu Mistrih aa5ec34e31 [LoopDeletion] Emit a remark when a dead loop is deleted
This emits a remark when LoopDeletion deletes a dead loop, using the
source location of the loop's header. There are currently two reasons
for removing the loop: invariant loop or loop that never executes.

Differential Revision: https://reviews.llvm.org/D83113
2020-07-03 15:20:23 -07:00
Roman Lebedev 17a15c32af
[NFCI][LoopUnroll] s/%tmp/%i/ in one test to silence update script warning 2020-07-04 00:39:36 +03:00
Roman Lebedev 341ab51149
[NFCI][InstCombine] shift.ll: s/%tmp/%i/ to silence update script warning 2020-07-04 00:39:35 +03:00
Sanjay Patel 7fd8af1de0 [InstCombine] fold mul of sext bools to 'and'
Alive2:
  define i32 @src(i1 %x, i1 %y) {
  %0:
  %zx = sext i1 %x to i32
  %zy = sext i1 %y to i32
  %r = mul i32 %zx, %zy
  ret i32 %r
  }
  =>
  define i32 @tgt(i1 %x, i1 %y) {
  %0:
  %a = and i1 %x, %y
  %r = zext i1 %a to i32
  ret i32 %r
  }
  Transformation seems to be correct!

https://alive2.llvm.org/ce/z/gaPQxA
2020-07-03 17:28:40 -04:00
Sanjay Patel 5504d8b04a [InstCombine] add more tests for mul of bools; NFC 2020-07-03 17:28:22 -04:00
Florian Hahn 31971ca1c6 [InstCombine] Try to narrow expr if trunc cannot be removed.
Narrowing an input expression of a truncate to a type larger than the
result of the truncate won't allow removing the truncate, but it may
enable further optimizations, e.g. allowing for larger vectorization
factors.

For now this is intentionally limited to integer types only, to avoid
producing new vector ops that might not be suitable for the target.

If we know that the only user is a trunc, we can also be allow more
cases, e.g. also shortening expressions with some additional shifts.

I would appreciate feedback on the best place to do such a narrowing.

This fixes PR43580.

Reviewers: spatel, RKSimon, lebedev.ri, xbolva00

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D82973
2020-07-03 20:22:51 +01:00
Sanjay Patel 40fcc42498 [InstCombine] fold mul of zext bools to 'and'
The base case only works because we are relying on a
poison-unsafe select transform; if that is fixed, we
would regress on patterns like this.

The extra use tests show that the select transform can't
be applied consistently. So it may be a regression to have
an extra instruction on 1 test, but that result was not
created safely and does not happen reliably.
2020-07-03 13:14:18 -04:00
Sanjay Patel 5d60377864 [InstCombine] add tests for mul of bools; NFC 2020-07-03 13:14:18 -04:00
Roman Lebedev 4dd784000e
[NFC][InstCombine] Add some more tests for select based on non-canonical bit-test 2020-07-03 20:12:46 +03:00
Nikita Popov cf1d9f9f49 [InstSimplify] Fold icmp with dominating assume
If we assume(x > y), then we should be able to fold the basic
implications of that, like x >= y. This already happens if either
one of the operands is constant (LVI) or if the conditions are
exactly the same (GVN), but not if we have an implication with
non-constant operands. Support this by querying AssumptionCache.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40149.

Differential Revision: https://reviews.llvm.org/D82717
2020-07-03 18:53:58 +02:00
Florian Hahn eb46137daa [SLP] Make sure instructions are ordered when computing spill cost.
The entries in VectorizableTree are not necessarily ordered by their
position in basic blocks. Collect them and order them by dominance so
later instructions are guaranteed to be visited first. For instructions
in different basic blocks, we only scan to the beginning of the block,
so their order does not matter, as long as all instructions in a basic
block are grouped together. Using dominance ensures a deterministic order.

The modified test case contains an example where we compute a wrong
spill cost (2) without this patch, even though there is no call between
any instruction in the bundle.

This seems to have limited practical impact, .e.g on X86 with a recent
Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006
there are no binary changes.

Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D82444
2020-07-03 17:30:17 +01:00
Florian Hahn 039145c72b [SLP] Precommit test for which spill cost is computed incorrectly.
Test for D82444.
2020-07-03 17:15:52 +01:00
Florian Hahn 7a1161767b [InstCombine] Precommit tests for PR43580. 2020-07-03 17:14:02 +01:00
Sanjay Patel 63774642af [InstCombine] add one-use check to cast+select narrowing transform
Prevent increasing the instruction count.
2020-07-03 11:54:09 -04:00
Sanjay Patel 0cd0ae1f29 [InstCombine] add tests to show missing one-use checks; NFC 2020-07-03 11:54:09 -04:00
Simon Pilgrim eb0e7acbd4 [InstCombine] canEvaluateTruncated - use KnownBits to check for inrange shift amounts
Currently canEvaluateTruncated can only attempt to truncate shifts if they are scalar/uniform constant amounts that are in range.

This patch replaces the constant extraction code with KnownBits handling, using the KnownBits::getMaxValue to check that the amounts are inrange.

This enables support for nonuniform constant cases, and also variable shift amounts that have been masked somehow. Annoyingly, this still won't work for vectors with (demanded) undefs as KnownBits returns nothing in those cases, but its a definite improvement on what we currently have.

Differential Revision: https://reviews.llvm.org/D83127
2020-07-03 16:02:10 +01:00
Sam Parker 18850981c8 [NFC][SimplifyCFG] Move X86 tests into subdir 2020-07-03 14:28:27 +01:00
Simon Pilgrim 1ab88de0ed Add tests for trunc(shl/lshr/ashr(*ext(x),zext(and(y,c)))) patterns with variable shifts with clamped shift amounts 2020-07-03 13:39:16 +01:00
Simon Pilgrim b18405fbc0 Add vector trunc(or(shl(zext(x),c1),zext(x))) tests 2020-07-03 13:32:00 +01:00
Simon Pilgrim 80d4f33479 Regenerate apint-cast tests and replace %tmp variable names to silence update_test_checks warnings 2020-07-03 11:42:16 +01:00
Simon Pilgrim b3a2882dbc Add nonuniform vector trunc(or(shl(zext(x),c1),srl(zext(x),c2))) tests 2020-07-03 11:42:15 +01:00
Simon Pilgrim 029046dc32 Regenerate mul-trunc tests, add vector variants and replace %tmp variable names to silence update_test_checks warnings 2020-07-03 11:42:15 +01:00
Simon Pilgrim 3da42f4810 [InstCombine] Add sext(ashr(shl(trunc(x),c),c)) folding support for vectors
Replacing m_ConstantInt with m_Constant permits folding of vectors as well as scalars.

Differential Revision: https://reviews.llvm.org/D83058
2020-07-03 10:04:37 +01:00
Simon Pilgrim 76673c65e7 Regenerate PR19420 tests 2020-07-03 10:04:37 +01:00
Sam Parker 0724153bbe [CostModel] Fix cast crash
Don't presume instruction operands while matching reductions.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46430

Differential Revision: https://reviews.llvm.org/D82453
2020-07-03 07:53:45 +01:00
Roman Lebedev e98030a55f
[NFC][Scalarizer] Also scalarize loads in newly-added tests
Should help better showcase improvements
2020-07-03 02:37:29 +03:00
Roman Lebedev 739c7a0a04
[NFC][Scalarizer] Add some insertelement/extractelement tests
See D82961/D82970/D83101/D83102.
2020-07-03 02:04:47 +03:00
Nikita Popov 359345d609 [InstSimplify] Add test for sext/zext comparisons (NFC) 2020-07-02 22:21:59 +02:00
Arthur Eubanks 0059f6ffe8 [NewPM] Add -basic-aa to pr33196.ll
The legacy pass manager implicitly adds BasicAA, but the new PM does
not. This causes pr33196.ll to fail under NPM.

There are almost certainly lots of other failures like this, wanted to
get some input on if adding -basic-aa to tests makes sense at scale.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D82915
2020-07-02 11:27:52 -07:00
Arthur Eubanks 3d12e79094 [NewPM][LSR] Rename strength-reduce -> loop-reduce
The legacy pass was called "loop-reduce".

This lowers the number of check-llvm failures under NPM by 83.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D82925
2020-07-02 11:15:29 -07:00
Simon Pilgrim 50b25e0679 [InstCombine] Add some sext/trunc tests to show missing support for non-uniform vectors 2020-07-02 17:11:56 +01:00
Simon Pilgrim 769b979930 [InstCombine] Add (vXi1 trunc(lshr(x,c))) -> icmp_eq(and(x,c')) support for non-uniform vectors
As noted on PR46531, we were only performing this transform on uniform vectors as we were using the m_APInt pattern matcher to extract the shift amount.

Differential Revision: https://reviews.llvm.org/D83035
2020-07-02 16:56:33 +01:00
Simon Pilgrim 103d62e131 [InstCombine] Add some (vXi1 trunc(lshr(x,c))) -> icmp_eq(and(x,c')) tests for vectors with undef elements
Suggested on D83035
2020-07-02 16:04:30 +01:00
Simon Pilgrim 23eeae5526 Regenerate sext/trunc tests and replace %tmp variable names to silence update_test_checks warnings 2020-07-02 14:37:21 +01:00
Simon Pilgrim 421c02e5c6 [InstCombine] Add some (vXi1 trunc(lshr(x,c))) -> icmp_eq(and(x,c')) tests for non-uniform vectors
As noticed on PR46531
2020-07-02 11:56:51 +01:00
Simon Pilgrim 11c4bb0c7c Regenerate apint-shift tests and replace %tmp variable names to silence update_test_checks warnings 2020-07-02 11:56:51 +01:00
Anna Welker a8fe12065e [LV] Enable the LoopVectorizer to create pointer inductions
This patch enables the LoopVectorizer to build a phi of pointer
type and provide the vector loads and stores with vector type
getelementptrs built from the pointer induction variable, which
produces much less instructions than the previous approach of
creating scalar getelementpointers and glue them together to a
vector.

Differential Revision: https://reviews.llvm.org/D81267
2020-07-02 11:39:28 +01:00
Nuno Lopes 7f903873b8 DSE: fix builtin function recognition to take decl into account 2020-07-02 10:28:47 +01:00
Nikita Popov a59dc55c2a [InstSimplify] Move assume icmp test (NFC)
Move this test from InstCombine into InstSimplify.
2020-07-01 23:35:52 +02:00
Sergey Dmitriev cb8faaacb5 [CallGraph] Add support for callback call sites
Summary:
This patch changes call graph analysis to recognize callback call sites
and add an artificial 'reference' call record from the broker function
caller to the callback function in the call graph. A presence of such
reference enforces bottom-up traversal order for callback functions in
CG SCC pass manager because callback function logically becomes a callee
of the broker function caller.

Reviewers: jdoerfert, hfinkel, sstefan1, baziotis

Reviewed By: jdoerfert

Subscribers: hiraditya, kuter, sstefan1, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82572
2020-07-01 13:44:11 -07:00
Nikita Popov 91836fd7f3 [LVI][CVP] Handle (x | y) < C style conditions
InstCombine may convert conditions like (x < C) && (y < C) into
(x | y) < C (for some C). This patch teaches LVI to recognize that
in this case, it can infer either x < C or y < C along the edge.

This fixes the issue reported at
https://github.com/rust-lang/rust/issues/73827.

Differential Revision: https://reviews.llvm.org/D82715
2020-07-01 20:43:24 +02:00
Nikita Popov 0f6afd946d [CVP] Use different number in test (NFC)
To make it clear that this is not intended to be specific to
mask / bit tests.
2020-07-01 18:43:59 +02:00
Hiroshi Yamauchi 6bd1db08e7 [InstCombine] Don't let an alignment assume prevent new/delete removals.
Remove allocations with alignment assume.

Differential Revision: https://reviews.llvm.org/D81854
2020-07-01 09:22:32 -07:00
Florian Hahn 1ccc49924a [AArch64] Add getCFInstrCost, treat branches as free for throughput.
D79164/2596da31740f changed getCFInstrCost to return 1 per default.
AArch64 did not have its own implementation, hence the throughput cost
of CFI instructions is overestimated. On most cores, most branches should
be predicated and essentially free throughput wise.

This restores a 9% performance regression on a SPEC2006 benchmark on
AArch64 with -O3 LTO & PGO.

This patch effectively restores pre 2596da3174 behavior for AArch64
and undoes the AArch64 test changes of the patch.

Reviewers: samparker, dmgreen, anemet

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D82755
2020-06-30 20:34:04 +01:00
David Green 9e49d1d9b8 [InstCombine] fma x, y, 0 -> fmul x, y
If the addend of the fma is zero, common sense would suggest that we can
convert fma x, y, 0.0 to fmul x, y. This comes up with some user code
that was expecting the first fma in an unrolled loop to simplify to a
fmul.

Floating point often does not follow naive common sense though. Alive
suggests that this should be guarded by nsz (as fadd -0.0, 0.0 = 0.0).
fma x, y, -0.0 is always valid.

Differential Revision: https://reviews.llvm.org/D82778
2020-06-30 19:56:37 +01:00
Sanjay Patel 09b8dbf70c [PhaseOrdering][NewPM] update test that silently showed bug with SpeculativeExecutionPass; NFC
See D82735 / rG1a6cebb4d12c744699e23624f8afda5cbe216fe6
2020-06-30 14:22:20 -04:00
David Green 787b1a4746 [InstCombine] New FMA tests and regenerate tests. NFC 2020-06-30 18:05:13 +01:00
Max Kazantsev f01d9e6fc3 [SimplifyCFG] Fix inconsistency in block size assessment for threading
Sometimes SimplifyCFG may decide to perform jump threading. In order
to do it, it follows the following algorithm:

1. Checks if the block is small enough for threading;
2. If yes, inserts a PR Phi relying that the next iteration will remove it
   by performing jump threading;
3. The next iteration checks the block again and performs the threading.

This logic has a corner case: inserting the PR Phi increases block's size
by 1. If the block size at first check was max possible, one more Phi will
exceed this size, and we will neither perform threading nor remove the
created Phi node. As result, we will end up with worse IR than before.

This patch fixes this situation by excluding Phis from block size computation.
Excluding Phis from size computation for threading also makes sense by
itself because in case of threadign all those Phis will be removed.

Differential Revision: https://reviews.llvm.org/D81835
Reviewed By: asbirlea, nikic
2020-06-30 12:40:07 +07:00
Matt Arsenault 7c308dc80a LowerConstantIntrinsics: Fix missing test for byval behavior 2020-06-29 14:45:31 -04:00
Nikita Popov c84a952dc7 [IndVars] Regenerate test checks (NFC) 2020-06-29 20:33:50 +02:00
Matt Arsenault 3621a520d3 Inliner: Add missing test for alignment assume with byval
No tests were stressing the behavior for hasPassPointeeByValueAttr.
2020-06-29 10:39:58 -04:00
Sanjay Patel b6315aee5b [VectorCombine] try to form vector compare and binop to eliminate scalar ops
binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)
-->
vcmp = cmp Pred X, VecC
ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0

This is a larger pattern than the existing extractelement folds because we can't
reasonably vectorize the sub-patterns with constants based on cost model calcs
(it doesn't usually make sense to replace a single extracted scalar op with
constant operand with a vector op).

I salvaged as much of the existing logic as I could, but there might be better
ways to share and reduce code.

The motivating case from PR43745:
https://bugs.llvm.org/show_bug.cgi?id=43745
...is the special case of a 2-way reduction. We tried to get SLP to handle that
particular pattern in D59710, but that caused crashing and regressions.
This patch is more general, but hopefully safer.

The v2f64 test with SSE2 surprised me - the cost model accounting looks like this:
OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4
NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5

Differential Revision: https://reviews.llvm.org/D82474
2020-06-29 10:38:52 -04:00
Nikita Popov a28d38a6bc [SimplifyCFG] Make test more robust (NFC)
Avoid changing this test if blocks get merged.
2020-06-28 20:51:03 +02:00
Nikita Popov d5a482acf9 [SimplifyCFG] Regenerate test checks (NFC) 2020-06-28 20:51:02 +02:00
Xun Li c8755b6378 [Coroutines] Optimize the lifespan of temporary co_await object
Summary:
If we ever assign co_await to a temporary variable, such as foo(co_await expr),
we generate AST that looks like this: MaterializedTemporaryExpr(CoawaitExpr(...)).
MaterializedTemporaryExpr would emit an intrinsics that marks the lifetime start of the
temporary storage. However such temporary storage will not be used until co_await is ready
to write the result. Marking the lifetime start way too early causes extra storage to be
put in the coroutine frame instead of the stack.
As you can see from https://godbolt.org/z/zVx_eB, the frame generated for get_big_object2 is 12K, which contains a big_object object unnecessarily.
After this patch, the frame size for get_big_object2 is now only 8K. There are still room for improvements, in particular, GCC has a 4K frame for this function. But that's a separate problem and not addressed in this patch.

The basic idea of this patch is during CoroSplit, look for every local variable in the coroutine created through AllocaInst, identify all the lifetime start/end markers and the use of the variables, and sink the lifetime.start maker to the places as close to the first-ever use as possible.

Reviewers: lewissbaker, modocache, junparser

Reviewed By: junparser

Subscribers: hiraditya, llvm-commits, rsmith, ChuanqiXu, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82314
2020-06-28 10:18:15 -07:00
Sanjay Patel 931411136a [VectorCombine] add test for scalable vectors; NFC 2020-06-28 12:44:44 -04:00
Sanjay Patel 2f3549f813 Revert "[VectorCombine] add test for scalable vectors; NFC"
This reverts commit 700ec6b848.
An extra test diff snuck here.
2020-06-28 12:43:11 -04:00
Sanjay Patel 700ec6b848 [VectorCombine] add test for scalable vectors; NFC 2020-06-28 12:42:00 -04:00
Nikita Popov 8758e14c6f [InstCombine] Add tests for assume implication (NFC) 2020-06-28 16:18:44 +02:00
Nikita Popov 70c5d95248 [CVP] Add tests for icmp or and/or edge conds (NFC) 2020-06-28 14:54:55 +02:00
dfukalov c7bcd431d9 SpeculativeExecution: fix incorrect debug info move
Summary:
Debug info related instructions got zero cost so hoisted unconditionally

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46267

Reviewers: arsenm, nhaehnle, chandlerc, aprantl

Reviewed By: aprantl

Subscribers: ormris, uabelho, wdng, aprantl, hiraditya, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D81730
2020-06-28 14:35:00 +03:00
Roman Lebedev f0634100cd
[Analysis] isDereferenceableAndAlignedPointer(): don't crash on `bitcast <1 x ???*> to ???*` 2020-06-27 18:30:59 +03:00
Fangrui Song 4cd19a6e15 [BasicAA] Rename -disable-basicaa to -disable-basic-aa to be consistent with the canonical name "basic-aa" 2020-06-26 20:55:44 -07:00
Fangrui Song f31811f2dc [BasicAA] Rename deprecated -basicaa to -basic-aa
Follow-up to D82607
Revert an accidental change (empty.ll) of D82683
2020-06-26 20:41:37 -07:00
Arthur Eubanks 059994f219 [NewPM][BasicAA] basicaa -> basic-aa in Transforms/{New,}GVN
Summary: Following https://reviews.llvm.org/D82607.

Reviewers: ychen

Subscribers: asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82688
2020-06-26 20:28:18 -07:00
Arthur Eubanks 339eed5d0b [NewPM][BasicAA] basicaa -> basic-aa in Transforms/DeadStoreElimination
Summary: Following https://reviews.llvm.org/D82607.

Reviewers: ychen

Subscribers: george.burgess.iv, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82689
2020-06-26 20:13:37 -07:00
Vedant Kumar 9649c2095f [InstCombine] Drop debug loc in TryToSinkInstruction (reland)
Summary:
The advice in HowToUpdateDebugInfo.rst is to "... preserve the debug
location of an instruction if the instruction either remains in its
basic block, or if its basic block is folded into a predecessor that
branches unconditionally".

TryToSinkInstruction doesn't seem to satisfy the criteria as it's
sinking an instruction to some successor block. Preserving the debug loc
can make single-stepping appear to go backwards, or make a breakpoint
hit on that location happen "too late" (since single-stepping from that
breakpoint can cause the function to return unexpectedly).

So, drop the debug location.

This was reverted in ee3620643d because it removed source locations
from inlinable calls, breaking a verifier rule. I've added an exception
for calls because the alternative (setting a line 0 location) is not
better. I tested the updated patch by completing a stage2 RelWithDebInfo
build.

Reviewers: aprantl, davide

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82487
2020-06-26 17:18:15 -07:00
Vedant Kumar ee3620643d Revert "[InstCombine] Drop debug loc in TryToSinkInstruction"
This reverts commit 903cf140d0.

This might be causing verifier failures on the bots, such as: "inlinable
function call in a function with debug info must have a !dbg location"
--

http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/16976/steps/bootstrap%20clang/logs/stdio
2020-06-26 14:59:40 -07:00
Arthur Eubanks 691c086d15 [NewPM][BasicAA] basicaa -> basic-aa in Transforms/SLPVectorizer
Following https://reviews.llvm.org/D82607.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D82681
2020-06-26 14:58:41 -07:00
Vedant Kumar 903cf140d0 [InstCombine] Drop debug loc in TryToSinkInstruction
Summary:
The advice in HowToUpdateDebugInfo.rst is to "... preserve the debug
location of an instruction if the instruction either remains in its
basic block, or if its basic block is folded into a predecessor that
branches unconditionally".

TryToSinkInstruction doesn't seem to satisfy the criteria as it's
sinking an instruction to some successor block. Preserving the debug loc
can make single-stepping appear to go backwards, or make a breakpoint
hit on that location happen "too late" (since single-stepping from that
breakpoint can cause the function to return unexpectedly).

So, drop the debug location.

Reviewers: aprantl, davide

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82487
2020-06-26 13:23:24 -07:00
Roman Lebedev 64258773ad
[CostModel] Avoid traditional ConstantExpr crashy pitfails
I'm not sure if this is a regression from D81448 + D81643,
which moved at least the code cast from elsewhere,
or somehow no one triggered that before.
But now we can reach it with a non-instruction..

It is not straight-forward to write cost-model tests for constantexprs,
`-cost-model -analyze -cost-kind=` does not appear to look at them,
or maybe i'm doing it wrong.

I've encountered that via a SimplifyCFG crash,
so reduced (currently-crashing) test is added.
There are likely other instances.

For now, simply restore previous status quo of
not crashing and returning TTI::TCC_Basic.
2020-06-26 22:48:10 +03:00
Rong Xu b4bceb94ee [PGO] Add a functionality to always instrument the func entry BB
Add an option to always instrument function entry BB (default off)
Add an option to do atomically updates on the first counter in each
instrumented function.

Differential Revision: https://reviews.llvm.org/D82123
2020-06-26 10:43:23 -07:00
Arthur Eubanks a95796a380 [NewPM][LoopUnroll] Rename unroll* to loop-unroll*
The legacy pass is called "loop-unroll", but in the new PM it's called "unroll".
Also applied to unroll-and-jam and unroll-full.

Fixes various check-llvm tests when NPM is turned on.

Reviewed By: Whitney, dmgreen

Differential Revision: https://reviews.llvm.org/D82590
2020-06-26 09:28:32 -07:00
David Sherwood 7a834a0a4e [SVE] Fix scalable vector bug in DataLayout::getIntPtrType
Fixed an issue in DataLayout::getIntPtrType where we were assuming
the input type was always a fixed vector type, which isn't true.

Added a test that exposed the problem to:

  Transforms/InstCombine/vector_gep1.ll

Differential Revision: https://reviews.llvm.org/D82294
2020-06-26 07:58:45 +01:00
Michael Liao dccfaacf93 [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`.
Summary:
- `ptrtoint` and `inttoptr` are defined as no-op casts if the integer
  value as the same size as the pointer value. The pair of
  `ptrtoint`/`inttoptr` is in fact a no-op cast sequence between
  different address spaces. Teach `infer-address-spaces` to handle them
  like a `bitcast`.

Reviewers: arsenm, chandlerc

Subscribers: jvesely, wdng, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81938
2020-06-25 20:46:56 -04:00
Hiroshi Yamauchi 9878996c70 Revert "[PGO] Extend the value profile buckets for mem op sizes."
This reverts commit 63a89693f0.

Due to a build failure like http://lab.llvm.org:8011/builders/sanitizer-windows/builds/65386/steps/annotate/logs/stdio
2020-06-25 11:13:49 -07:00
Kirill Naumov d48c7859fb [InlineCost] GetElementPtr with constant operands
If the GEP instruction contanins only constants as its arguments,
then it should be recognized as a constant. For now, there was
also added a flag to turn off this simplification if it causes
any regressions ("disable-gep-const-evaluation") which is off
by default. Once I gather needed data of the effectiveness of
this simplification, the flag will be deleted.

Reviewers: apilipenko, davidxl, mtrofin

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D81026
2020-06-25 18:09:51 +00:00
Hiroshi Yamauchi 63a89693f0 [PGO] Extend the value profile buckets for mem op sizes.
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).

Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.

Differential Revision: https://reviews.llvm.org/D81682
2020-06-25 10:22:56 -07:00
Sanjay Patel c9e8c9e3ea [InstCombine] fold fmul/fdiv with fabs operands
fabs(X) * fabs(Y) --> fabs(X * Y)
fabs(X) / fabs(Y) --> fabs(X / Y)

If both operands of fmul/fdiv are positive, then the result must be positive.

There's a NAN corner-case that prevents removing the more specific fold just
above this one:
fabs(X) * fabs(X) -> X * X
That fold works even with NAN because the sign-bit result of the multiply is
not specified if X is NAN.

We can't remove that and use the more general fold that is proposed here
because once we convert to this:
fabs (X * X)
...it is not legal to simplify the 'fabs' out of that expression when X is NAN.
That's because fabs() guarantees that the sign-bit is always cleared - even
for NAN values.

So this patch has the potential to lose information, but it seems unlikely if
we do the more specific fold ahead of this one.

Differential Revision: https://reviews.llvm.org/D82277
2020-06-25 11:35:38 -04:00
Sanjay Patel c336f21af5 [PhaseOrdering] delete test for vectorization; NFC
As requested in D81416, I'm deleting the file that I added with:
rGdf79443
2020-06-25 09:34:11 -04:00
Florian Hahn 4837daf883 [DSE,MSSA] Check if Def is removable only wen we try to remove it.
Non-removable MemoryDefs can still eliminate other defs. Update the
isRemovable checks to only candidates for removal.
2020-06-25 14:01:10 +01:00
Tyker c95ffadb24 [AssumeBundles] Use operand bundles to encode alignment assumptions
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.

Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71739
2020-06-25 12:59:44 +02:00
Tyker 8938a6c9ed [NFC] update test to make diff of the following commit clear 2020-06-25 12:59:44 +02:00
David Sherwood ee26a31e7b [SVE] Make ConstantFoldGetElementPtr work for scalable vectors of indices
This patch fixes a compiler crash that was hit when trying to simplify
the following code:

getelementptr [2 x i64], [2 x i64]* null, i64 0, <vscale x 2 x i64> zeroinitializer

For the case where we have a null pointer value like above, we just
need to ensure we don't assume the indices are always fixed width.

Differential Revision: https://reviews.llvm.org/D82183
2020-06-25 07:28:19 +01:00
Max Kazantsev 4c6548222b [Test] Add more tests for selects & phis 2020-06-25 10:54:07 +07:00
Max Kazantsev 1eeb714787 [InstCombine] Combine select & Phi by same condition
This patch transforms
```
p = phi [x, y]
s = select cond, z, p
```
with
```
s = phi[x, z]
```
if we can prove that the Phi node takes values basing on select's condition.

Differential Revision: https://reviews.llvm.org/D82072
Reviewed By: nikic
2020-06-25 10:44:10 +07:00
Michele Scandale 413a187856 [Inliner] Handle 'no-signed-zeros-fp-math' function attribute.
All other floating point math optimization related attribute are merged
in a conservative way during function inlining. This commit adds the
merge rule for the 'no-signed-zeros-fp-math' attribute.

Differential Revision: https://reviews.llvm.org/D81714
2020-06-24 17:53:59 -07:00
Amara Emerson 090c108d04 Don't inline dynamic allocas that simplify to huge static allocas.
Some sequences of optimizations can generate call sites which may never be
executed during runtime, and through constant propagation result in dynamic
allocas being converted to static allocas with very large allocation amounts.

The inliner tries to move these to the caller's entry block, resulting in the
stack limits being reached/bypassed. Avoid inlining functions if this would
result.

The threshold of 64k currently doesn't get triggered on the test suite with an
-Os LTO build on arm64, care should be taken in changing this in future to avoid
needlessly pessimising inlining behaviour.

Differential Revision: https://reviews.llvm.org/D81765
2020-06-24 17:39:03 -07:00
Kirill Naumov 7f094f7f9d [InlineCost] PrinterPass prints constants to which instructions are simplified
This patch enables printing of constants to see which instructions were
constant-folded. Needed for tests and better visiual analysis of
inliner's work.

Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D81024
2020-06-24 22:52:31 +00:00
Roman Lebedev 0c22147027
[NFCI][InstSimplify] Add CHECK-LABEL to new icmp.ll test 2020-06-25 01:10:35 +03:00
Roman Lebedev 8911a35180
[SROA] convertValue(): we can have <N x iK*> to <M x iQ> cast
Provided test case crashes otherwise.
Much like to the opposite case.
2020-06-25 00:58:54 +03:00
Roman Lebedev 07a23c06dd
[SROA] convertValue(): we can have <N x iK> to <M x iQ*> cast
Provided test case crashes otherwise.

If NewTy is already DL.getIntPtrType(NewTy),
CreateBitCast() won't actually create any bitcast,
so we are better off just doing the general thing.
2020-06-25 00:58:53 +03:00
Roman Lebedev 2b8d706b19
[IR] GetUnderlyingObject(), stripPointerCastsAndOffsets(): don't crash on `bitcast <1 x i8*> to i8*`
I'm not sure how to write standalone tests for each of two changes here.
If either one of these two fixes is missing, the test fill crash.
2020-06-25 00:58:53 +03:00
Roman Lebedev 381054a989
[InstCombine] visitBitCast(): do not crash on weird `bitcast <1 x i8*> to i8*`
Even if we know that RHS of a bitcast is a pointer,
we can't assume LHS is, because it might be
a single-element vector of pointer.
2020-06-25 00:58:53 +03:00
Yuanfang Chen ebc88811b5 Remove Passes dependency on CodeGen
The dependency was introduced in
5134020ea6. The only functional change
from this removal would be the new PM interface for the two codegen
passes. This is not necessary since we don't have codegen pipeline using
new PM yet. This removal is to break the potential circular dependency between
Passes and CodeGen once the codegen begins to gain new PM support.
2020-06-24 14:52:46 -07:00
Kirill Naumov 6a5d7d498c [InlineCost] InlineCostAnnotationWriterPass introduced
This class allows to see the inliner's decisions for better
optimization verifications and tests. To use, use flag
"-passes="print<inline-cost>"".

This is the second attempt to integrate the patch.
The problem from the first try has been discussed and
fixed in D82205.

Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev

Reviewed By: mtrofin

Differential revision: https://reviews.llvm.org/D81743
2020-06-24 21:27:07 +00:00
Florian Hahn 35bb9bfbb0 [SLP] Limit GEP lists based on width of index computation.
D68667 introduced a tighter limit to the number of GEPs to simplify
together. The limit was based on the vector element size of the pointer,
but the pointers themselves are not actually put in vectors.

IIUC we try to vectorize the index computations here, so we should base
the limit on the vector element size of the computation of the index.

This restores the test regression on AArch64 and also restores the
vectorization for a important pattern in SPEC2006/464.h264ref on
AArch64 (@test_i16_extend). We get a large benefit from doing a single
load up front and then processing the index computations in vectors.

Note that we could probably even further improve the AArch64 codegen, if
we would do zexts to i32 instead of i64 for the sub operands and then do
a single vector sext on the result of the subtractions. AArch64 provides
dedicated vector instructions to do so. Sketch of proof in Alive:
https://alive2.llvm.org/ce/z/A4xYAB

Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel

Reviewed By: ABataev, spatel

Differential Revision: https://reviews.llvm.org/D82418
2020-06-24 19:56:53 +01:00