Commit Graph

24569 Commits

Author SHA1 Message Date
Eric Christopher b6536e549d As part of using inclusive language within the llvm project,
migrate away from the use of blacklist and whitelist.
2020-06-19 15:12:18 -07:00
Sanjay Patel 216a37bb46 [VectorCombine] refactor extract-extract logic; NFCI 2020-06-19 14:52:27 -04:00
Sanjay Patel 6d864097a2 [VectorCombine] fix crash while transforming constants
This is a variation of the proposal in D82049 with an extra test.
2020-06-19 12:30:32 -04:00
Florian Hahn f9d8e33c32 [SCCP] Turn sext into zext for non-negative ranges.
This patch updates SCCP/IPSCCP to use the computed range info to turn
sexts into zexts, if the value is known to be non-negative. We already
to a similar transform in CorrelatedValuePropagation, but it seems like
we can catch a lot of additional cases by doing it in SCCP/IPSCCP as
well.

The transform is limited to ranges that are known to not include undef.

Currently constant ranges from conditions are treated as potentially
containing undef, due to PR46144. Once we flip this, the transform will
be more effective in practice.

Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D81756
2020-06-19 10:17:55 +01:00
Tyker b7338fb1a6 [AssumeBundles] add cannonicalisation to the assume builder
Summary:
this reduces significantly the number of assumes generated without aftecting too much
the information that is preserved. this improves the compile-time cost
of enable-knowledge-retention significantly.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79650
2020-06-19 10:32:26 +02:00
Matt Arsenault b13f6b0fe0 BypassSlowDivision: Fix dropping debug info
I don't know anything about debug info, but this seems like more work
should be necessary. This constructs a new IRBuilder and reconstructs
the original divides rather than moving the original.

One problem this has is if a div/rem pair are handled, both end up
with the same debugloc. I'm not sure how to fix this, since this uses
a cache when it sees the same input operands again, which will have
the first instance's location attached.
2020-06-18 17:27:19 -04:00
Christopher Tetreault 8d11ec66b6 [SVE] Remove calls to VectorType::getNumElements from Transforms/Utils
Reviewers: efriedma, c-rhodes, david-arm, Tyker, asbirlea

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82057
2020-06-18 13:39:14 -07:00
Sanjay Patel 46a285ad9e [IRBuilder] add/use wrapper to create a generic compare based on predicate type; NFC
The predicate can always be used to distinguish between icmp and fcmp,
so we don't need to keep repeating this check in the callers.
2020-06-18 15:47:06 -04:00
Davide Italiano 8cdd2a158c [SimplifyCFG] Update debug location when folding branch to common destination
Sometimes a dead block gets folded and the debug information is still
retained. This manifests as jumpy stepping in lldb, see the bugzilla PR
for an end-to-end C testcase.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46008

Differential Revision:  https://reviews.llvm.org/D82062
2020-06-18 12:33:32 -07:00
serge-sans-paille 4dd332723d Fix return status of LoopDistribute
Move code that may update the IR after precondition, so that if precondition
fail, the IR isn't modified.

Differential Revision: https://reviews.llvm.org/D81225
2020-06-18 20:13:18 +02:00
Arthur Eubanks 91ef930526 [GlobalOpt] Remove preallocated calls when possible
When possible (e.g. internal linkage), strip preallocated attribute off
parameters/arguments.
This requires removing the "preallocated" operand bundle from the call
site, replacing @llvm.call.preallocated.arg() with an alloca and a
bitcast to i8*, and removing the @llvm.call.preallocated.setup(). Since
@llvm.call.preallocated.arg() can be called multiple times with the same
arg index, we create an alloca per arg index.
We add a @llvm.stacksave() where the @llvm.call.preallocated.setup() was
and a @llvm.stackrestore() after the preallocated call to prevent the
stack from blowing up. This is valid because the argument would normally
not exist on the stack after the call before the transformation.

This does not currently handle all possible preallocated calls. We will
need to figure out where to put @llvm.stackrestore() in the cases where
there is no obvious place to put it, for example conditional
preallocated calls, invokes.

This sort of transformation may need to be moved to somewhere more
accessible to accomodate similar transformations (like inlining) in the
future.

Reviewers: efriedma, hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80951
2020-06-18 09:56:13 -07:00
Florian Hahn 1669fddc9f [Matrix] Use alignment info when lowering loads/stores.
This patch updates LowerMatrixIntrinsics to preserve the alignment
specified at the original load/stores and the align attribute for the
pointer argument of the column.major.load/store intrinsics.

We can always use the specified alignment for the load of the first
column. For subsequent columns, the alignment may need to be reduced.

For ConstantInt strides, compute the offset for the start of the column in
bytes and use commonAlignment to get the largest valid alignment.

For non-ConstantInt strides, we need to take the common alignment of the
initial alignment and the element size in bytes.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, rjmccall

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D81960
2020-06-18 13:19:31 +01:00
Florian Hahn d88acd8f7d [Matrix] Preserve volatile when loading loads/stores.
Currently the matrix lowering turns volatile loads/stores into
non-volatile ones. This patch updates the lowering to preserve the
volatile bit.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D81498
2020-06-18 12:14:19 +01:00
Florian Hahn 6d18c2067e [Matrix] Update load/store intrinsics.
This patch adjust the load/store matrix intrinsics, formerly known as
llvm.matrix.columnwise.load/store, to improve the naming and allow
passing of extra information (volatile).

The patch performs the following changes:
 * Rename columnwise.load/store to column.major.load/store. This is more
   expressive and also more in line with the naming in Clang.
 * Changes the stride arguments from i32 to i64. The stride can be
   larger than i32 and this makes things more uniform with the way
   things are handled in Clang.
 * A new boolean argument is added to indicate whether the load/store
   is volatile. The lowering respects that when emitting vector
   load/store instructions
 * MatrixBuilder is updated to require both Alignment and IsVolatile
   arguments, which are passed through to the generated intrinsic. The
   alignment is set using the `align` attribute.

The changes are grouped together in a single patch, to have a single
commit that breaks the compatibility. We probably should be fine with
updating the intrinsics, as we did not yet officially support them in
the last stable release. If there are any concerns, we can add
auto-upgrade rules for the columnwise intrinsics though.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache, rjmccall, ftynse

Reviewed By: anemet, nicolasvasilache

Differential Revision: https://reviews.llvm.org/D81472
2020-06-18 09:44:52 +01:00
serge-sans-paille f9c7e3136e Correctly report modified status for HWAddressSanitizer
Differential Revision: https://reviews.llvm.org/D81238
2020-06-18 10:27:44 +02:00
Mehdi Amini 77b79d79c0 Remove "unused" member ModuleSlice from `struct OpenMPOpt`
This is fixing warning from clang:

 warning: private field 'ModuleSlice' is not used [-Wunused-private-field]
  SmallPtrSetImpl<Function *> &ModuleSlice;
                               ^

Differential Revision: https://reviews.llvm.org/D82027
2020-06-18 03:02:26 +00:00
Eric Christopher a8dad30388 Revert "Remove unused class variable ModuleSlice." as it was
used in debug only code.

This reverts commit 07a1749081.
2020-06-17 14:45:17 -07:00
Eric Christopher 07a1749081 Remove unused class variable ModuleSlice. 2020-06-17 14:33:29 -07:00
Roman Lebedev 84b4f5a6a6
[InstCombine] Negator: while there, add detection for cycles during negation
I don't have any testcases showing it happening,
and i haven't succeeded in creating one,
but i'm also not positive it can't ever happen,
and i recall having something that looked like
that in the very beginning of Negator creation.

But since we now already have a negation cache,
we can now detect such cases practically for free.

Let's do so instead of "relying" on stack overflow :D
2020-06-17 22:47:20 +03:00
Roman Lebedev e3d8cb1e1d
[InstCombine] Negator: cache negation results (PR46362)
It is possible that we can try to negate the same value multiple times.
For example, PHI nodes may happen to have multiple incoming values
(all of which must be the same value) for the same incoming basic block.
It may happen that we try to negate such a PHI node, and succeed,
and that might result in having now-different incoming values..

To avoid that, and in general to reduce the amount of duplicated
work we might be doing, let's introduce a cache where
we'll track results of negating each value.

The added test was previously failing -verify after -instcombine.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46362
2020-06-17 22:47:20 +03:00
Roman Lebedev c4166f3d84
[NFC][InstCombine] Negator: add thin negate() wrapped before visit() 2020-06-17 22:47:20 +03:00
Roman Lebedev 2b85147337
[NFC][InstCombine] Negator: do not include unneeded "llvm/IR/DerivedTypes.h" header 2020-06-17 22:47:19 +03:00
Nick Desaulniers 88c965ba14 BreakCriticalEdges for callbr indirect dests
Summary:
llvm::SplitEdge was failing an assertion that the BasicBlock only had
one successor (for BasicBlocks terminated by CallBrInst, we typically
have multiple successors).  It was surprising that the earlier call to
SplitCriticalEdge did not handle the critical edge (there was an early
return).  Removing that triggered another assertion relating to creating
a BlockAddress for a BasicBlock that did not (yet) have a parent, which
is a simple order of operations issue in llvm::SplitCriticalEdge (a
freshly constructed BasicBlock must be inserted into a Function's basic
block list to have a parent).

Thanks to @nathanchance for the report.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1018

Reviewers: craig.topper, jyknight, void, fhahn, efriedma

Reviewed By: efriedma

Subscribers: eli.friedman, rnk, efriedma, fhahn, hiraditya, llvm-commits, nathanchance, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81607
2020-06-17 11:45:06 -07:00
sstefan1 7cfd267c51 [OpenMPOPT][NFC] Introducing OMPInformationCache.
Summary:
Introduction of OpenMP-specific information cache based on Attributor's `InformationCache`. This should make it easier to share information between them.

Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6, uenoku

Subscribers: yaxunl, hiraditya, guansong, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81798
2020-06-17 16:56:45 +02:00
Simon Pilgrim a5f1f9c9b8 ScalarEvolution.h - reduce LoopInfo.h include to forward declarations. NFC.
Move ScalarEvolution::forgetLoopDispositions implementation to ScalarEvolution.cpp to remove the dependency.

Add implicit header dependency to source files where necessary.
2020-06-17 15:48:23 +01:00
Sjoerd Meijer c1034d044a Follow up of rGe345d547a0d5, and attempt to pacify buildbot:
"error: 'get' is deprecated: The base class version of get with the scalable
argument defaulted to false is deprecated."

Changed VectorType::get() -> FixedVectorType::get().
2020-06-17 13:24:09 +01:00
Sjoerd Meijer e345d547a0 Recommit "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops"
Fixed ARM regression test.

Please see the original commit message rG47650451738c for details.
2020-06-17 13:12:15 +01:00
David Green 076e08aa45 [LSR] Filter for postinc formulae
In more complicated loops we can easily hit the complexity limits of
loop strength reduction. If we do and filtering occurs, it's all too
easy to remove the wrong formulae for post-inc preferring accesses due
to it attempting to maximise register re-use. The patch adds an
alternative filtering step when the target is preferring postinc to pick
postinc formulae instead, hopefully lowering the complexity to below the
limit so that aggressive filtering is not needed.

There is also a change in here to stop considering existing addrecs as
free under postinc. We should already be modelling them as a reg so
don't want it to cause us to get the cost wrong. (I'm not sure that code
makes sense in general, but there are X86 tests specifically for it
where it seems to be helping so have left it around for the standard
non-post-inc case).

Differential Revision: https://reviews.llvm.org/D80273
2020-06-17 12:32:04 +01:00
Sam Parker 5bf0858c0b Return "[InstCombine] Simplify compare of Phi with constant inputs against a constant"
I originally reverted the patch because it was causing performance
issues, but now I think it's just enabling simplify-cfg to do
something that I don't want instead :)

Sorry for the noise.

This reverts commit 3e39760f8e.
2020-06-17 11:38:59 +01:00
Hans Wennborg 16ad6eeb94 [IR] Don't copy profile metadata in createCallMatchingInvoke()
The invoke instruction can have profile metadata with branch_weights,
which does not make sense for a call instruction and will be
rejected by the verifier.

Differential revision: https://reviews.llvm.org/D81996
2020-06-17 11:18:23 +02:00
serge-sans-paille 1cafd8a5d1 Fix LoopIdiomRecognize pass return status
Introduce an helper class to aggregate the cleanup in case of rollback.

Differential Revision: https://reviews.llvm.org/D81230
2020-06-17 11:12:03 +02:00
Sjoerd Meijer d4e183f686 Revert "[LV] Emit @llvm.get.active.mask for tail-folded loops"
This reverts commit 4765045173
while I investigate the build bot failures.
2020-06-17 10:09:54 +01:00
Florian Hahn 773353be4e [SCCP] Move common code to simplify basic block to helper (NFC).
Reviewers: efriedma, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D81755
2020-06-17 10:03:43 +01:00
Sjoerd Meijer 4765045173 [LV] Emit @llvm.get.active.mask for tail-folded loops
This emits new IR intrinsic @llvm.get.active.mask for tail-folded vectorised
loops if the intrinsic is supported by the backend, which is checked by
querying TargetTransform hook emitGetActiveLaneMask.

This intrinsic creates a mask representing active and inactive vector lanes,
which is used by the masked load/store instructions that are created for
tail-folded loops. The semantics of @llvm.get.active.mask are described here in
LangRef:

https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics

This intrinsic is also used to provide a hint to the backend. That is, the
second argument of the intrinsic represents the back-edge taken count of the
loop. For MVE, for example, we use that to set up tail-predication, which is a
new form of predication in MVE for vector loops that implicitely predicates the
last vector loop iteration by implicitely setting active/inactive lanes, i.e.
the tail loop is predicated. In order to set up a tail-predicated vector loop,
we need to know the number of data elements processed by the vector loop, which
corresponds the the tripcount of the scalar loop, which we can now reconstruct
using @llvm.get.active.mask.

Differential Revision: https://reviews.llvm.org/D79100
2020-06-17 09:53:58 +01:00
Christopher Tetreault ff628f5f5e [SVE] Eliminate calls to default-false VectorType::get() from Vectorize
Reviewers: efriedma, fhahn, spatel, sdesmalen, kmclaughlin

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81521
2020-06-16 12:50:13 -07:00
Sanjay Patel ed67f5e7ab [VectorCombine] scalarize compares with insertelement operand(s)
Generalize scalarization (recently enhanced with D80885)
to allow compares as well as binops.
Similar to binops, we are avoiding scalarization of a loaded
value because that could avoid a register transfer in codegen.
This requires 1 extra predicate that I am aware of: we do not
want to scalarize the condition value of a vector select. That
might also invert a transform that we do in instcombine that
prefers a vector condition operand for a vector select.

I think this is the final step in solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463

Differential Revision: https://reviews.llvm.org/D81661
2020-06-16 13:48:10 -04:00
Tyker d7deef1206 Revert "[AssumeBundles] add cannonicalisation to the assume builder"
This reverts commit 90c50cad19.
2020-06-16 14:34:55 +02:00
Tyker 90c50cad19 [AssumeBundles] add cannonicalisation to the assume builder
Summary:
this reduces significantly the number of assumes generated without aftecting too much
the information that is preserved. this improves the compile-time cost
of enable-knowledge-retention significantly.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, asbirlea, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79650
2020-06-16 13:12:35 +02:00
sstefan1 e099c7b64a [NFC][OpenMPOpt] Provide function-specific foreachUse. 2020-06-16 12:33:15 +02:00
Jay Foad 6fdd5a28b7 Revert "[IR] Clean up dead instructions after simplifying a conditional branch"
This reverts commit 69bdfb075b.

Reverting to investigate https://bugs.llvm.org/show_bug.cgi?id=46343
2020-06-16 10:32:15 +01:00
Gui Andrade b0ffa8befe [MSAN] Pass Origin by parameter to __msan_warning functions
Summary:
Normally, the Origin is passed over TLS, which seems like it introduces unnecessary overhead. It's in the (extremely) cold path though, so the only overhead is in code size.

But with eager-checks, calls to __msan_warning functions are extremely common, so this becomes a useful optimization.

This can save ~5% code size.

Reviewers: eugenis, vitalybuka

Reviewed By: eugenis, vitalybuka

Subscribers: hiraditya, #sanitizers, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D81700
2020-06-15 17:49:18 -07:00
Florian Hahn 120c059292 [DSE,MSSA] Port partial store merging.
Port partial constant store merging logic to MemorySSA backed DSE. The
heavy lifting is done by the existing helper function. It is used in
context where we already ensured that the later instruction can
eliminate the earlier one, if it is a complete overwrite.
2020-06-15 18:41:46 +01:00
Florian Hahn 71a91b9837 [DSE] Hoist partial store merging code into function (NFC).
Hoist the general logic into a new function, because it can be re-used
by the MemorySSA backed DSE as well.
2020-06-15 17:44:24 +01:00
Florian Hahn 8c61f13a0f [DSE,MSSA] Delete instructions after printing it.
Also enables a now-passing test case, that exposed a crash caused by the
wrong order.
2020-06-15 16:01:36 +01:00
Sam Parker 2596da3174 [CostModel] getCFInstrCost in getUserCost.
Have BasicTTI call the base implementation so that both agree on the
default behaviour, which the default being a cost of '1'. This has
required an X86 specific implementation as it seems to be very
reliant on those instructions being free. Changes are also made to
AMDGPU so that their implementations distinguish between cost kinds,
so that the unrolling isn't affected. PowerPC also has its own
implementation to prevent changes to the reg-usage vectorizer test.

The cost model test changes now reflect that ret instructions are not
generally free.

Differential Revision: https://reviews.llvm.org/D79164
2020-06-15 09:28:46 +01:00
Max Kazantsev 60da4369a1 [NFC] Bail early simplifying unconditional branches 2020-06-15 13:59:53 +07:00
Sam Parker 3e39760f8e Revert "Return "[InstCombine] Simplify compare of Phi with constant inputs against a constant""
This reverts commit 23291b9863.

This caused performance regressions.
2020-06-15 07:46:28 +01:00
Whitney Tsang 5225cd43e8 [LoopUnroll] Allow loops with multiple exiting blocks where loop latch
is not necessary one of them.

Summary: Currently LoopUnrollPass already allow loops with multiple
exiting blocks, but it is only allowed when the loop latch is one of the
exiting blocks.
When the loop latch is not an exiting block, then only single exiting
block is supported.
When possible, the single loop latch or the single exiting block
terminator is optimized to an unconditional branch in the unrolled loop.

This patch allows loops with multiple exiting blocks even if the loop
latch is not one of them. However, the optimization of exiting block
terminator to unconditional branch is not done when there exists more
than one exiting block.
Reviewer: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour
Reviewed By: efriedma
Subscribers: hiraditya, zzheng, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D81053
2020-06-14 18:44:18 +00:00
Sanjay Patel 098e48a6a1 [PassManager] restore early-cse to vector cleanup
As noted in D80236 - the early-cse pass was included here before:
D75145 / rG71a316883d50
But it got moved outside of the "extra" option there, then it
got dropped while adjusting -vector-combine:
rG6438ea45e053
rG57bb4787d72f

So this is restoring the behavior and adding a test to prevent
accidental changes again. I don't see an equivalent option for
the new pass manager.
2020-06-14 10:04:53 -04:00
Sanjay Patel b5fb26951a [InstCombine] reassociate FP diff of sums into sum of diffs
(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])

This should be the last step in solving PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953

We started emitting reduction intrinsics with:
D80867/ rGe50059f6b6b3
So it's a relatively easy pattern match now to re-order those ops.
Also, I have not seen any complaints for the switch to intrinsics
yet, so I'll propose to remove the "experimental" tag from the
intrinsics soon.

Differential Revision: https://reviews.llvm.org/D81491
2020-06-14 09:09:03 -04:00
Sanjay Patel aeb5044801 [InstCombine] allow undef elements when comparing vector constants for min/max bailout
This is a hacky, but low-risk fix to avoid the infinite loop in PR46271:
https://bugs.llvm.org/show_bug.cgi?id=46271

As discussed there, the problem is that FoldOpIntoSelect() can get into a conflict
with a transform that wants to pull a 'not' op through min/max via
SimplifyDemandedVectorElts(). We need to relax our matching of min/max to include
undefined elements in vector constants to avoid that. Alternatively, we could
improve or cripple the demanded elements analysis, but that could create even
more problems.

The likely better, safer alternative will be to create min/max intrinsics, so
we can remove all of the hacks related to min/max matching in instcombine.

Differential Revision: https://reviews.llvm.org/D81698
2020-06-14 09:02:47 -04:00
Roman Lebedev e987ee6318
[NFCI][AggressiveInstCombiner] Add `STATISTIC()`s for transforms 2020-06-13 23:53:16 +03:00
Florian Hahn 97e7147e34 [DSE,MSSA] Fix location order in isOverwrite call.
isOverwrite expects the later location as first argument and the earlier
result later. The adjusted call is intended to check whether CC
overwrites DefLoc.
2020-06-13 20:39:00 +01:00
Eric Christopher b422fe7d62 Temporarily revert "[MemCpyOptimizer] Simplify API of processStore and processMem* functions"
as it seems to be causing some internal crashes in AA after
email with the author.

This reverts commit f79e6a8847.
2020-06-12 14:01:27 -07:00
Roman Lebedev 7aeb41b3c8
[NFCI] VectorCombine: add statistic for bitcast(shuf()) -> shuf(bitcast()) xform 2020-06-12 23:10:53 +03:00
Roman Lebedev 55eb714a0e
[NFC] OpenMPOpt: add a statistic for num of parallel regions deleted 2020-06-12 23:10:53 +03:00
Marco Elver 8af7fa07aa [ASan][NFC] Refactor redzone size calculation
Refactor redzone size calculation. This will simplify changing the
redzone size calculation in future.

Note that AddressSanitizer.cpp violates the latest LLVM style guide in
various ways due to capitalized function names. Only code related to the
change here was changed to adhere to the style guide.

No functional change intended.

Reviewed By: andreyknvl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81367
2020-06-12 15:33:00 +02:00
Florian Hahn 4495a6b141 [BreakCritEdges] Add option to opt-out of perserving loop-simplify.
This patch adds a new option to CriticalEdgeSplittingOptions to control
whether loop-simplify form must be preserved. It is them used by GVN to
indicate that loop-simplify form does not have to be preserved.

This fixes a crash exposed by 189efe295b.

If the critical edge we are splitting goes from a block inside a loop to
a block outside the loop, splitting the edge will create a new exit
block. As a result, the new block will branch to the original exit
block, which will add a non-loop predecessor, breaking loop-simplify
form. To preserve loop-simplify form, the predecessor blocks of the
original exit are split, but that does not work for blocks with
indirectbr terminators. If preserving loop-simplify form is requested,
bail out , before making any changes.

Reviewers: reames, hfinkel, davide, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D81582
2020-06-12 11:47:13 +01:00
Florian Hahn 3a846d4d92 [VPlan] Reject loops without computable backedge taken counts
getOrCreateTripCount is used to generate code for the outer loop, but it
requires a computable backedge taken counts. Check that in the VPlan
native path.

Reviewers: Ayal, gilr, rengolin, sguggill

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D81088
2020-06-12 10:31:18 +01:00
EgorBo 012909dcaf
[InstCombine] "X - (X / C) * C == 0" to "X & C-1 == 0"
Summary:
"X % C == 0" is optimized to "X & C-1 == 0" (where C is a power-of-two)
However, "X % Y" can also be represented as "X - (X / Y) * Y" so if I rewrite the initial expression:
"X - (X / C) * C == 0" it's not currently optimized to "X & C-1 == 0", see godbolt: https://godbolt.org/z/KzuXUj

This is my first contribution to LLVM so I hope I didn't mess things up

Reviewers: lebedev.ri, spatel

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79369
2020-06-12 10:20:06 +03:00
Yevgeny Rouban 707836ed4e [JumpThreading] Handle zero !prof branch_weights
Avoid division by zero in updatePredecessorProfileMetadata().

Reviewers: yamauchi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81499
2020-06-12 11:55:15 +07:00
Alina Sbirlea 519b019a0a Verify MemorySSA after all updates.
Verify after completing all updates.
Resolves PR46275.
2020-06-11 18:48:41 -07:00
Sanjay Patel 039ff29ef6 [VectorCombine] remove unused parameters; NFC 2020-06-11 19:15:03 -04:00
Stanislav Mekhanoshin a98d618f6e Fixed assertion in SROA if block has ho successors
BasicBlock::isLegalToHoistInto() asserts if block does not
have successors. The case is degenarate but assertion still
needs to be avoided.

https://bugs.llvm.org/show_bug.cgi?id=46280

Differential Revision: https://reviews.llvm.org/D81674
2020-06-11 15:15:19 -07:00
serge-sans-paille bff09876d7 Fix return status of DataFlowSanitizer pass
Take into account added functions, global values and attribute change.

Differential Revision: https://reviews.llvm.org/D81239
2020-06-11 16:05:17 +02:00
Jay Foad 69bdfb075b [IR] Clean up dead instructions after simplifying a conditional branch
Change BasicBlock::removePredecessor to optionally return a vector of
instructions which might be dead. Use this in ConstantFoldTerminator to
delete them if they are dead.

Reapply with a bug fix: don't drop the "!KeepOneInputPHIs" argument when
removePredecessor calls PHINode::removeIncomingValue.

Differential Revision: https://reviews.llvm.org/D80206
2020-06-11 14:53:01 +01:00
Jay Foad f45c65aa41 Revert "[IR] Clean up dead instructions after simplifying a conditional branch"
This reverts commit 4494e45316.

It caused problems for sanitizer buildbots.
2020-06-11 14:22:16 +01:00
Jay Foad 4494e45316 [IR] Clean up dead instructions after simplifying a conditional branch
Change BasicBlock::removePredecessor to optionally return a vector of
instructions which might be dead. Use this in ConstantFoldTerminator to
delete them if they are dead.

Differential Revision: https://reviews.llvm.org/D80206
2020-06-11 13:28:10 +01:00
Jay Foad f79e6a8847 [MemCpyOptimizer] Simplify API of processStore and processMem* functions
Previously these functions either returned a "changed" flag or a "repeat
instruction" flag, and could also modify an iterator to control which
instruction would be processed next.

Simplify this by always returning a "changed" flag, and handling all of
the "repeat instruction" functionality by modifying the iterator.

No functional change intended except in this case:
// If the source and destination of the memcpy are the same, then zap it.
... where the previous code failed to process the instruction after the
zapped memcpy.

Differential Revision: https://reviews.llvm.org/D81540
2020-06-11 12:48:09 +01:00
Chris Jackson 4707bc2177 [DebugInfo] Refactor SalvageDebugInfo and SalvageDebugInfoForDbgValues
- Simplify the salvaging interface and the algorithm in InstCombine

Reviewers: vsk, aprantl, Orlando, jmorse, TWeaver

Reviewed by: Orlando

Differential Revision: https://reviews.llvm.org/D79863
2020-06-11 11:13:46 +01:00
Craig Topper 94b1404587 [InstCombine] Remove some repeated calls to getOperand. NFCI
We had alread loaded operand 1 and 2 of the select as TV and FV
using the more the readable getTrueValue/getFalseValue.
2020-06-10 16:54:50 -07:00
serge-sans-paille 9daccb7a47 Correctly update Changed status for SimplifyCFG
Interestingly, this leads to better output in one of the test case.

Differential Revision: https://reviews.llvm.org/D81237
2020-06-10 16:54:15 +02:00
Kuter Dinel 70330edc4d Reland: [Attributor] Split the Attributor::run() into multiple functions.
Summary:
This patch splits the Attributor::run() function into multiple
functions.

Simple Logic changes to make this possible:
  # Moved iteration count verification earlier.
  # NumFinalAAs get set a little bit later.

Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: jdoerfert

Subscribers: hiraditya, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81022
2020-06-10 13:21:22 +00:00
Marco Elver d3f89314ff [KernelAddressSanitizer] Make globals constructors compatible with kernel [v2]
[ v1 was reverted by c6ec352a6b due to
  modpost failing; v2 fixes this. More info:
  https://github.com/ClangBuiltLinux/linux/issues/1045#issuecomment-640381783 ]

This makes -fsanitize=kernel-address emit the correct globals
constructors for the kernel. We had to do the following:

* Disable generation of constructors that rely on linker features such
  as dead-global elimination.

* Only instrument globals *not* in explicit sections. The kernel uses
  sections for special globals, which we should not touch.

* Do not instrument globals that are prefixed with "__" nor that are
  aliased by a symbol that is prefixed with "__". For example, modpost
  relies on specially named aliases to find globals and checks their
  contents. Unfortunately modpost relies on size stored as ELF debug info
  and any padding of globals currently causes the debug info to cause size
  reported to be *with* redzone which throws modpost off.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203493

Tested:
* With 'clang/test/CodeGen/asan-globals.cpp'.

* With test_kasan.ko, we can see:

  	BUG: KASAN: global-out-of-bounds in kasan_global_oob+0xb3/0xba [test_kasan]

* allyesconfig, allmodconfig (x86_64)

Reviewed By: glider

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81390
2020-06-10 15:08:42 +02:00
sstefan1 3013f2d329 Revert "[Attributor] Split the Attributor::run() into multiple functions."
This reverts commit 0ee47cc92f.
2020-06-10 10:10:49 +00:00
stefan 0ee47cc92f [Attributor] Split the Attributor::run() into multiple functions.
Summary:
This patch splits the Attributor::run() function into multiple functions.

Simple Logic changes to make this possible:
  # Moved iteration count verification earlier.
  # NumFinalAAs get set a little bit later.

Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: jdoerfert

Subscribers: hiraditya, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81022
2020-06-10 09:48:58 +00:00
Florian Hahn 67671024c8 [DSE,MSSA] Relax post-dom restriction for objs visible after return.
This patch relaxes the post-dominance requirement for accesses to
objects visible after the function returns.

Instead of requiring the killing def to post-dominate the access to
eliminate, the set of 'killing blocks' (= blocks that completely
overwrite the original access) is collected.

If all paths from the access to eliminate and an exit block go through a
killing block, the access can be removed.

To check this property, we first get the common post-dominator block for
the killing blocks. If this block does not post-dominate the access
block, there may be a path from DomAccess to an exit block not involving
any killing block.

Otherwise we have to check if there is a path from the DomAccess to the
common post-dominator, that does not contain a killing block. If there
is no such path, we can remove DomAccess. For this check, we start at
the common post-dominator and then traverse the CFG backwards. Paths are
terminated when we hit a killing block or a block that is not executed
between DomAccess and a killing block according to the post-order
numbering (if the post order number of a block is greater than the one
of DomAccess, the block cannot be in in a path starting at DomAccess).

This gives the following improvements on the total number of stores
after DSE for MultiSource, SPEC2K, SPEC2006:

Tests: 237
Same hash: 206 (filtered out)
Remaining: 31
Metric: dse.NumRemainingStores

Program                                        base      new100    diff
 test-suite...CFP2000/188.ammp/188.ammp.test   3624.00   3544.00   -2.2%
 test-suite...ch/g721/g721encode/encode.test   128.00    126.00    -1.6%
 test-suite.../Benchmarks/Olden/mst/mst.test    73.00     72.00    -1.4%
 test-suite...CFP2006/433.milc/433.milc.test   3202.00   3163.00   -1.2%
 test-suite...000/186.crafty/186.crafty.test   5062.00   5010.00   -1.0%
 test-suite...-typeset/consumer-typeset.test   40460.00  40248.00  -0.5%
 test-suite...Source/Benchmarks/sim/sim.test   642.00    639.00    -0.5%
 test-suite...nchmarks/McCat/09-vor/vor.test   642.00    644.00     0.3%
 test-suite...lications/sqlite3/sqlite3.test   35664.00  35563.00  -0.3%
 test-suite...T2000/300.twolf/300.twolf.test   7202.00   7184.00   -0.2%
 test-suite...lications/ClamAV/clamscan.test   19475.00  19444.00  -0.2%
 test-suite...INT2000/164.gzip/164.gzip.test   2199.00   2196.00   -0.1%
 test-suite...peg2/mpeg2dec/mpeg2decode.test   2380.00   2378.00   -0.1%
 test-suite.../Benchmarks/Bullet/bullet.test   39335.00  39309.00  -0.1%
 test-suite...:: External/Povray/povray.test   36951.00  36927.00  -0.1%
 test-suite...marks/7zip/7zip-benchmark.test   67396.00  67356.00  -0.1%
 test-suite...6/464.h264ref/464.h264ref.test   31497.00  31481.00  -0.1%
 test-suite...006/453.povray/453.povray.test   51441.00  51416.00  -0.0%
 test-suite...T2006/401.bzip2/401.bzip2.test   4450.00   4448.00   -0.0%
 test-suite...Applications/kimwitu++/kc.test   23481.00  23471.00  -0.0%
 test-suite...chmarks/MallocBench/gs/gs.test   6286.00   6284.00   -0.0%
 test-suite.../CINT2000/254.gap/254.gap.test   13719.00  13715.00  -0.0%
 test-suite.../Applications/SPASS/SPASS.test   30345.00  30338.00  -0.0%
 test-suite...006/450.soplex/450.soplex.test   15018.00  15016.00  -0.0%
 test-suite...ications/JM/lencod/lencod.test   27780.00  27777.00  -0.0%
 test-suite.../CINT2006/403.gcc/403.gcc.test   105285.00 105276.00 -0.0%

There might be potential to pre-compute some of the information of which
blocks are on the path to an exit for each block, but the overall
benefit might be comparatively small.

On the set of benchmarks, 15738 times out of 20322 we reach the
CFG check, the CFG check is successful. The total number of iterations
in the CFG check is 187810, so on average we need less than 10 steps in
the check loop. Bumping the threshold in the loop from 50 to 150 gives a
few small improvements, but I don't think they warrant such a big bump
at the moment. This is all pending further tuning in the future.

Reviewers: dmgreen, bryant, asbirlea, Tyker, efriedma, george.burgess.iv

Reviewed By: george.burgess.iv

Differential Revision: https://reviews.llvm.org/D78932
2020-06-10 10:39:25 +01:00
Vitaly Buka 5a3b380f49 Revert "[InstrProfiling] Use !associated metadata for counters, data and values"
This reverts commit 69c5ff4668.
This reverts commit 603d58b5e4.
This reverts commit ba10bedf56.
This reverts commit 39b3c41b65.
2020-06-10 02:32:50 -07:00
Whitney Tsang 01e64c9712 [LoopFusion] Update second loop guard non loop successor phis incoming
blocks.

Summary: The current LoopFusion forget to update the incoming block of
the phis in second loop guard non loop successor from second loop guard
block to first loop guard block. A test case is provided to better
understand the problem.
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D81421
2020-06-09 21:14:51 +00:00
Christopher Tetreault 765ac39db2 [SVE] Eliminate calls to default-false VectorType::get() from Scalar
Reviewers: efriedma, kmclaughlin, sdesmalen, fhahn, bkramer, anna, gchatelet, c-rhodes, david-arm, fpetrogalli

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, dantrushin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80336
2020-06-09 14:09:02 -07:00
Simon Pilgrim 5dc4e7c2b9 [VectorCombine] scalarizeBinop - support an all-constant src vector operand
scalarizeBinop currently folds

  vec_bo((inselt VecC0, V0, Index), (inselt VecC1, V1, Index))
  ->
  inselt(vec_bo(VecC0, VecC1), scl_bo(V0,V1), Index)

This patch extends this to account for cases where one of the vec_bo operands is already all-constant and performs similar cost checks to determine if the scalar binop with a constant still makes sense:

  vec_bo((inselt VecC0, V0, Index), VecC1)
  ->
  inselt(vec_bo(VecC0, VecC1), scl_bo(V0,extractelt(V1,Index)), Index)

Fixes PR42174

Differential Revision: https://reviews.llvm.org/D80885
2020-06-09 19:02:05 +01:00
Simon Pilgrim 8233439fdb [InstCombine] Ensure allocation alignment mask is within range before applying as an attribute
Fixes OSS-Fuzz #23214
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=23214
2020-06-09 17:31:55 +01:00
serge-sans-paille 5b08bd0eb4 Fix MemCpyOptimizer return status
Differential Revision: https://reviews.llvm.org/D81229
2020-06-09 14:24:33 +02:00
serge-sans-paille ef1a7f2f01 Update pass status for GCOVProfiling
Take fork/exec instrumentation into account.

Differential Revision: https://reviews.llvm.org/D81227
2020-06-09 14:23:30 +02:00
Petr Hosek 603d58b5e4 [InstrProfiling] Use !associated metadata for counters, data and values
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object. This
metadata prevents discarding of the global object in linker GC unless
the referenced object is also discarded.

Furthermore, when a function symbol is discarded by the linker, setting
up !associated metadata allows linker to discard counters, data and
values associated with that function symbol. This is not possible today
because there's metadata to guide the linker. This approach is also used
by other instrumentations like sanitizers.

Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.

Differential Revision: https://reviews.llvm.org/D76802
2020-06-08 15:07:43 -07:00
Petr Hosek ba10bedf56 Revert "[InstrProfiling] Use !associated metadata for counters, data and values"
This reverts commit 39b3c41b65 due to
a failing associated.ll test.
2020-06-08 14:38:15 -07:00
Stanislav Mekhanoshin 87ff3401eb Stabilize alloca slices sort in SROA
Slice::operator<() has a non-deterministic behavior. If we have
identical slices comparison will depend on the order or operands.
Normally that does not result in unstable compilation results
because the order in which slices are inserted into the vector
is deterministic and llvm::sort() normally behaves as a stable
sort, although that is not guaranteed.

However, there is test option -sroa-random-shuffle-slices which
is used to check exactly this aspect. The vector is first randomly
shuffled and then sorted. The same shuffling happens without this
option under expensive llvm checks.

I have managed to write a test which has hit this problem.

There are no fields in the Slice class to resolve the instability.
We only have offsets, IsSplittable and Use, but neither Use nor
User have anything suitable for predictable comparison.

I have switched to stable_sort which has to be sufficient and
removed that randon shuffle option.

Differential Revision: https://reviews.llvm.org/D81310
2020-06-08 14:25:27 -07:00
Petr Hosek 39b3c41b65 [InstrProfiling] Use !associated metadata for counters, data and values
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object. This
metadata prevents discarding of the global object in linker GC unless
the referenced object is also discarded.

Furthermore, when a function symbol is discarded by the linker, setting
up !associated metadata allows linker to discard counters, data and
values associated with that function symbol. This is not possible today
because there's metadata to guide the linker. This approach is also used
by other instrumentations like sanitizers.

Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.

Differential Revision: https://reviews.llvm.org/D76802
2020-06-08 13:35:56 -07:00
Hans Wennborg fc202c5fec [PGO] CallPromotion: Don't try to pass sret args to varargs functions
It's not allowed by the verifier.

Differential revision: https://reviews.llvm.org/D81409
2020-06-08 21:10:27 +02:00
Sanjay Patel d50366d29f [InstCombine] improve matching for sext-lshr-trunc patterns, part 2
Similar to rG42f488b63a04

This is intended to preserve the logic of the existing transform,
but remove unnecessary restrictions on uses and types.

https://rise4fun.com/Alive/oS0

  Name: narrow input
  Pre: C1 <= width(C1) - 24
  %B = sext i8 %A
  %C = lshr %B, C1
  %r = trunc %C to i24
  =>
  %s = ashr i8 %A, trunc(umin(C1, 7))
  %r = sext i8 %s to i24

  Name: wide input
  Pre: C1 <= width(C1) - 24
  %B = sext i24 %A
  %C = lshr %B, C1
  %r = trunc %C to i8
  =>
  %s = ashr i24 %A, trunc(umin(C1, 23))
  %r = trunc i24 %s to i8
2020-06-08 14:41:50 -04:00
Chris Jackson c6c65164af [DebugInfo] Reduce SalvageDebugInfo() functions
- Now all SalvageDebugInfo() calls will mark undef if the salvage
  attempt fails.

 Reviewed by: vsk, Orlando

 Differential Revision: https://reviews.llvm.org/D78369
2020-06-08 19:28:18 +01:00
Hiroshi Yamauchi b5632f4083 [PGO][PGSO] Enable non-cold code size opts under non-partial-profile sample PGO.
Summary: Following up D78949.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81020
2020-06-08 10:02:00 -07:00
Sanjay Patel 42f488b63a [InstCombine] improve matching for sext-lshr-trunc patterns
This is intended to preserve the logic of the existing transform,
but remove unnecessary restrictions on uses and types.

https://rise4fun.com/Alive/pYfR

  Pre: C1 <= width(C1) - 8
  %B = sext i8 %A
  %C = lshr %B, C1
  %r = trunc %C to i8
   =>
  %r = ashr i8 %A, trunc(umin(C1, 7))
2020-06-08 11:55:30 -04:00
Sanjay Patel af7587d755 [InstCombine] reduce code duplication in visitTrunc(); NFC 2020-06-08 11:15:44 -04:00
Marco Elver c6ec352a6b Revert "[KernelAddressSanitizer] Make globals constructors compatible with kernel"
This reverts commit 866ee2353f.

Building the kernel results in modpost failures due to modpost relying
on debug info and inspecting kernel modules' globals:
https://github.com/ClangBuiltLinux/linux/issues/1045#issuecomment-640381783
2020-06-08 10:34:03 +02:00
Benjamin Kramer 3badd17b69 SmallPtrSet::find -> SmallPtrSet::count
The latter is more readable and more efficient. While there clean up
some double lookups. NFCI.
2020-06-07 22:38:08 +02:00
Fangrui Song e3200dab60 [gcov] Support .gcno/.gcda in gcov 8, 9 or 10 compatible formats 2020-06-07 11:27:49 -07:00
AK 96458fc510 Add cl::ZeroOrMore to get around build system issues
It is quite common to get multiple instances of optimization flags while building.
The following optimizations does not have cl::ZeroOrMore which causes errors during the build.

Reviewers: alexbdv,spop

Differential Revision: https://reviews.llvm.org/D81187
2020-06-07 10:15:18 -07:00
Sanjay Patel 2552f65183 [InstCombine] fold mask op into casted shift (PR46013)
https://rise4fun.com/Alive/Qply8

  Pre: C2 == (-1 u>> zext(C1))
  %a = ashr %x, C1
  %s = sext %a to i16
  %r = and i16 %s, C2
    =>
  %s2 = sext %x to i16
  %r = lshr i16 %s2, zext(C1)

https://bugs.llvm.org/show_bug.cgi?id=46013
2020-06-07 09:33:18 -04:00
Simon Pilgrim 1c2d2c88b4 AlignmentFromAssumptions.h - reduce includes to forward declarations. NFC. 2020-06-07 13:51:48 +01:00
Fangrui Song 693ff89f47 [gcov] Delete unneeded code 2020-06-06 20:36:46 -07:00
Fangrui Song cdd683b516 [gcov] Support big-endian .gcno and simplify version handling in .gcda 2020-06-06 11:01:47 -07:00
Simon Pilgrim f14d4c9c54 EHPersonalities.h - reduce Triple.h include to forward declaration. NFC.
Move implicit include dependencies down to source files.
2020-06-06 15:48:31 +01:00
dfukalov c94d32a6b3 [AMDGPU] Increase max iterations count to analyze complete unroll
Summary: In some cases inner loops may not get boosts so try to analyze them deeper.

Reviewers: rampitec, mzolotukhin

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81204
2020-06-06 16:32:45 +03:00
Simon Pilgrim 5006e551d3 LoopAnalysisManager.h - reduce includes to forward declarations. NFC.
Move implicit include dependencies down to header/source files.
2020-06-06 14:06:46 +01:00
Nikita Popov ff1210edb6 [NewGVN] Remove alignment from LoadExpression (NFC)
The alignment is not actually used.
2020-06-06 11:49:20 +02:00
Nikita Popov a4953db530 [InstCombine] Remove unnecessary MaybeAlign use (NFC)
Alloca align is required now.
2020-06-06 11:44:01 +02:00
Richard Smith f39e12a06b PR34581: Don't remove an 'if (p)' guarding a call to 'operator delete(p)' under -Oz.
Summary:
This transformation is correct for a builtin call to 'free(p)', but not
for 'operator delete(p)'. There is no guarantee that a user replacement
'operator delete' has no effect when called on a null pointer.

However, the principle behind the transformation *is* correct, and can
be applied more broadly: a 'delete p' expression is permitted to
unconditionally call 'operator delete(p)'. So do that in Clang under
-Oz where possible. We do this whether or not 'p' has trivial
destruction, since the destruction might turn out to be trivial after
inlining, and even for a class-specific (but non-virtual,
non-destroying, non-array) 'operator delete'.

Reviewers: davide, dnsampaio, rjmccall

Reviewed By: dnsampaio

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D79378
2020-06-05 17:13:43 -07:00
Nikita Popov bff94a8e2b [LoopIdiomRecognize] Remove unnecessary MaybeAlign use (NFC)
Loads and stores always have an alignment now.
2020-06-05 23:11:57 +02:00
Stanislav Mekhanoshin 1e9a0a4e04 SROA: Remove pointer from visited along with instruction
If an instruction is erased we also need to remove it from
Visited set. There is a very small chance that an another
newly created instruction will be created with the same
pointer value in place of an erased one.

Differential Revision: https://reviews.llvm.org/D80958
2020-06-05 12:47:23 -07:00
Marco Elver 866ee2353f [KernelAddressSanitizer] Make globals constructors compatible with kernel
Summary:
This makes -fsanitize=kernel-address emit the correct globals
constructors for the kernel. We had to do the following:

- Disable generation of constructors that rely on linker features such
  as dead-global elimination.

- Only emit constructors for globals *not* in explicit sections. The
  kernel uses sections for special globals, which we should not touch.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203493

Tested:
1. With 'clang/test/CodeGen/asan-globals.cpp'.
2. With test_kasan.ko, we can see:

  	BUG: KASAN: global-out-of-bounds in kasan_global_oob+0xb3/0xba [test_kasan]

Reviewers: glider, andreyknvl

Reviewed By: glider

Subscribers: cfe-commits, nickdesaulniers, hiraditya, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D80805
2020-06-05 20:20:46 +02:00
Arthur Eubanks 8133e289b6 Add ASan metadata globals to @llvm.compiler.used under COFF
Summary:
This matches ELF.

This makes the number of ASan failures under the new pass manager on
Windows go from 18 to 1.

Under the old pass manager, the ASan module pass was one of the very
last things run, so these globals didn't get removed due to GlobalOpt.
But with the NPM the ASan module pass that adds these globals are run
much earlier in the pipeline and GlobalOpt ends up removing them.

Reviewers: vitalybuka, hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81175
2020-06-05 09:04:52 -07:00
serge-sans-paille 977d27d881 [SCCP] Report changes after removing stores to constant global
Differential Revision: https://reviews.llvm.org/D81228
2020-06-05 16:09:07 +02:00
serge-sans-paille 8405f6bcd4 Correctly report modified status for DivRemPairs
Differential Revision: https://reviews.llvm.org/D81231
2020-06-05 16:06:31 +02:00
serge-sans-paille 424510095d Correctly report modified status for DSE
Differential Revision: https://reviews.llvm.org/D81233
2020-06-05 15:59:42 +02:00
serge-sans-paille f987cceb13 Correctly report modified status for TailRecursionElimination
Differential Revision: https://reviews.llvm.org/D81232
2020-06-05 15:58:20 +02:00
serge-sans-paille 1086d777be Correctly report modified status for ObjCARCContract
Differential Revision: https://reviews.llvm.org/D81226
2020-06-05 15:56:57 +02:00
serge-sans-paille 80f1ec7008 Correctly report modified status for ObjCARCOpt
Differential Revision: https://reviews.llvm.org/D81234
2020-06-05 15:56:57 +02:00
Max Kazantsev 23291b9863 Return "[InstCombine] Simplify compare of Phi with constant inputs against a constant"
This reverts commit c4b5a66e44.

Returning along with Clang test fix
2020-06-05 20:48:29 +07:00
serge-sans-paille 2e5940cf29 Correctly report modified status for LoopSimplify
Differential Revision: https://reviews.llvm.org/D81235
2020-06-05 15:46:28 +02:00
serge-sans-paille 2fc085e0e5 Fix return status of AddressSanitizer pass
Differential Revision: https://reviews.llvm.org/D81240
2020-06-05 15:44:51 +02:00
Simon Pilgrim 06fd973c85 TargetLibraryInfo.h - reduce Triple.h include to forward declaration. NFC.
Move implicit include dependencies down to source files.
2020-06-05 14:35:30 +01:00
Kadir Cetinkaya c4b5a66e44
Revert "[InstCombine] Simplify compare of Phi with constant inputs against a constant"
This reverts commit 16b7eb6dd1.

Breaks build bots, see
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/29888
for an example.
2020-06-05 13:02:35 +02:00
Max Kazantsev 16b7eb6dd1 [InstCombine] Simplify compare of Phi with constant inputs against a constant
We can simplify
```
  icmp <pred> phi(C1, C2, ...), C
```
with
```
  phi(icmp(C1, C), icmp(C2, C), ...)
```
provided that all comparison of constants are constants themselves.

Differential Revision: https://reviews.llvm.org/D81151
Reviewed By: lebedev.ri
2020-06-05 17:02:47 +07:00
Simon Pilgrim 44d86982d2 MemorySSAUpdater.h - reduce unnecessary includes to forward declarations. NFC.
Remove unnecessary MemoryAccess forward declaration as its already included from MemorySSA.h

Move implicit include dependencies down to source files.
2020-06-05 10:45:59 +01:00
Max Kazantsev 80cb25cbd5 Revert "[InstCombine][NFC] Factor out constant check"
This reverts commit 9bdb918890.

This refactoring proved to not be useful.
2020-06-05 12:00:44 +07:00
Petr Hosek d76e62fdb7 [AddressSanitizer] Don't use weak linkage for __{start,stop}_asan_globals
It should not be necessary to use weak linkage for these. Doing so
implies interposablity and thus PIC generates indirections and
dynamic relocations, which are unnecessary and suboptimal. Aside
from this, ASan instrumentation never introduces GOT indirection
relocations where there were none before--only new absolute relocs
in RELRO sections for metadata, which are less problematic for
special linkage situations that take pains to avoid GOT generation.

Patch By: mcgrathr

Differential Revision: https://reviews.llvm.org/D80605
2020-06-04 20:18:35 -07:00
Philip Reames 3d40c75189 [Statepoint] Switch RS4GC to using gc-live bundle form
Now that we have an operand based form for the GC arguments to a statepoint intrinsic, update RS4GC to use it and update tests to reflect. This is pretty straight forward. I nearly landed without review, but figured a second set of eyes didn't hurt.

Differential Revision: https://reviews.llvm.org/D81121
2020-06-04 15:49:11 -07:00
Petr Hosek b16ed493dd [Fuchsia] Rely on linker switch rather than dead code ref for profile runtime
Follow the model used on Linux, where the clang driver passes the
linker a -u switch to force the profile runtime to be linked in,
rather than having every TU emit a dead function with a reference.

Differential Revision: https://reviews.llvm.org/D79835
2020-06-04 15:47:05 -07:00
Petr Hosek e1ab90001a Revert "[Fuchsia] Rely on linker switch rather than dead code ref for profile runtime"
This reverts commit d510542174 since
it broke several bots.
2020-06-04 15:44:10 -07:00
Craig Topper 3ad8fbd205 [Reassociate] Teach ConvertShiftToMul to preserve nsw flag if the shift amount is not bitwidth - 1.
Multiply and shl have different signed overflow behavior in
some cases. But it looks like we should be ok as long as the
shift amount is less than bitwidth - 1.

Alive2: http://volta.cs.utah.edu:8080/z/MM4WZP

Differential Revision: https://reviews.llvm.org/D81189
2020-06-04 14:51:34 -07:00
Sanjay Patel 192cb71836 [InstCombine] avoid crashing on select-shuffle detection
As mentioned in the post-commit comments of D81013 -
the mask check API has to assume the shuffle is
not length-changing, but we have not ruled that out
in this code. Use the ShuffleVectorInst call instead.
2020-06-04 17:27:14 -04:00
Petr Hosek d510542174 [Fuchsia] Rely on linker switch rather than dead code ref for profile runtime
Follow the model used on Linux, where the clang driver passes the
linker a -u switch to force the profile runtime to be linked in,
rather than having every TU emit a dead function with a reference.

Patch By: mcgrathr

Differential Revision: https://reviews.llvm.org/D79835
2020-06-04 14:25:19 -07:00
Sanjay Patel 8a96c1f627 [InstCombine] move vector select ahead of select-shuffle
select Cond, (shuf_sel X, Y), X --> shuf_sel X, (select Cond, Y, X)

A select of a select-shuffle ("blend" in x86 lingo) can be reversed
so that the select is done first.
This is a more limited version of what I was trying in D80658,
but it enables existing demanded bits transforms to catch some of the
motivating cases. The tricky bit in that seems to be that by moving
the shuffle later, we can always guarantee that poison is correctly
inhibited by the shuffle mask in the final value.

Alive2 checks for the basic tests:
http://volta.cs.utah.edu:8080/z/Qqd3RK
http://volta.cs.utah.edu:8080/z/S4wchM
http://volta.cs.utah.edu:8080/z/wf9zPL
http://volta.cs.utah.edu:8080/z/wJeEGk

Differential Revision: https://reviews.llvm.org/D81013
2020-06-04 14:29:13 -04:00
Layton Kifer 7381fcdf62 [TRE] Allow accumulator elimination when base case returns non-constant
Remove the requirement, that when performing accumulator elimination,
all other cases must return the same dynamic constant. We can do this by
initializing the accumulator with the identity value of the accumulation
operation, and inserting an additional operation before any return.

Differential Revision: https://reviews.llvm.org/D80844
2020-06-04 10:34:42 -07:00
Huihui Zhang bd43f78c76 [LSR][SCEVExpander] Avoid blind cast 'Factor' to SCEVConstant in FactorOutConstant.
Summary:
In SCEVExpander FactorOutConstant(), when GEP indexing into/over scalable vector,
it is legal for the 'Factor' in a MulExpr to be the size of a scalable vector
instead of a compile-time constant.

Current upstream crash with the test attached.

Reviewers: efriedma, sdesmalen, sanjoy.google, mkazantsev

Reviewed By: efriedma

Subscribers: hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80973
2020-06-04 10:33:39 -07:00
Max Kazantsev 9bdb918890 [InstCombine][NFC] Factor out constant check
We plan to add more transforms here. Besides, this check should be
done in the beginning just from function's name.
2020-06-04 18:54:23 +07:00
Yevgeny Rouban dcfa78a4cc Extend InvokeInst !prof branch_weights metadata to unwind branches
Allow InvokeInst to have the second optional prof branch weight for
its unwind branch. InvokeInst is a terminator with two successors.
It might have its unwind branch taken many times. If so
the BranchProbabilityInfo unwind branch heuristic can be inaccurate.
This patch allows a higher accuracy calculated with both branch
weights set.

Changes:
 - A new section about InvokeInst is added to
   the BranchWeightMetadata page. It states the old information that
   missed in the doc and adds new about the second branch weight.
 - Verifier is changed to allow either 1 or 2 branch weights
   for InvokeInst.
 - A new test is written for BranchProbabilityInfo to demonstrate
   the main improvement of the simple fix in calcMetadataWeights().
 - Several new testcases are created for Inliner. Those check that
    both weights are accounted for invoke instruction weight
    calculation.
 - PGOUseFunc::setBranchWeights() is fixed to be applicable to
   InvokeInst.

Reviewers: davidxl, reames, xur, yamauchi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80618
2020-06-04 15:37:15 +07:00
Yevgeny Rouban 417bcb8827 [Instruction] Remove setProfWeight()
Remove the function Instruction::setProfWeight() and make
use of Instruction::copyMetadata(.., {LLVMContext::MD_prof}).
This is correct for all use cases of setProfWeight() as it
is applied to CallBase instructions only.
This change results in prof metadata copied intact even if
the source has "VP". The old pair of calls
extractProfTotalWeight() + setProfWeight() resulted in
setting branch_weights if the source had "VP" data.

Reviewers: yamauchi, davidxl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80987
2020-06-04 15:10:55 +07:00
Arnold Schwaighofer 2e4c5d1c48 CoroSplit: Fix coroutine splitting for retcon and retcon.once
Summary:
For retcon and retcon.once coroutines we assume that all uses of spills
can be sunk past coro.begin. This simplifies handling of instructions
that escape the address of an alloca.

The current implementation would have issues if the address of the
alloca is escaped before coro.begin. (It also has issues with casts before and
uses of those casts after the coro.begin instruction)

  %alloca_addr = alloca ...
  %escape  = ptrtoint %alloca_addr
  coro.begin
  store %escape to %alloca_addr

rdar://60272809

Subscribers: hiraditya, modocache, mgrang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81023
2020-06-03 12:10:58 -07:00
Florian Hahn 211596c94e [VPlan] Support extracting lanes for defs managed in VPTransformState.
Currently extracting a lane for a VPValue def is not supported, if it is
managed directly by VPTransformState (e.g. because it is created by a
VPInstruction or an external VPValue def).

For now, simply extract the requested lane. In the future, we should
also cache the extracted scalar values, similar to LV.

Reviewers: Ayal, rengolin, gilr, SjoerdMeijer

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D80787
2020-06-03 12:14:16 +01:00
Kazu Hirata f355c7fc2f [JumpThreading] Simplify FindMostPopularDest (NFC)
Summary:
This patch simplifies FindMostPopularDest without changing the
functionality.

Given a list of jump threading destinations, the function finds the
most popular destination.  To ensure determinism when there are
multiple destinations with the highest popularity, the function picks
the first one in the successor list with the highest popularity.

Without this patch:

- The function populates DestPopularity -- a histogram mapping
  destinations to their respective occurrence counts.

- Then we iterate over DestPopularity, looking for the highest
  popularity while building a vector of destinations with the highest
  popularity.

- Finally, we iterate the successor list, looking for the destination
  with the highest popularity.

With this patch:

- We implement DestPopularity with MapVector instead of DenseMap.  We
  populate the map with popularity 0 for all successors in the order
  they appear in the successor list.

- We build the histogram in the same way as before.

- We simply use std::max_element on DestPopularity to find the most
  popular destination.  The use of MapVector ensures determinism.

Reviewers: wmi, efriedma

Reviewed By: wmi

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81030
2020-06-02 18:43:31 -07:00
Wei Mi 7a6c89427c [SampleFDO] Add use-sample-profile function attribute.
When sampleFDO is enabled, people may expect they can use
-fno-profile-sample-use to opt-out using sample profile for a certain file.
That could be either for debugging purpose or for performance tuning purpose.
However, when thinlto is enabled, if a function in file A compiled with
-fno-profile-sample-use is imported to another file B compiled with
-fprofile-sample-use, the inlined copy of the function in file B may still
get its profile annotated.

The inconsistency may even introduce profile unused warning because if the
target is not compiled with explicit debug information flag, the function
in file A won't have its debug information enabled (debug information will
be enabled implicitly only when -fprofile-sample-use is used). After it is
imported into file B which is compiled with -fprofile-sample-use, profile
annotation for the outline copy of the function will fail because the
function has no debug information, and that will trigger  profile unused
warning.

We add a new attribute use-sample-profile to control whether a function
will use its sample profile no matter for its outline or inline copies.
That will make the behavior of -fno-profile-sample-use consistent.

Differential Revision: https://reviews.llvm.org/D79959
2020-06-02 17:23:17 -07:00
Hiroshi Yamauchi 089759b96d [PGO] Enable memcmp/bcmp size value profiling.
Summary: Following up D79751.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80578
2020-06-02 10:27:11 -07:00
Florian Hahn b446ec56a2 [LV] Make sure the MaxVF is a power-of-2 by rounding down.
LV currently only supports power of 2 vectorization factors, which has
been made explicit with the assertion added in
840450549c.

However, if the widest type is not a power-of-2 the computed MaxVF won't
be a power-of-2 either. This patch updates computeFeasibleMaxVF to
ensure the returned value is a power-of-2 by rounding down to the
nearest power-of-2.

Fixes PR46139.

Reviewers: Ayal, gilr, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D80870
2020-06-02 10:40:49 +01:00
Denis Antrushin 3c626c714c [EarlyCSE] Common gc.relocate calls.
gc.relocate intrinsic is special in that its second and third operands
are not real values, but indices into relocate's parent statepoint list
of GC pointers.
To be CSE'd, they need special handling in `isEqual()` and `getHashCode()`.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80445
2020-06-02 12:25:43 +03:00
Mircea Trofin 999ea25a9e [llvm][NFC] Cache FAM in InlineAdvisor
Summary:
This simplifies the interface by storing the function analysis manager
with the InlineAdvisor, and, thus, not requiring it be passed each time
we inquire for an advice.

Reviewers: davidxl, asbirlea

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80405
2020-06-01 13:02:34 -07:00
Sanjay Patel 26ebe936f3 [InstCombine] fix use of base VectorType; NFC
SimplifyDemandedVectorElts() bails out on ScalableVectorType
anyway, but we can exit faster with the external check.

Move this to a helper function because there are likely other
vector folds that we can try here.
2020-06-01 14:28:31 -04:00
Hiroshi Yamauchi 6c27c61d32 [PGO] Improve the working set size heuristics under the partial sample PGO.
Summary:
The working set size heuristics (ProfileSummaryInfo::hasHugeWorkingSetSize)
under the partial sample PGO may not be accurate because the profile is partial
and the number of hot profile counters in the ProfileSummary may not reflect the
actual working set size of the program being compiled.

To improve this, the (approximated) ratio of the the number of profile counters
of the program being compiled to the number of profile counters in the partial
sample profile is computed (which is called the partial profile ratio) and the
working set size of the profile is scaled by this ratio to reflect the working
set size of the program being compiled and used for the working set size
heuristics.

The partial profile ratio is approximated based on the number of the basic
blocks in the program and the NumCounts field in the ProfileSummary and computed
through the thin LTO indexing. This means that there is the limitation that the
scaled working set size is available to the thin LTO post link passes only.

Reviewers: davidxl

Subscribers: mgorny, eraman, hiraditya, steven_wu, dexonsmith, arphaman, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79831
2020-06-01 10:29:23 -07:00
Stanislav Mekhanoshin 745c6c8458 Process gep (phi ptr1, ptr2) in SROA
Differential Revision: https://reviews.llvm.org/D79218
2020-06-01 08:41:05 -07:00
Sanjay Patel dd54432a0f [InstNamer] use 'i' for Instructions, not 'tmp'
As discussed in https://bugs.llvm.org/show_bug.cgi?id=45951 and
D80584, the name 'tmp' is almost always a bad choice, but we have
a legacy of regression tests with that name because it was baked
into utils/update_test_checks.py.

This change makes -instnamer more consistent (already using "arg"
and "bb", the common LLVM shorthand). And it avoids the conflict
in telling users of the FileCheck script to run "-instnamer" to
create a better regression test and having that cause a warn/fail
in update_test_checks.py.
2020-06-01 11:11:14 -04:00
Ehud Katz 8a84158e5b [StructurizeCFG] Fix an incorrect comment, NFC. 2020-06-01 17:42:09 +03:00
Ehud Katz 85c3088049 [StructurizeCFG] Fix region nodes ordering
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
The new implementation uses SCCs instead of Loops to take account of
irreducible loops.

Fix PR41509

Differential Revision: https://reviews.llvm.org/D79037
2020-06-01 12:50:35 +03:00
Whitney Tsang 7873376bb3 [LoopUnroll] Fix build failure for allyesconfig.
Differential Revision: https://reviews.llvm.org/D80477.
2020-05-30 18:32:47 +00:00
zoecarver 065bf124fd [DSE] Remove noop stores in MSSA.
Adds a simple fast-path check for the pattern:
v = load ptr
store v to ptr

I took the tests from the bugzilla post, I can add more if needed (but I think these should be sufficent).

Refs: https://bugs.llvm.org/show_bug.cgi?id=45795

Differential Revision: https://reviews.llvm.org/D79391
2020-05-30 09:57:30 -07:00
Christopher Tetreault e6cf402e83 [SVE] Eliminate calls to default-false VectorType::get() from AggressiveInstCombine
Reviewers: efriedma, aymanmus, c-rhodes, david-arm

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80332
2020-05-29 15:49:33 -07:00
Valery N Dmitriev a45688a72c [SLP] Apply external to vectorizable tree users cost adjustment for
relevant aggregate build instructions only (UserCost).
Users are detected with findBuildAggregate routine and the trick is
that following SLP vectorization may end up vectorizing entire list
with smaller chunks. Cost adjustment then is applied for individual
chunks and these adjustments obviously have to be smaller than the
entire aggregate build cost.

Differential Revision: https://reviews.llvm.org/D80773
2020-05-29 15:37:41 -07:00
Christopher Tetreault 8f8029b458 [SVE] Eliminate calls to default-false VectorType::get() from InstCombine
Reviewers: efriedma, david-arm, fpetrogalli, spatel

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80334
2020-05-29 15:31:31 -07:00
Christopher Tetreault e4d2037a5c [SVE] Eliminate calls to default-false VectorType::get() from Instrumentation
Reviewers: efriedma, fpetrogalli, kmclaughlin

Reviewed By: fpetrogalli

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80335
2020-05-29 15:22:58 -07:00
Christopher Tetreault c8f1aca316 [SVE] Eliminate calls to default-false VectorType::get() from Utils
Reviewers: efriedma, c-rhodes, sdesmalen, xbolva00

Reviewed By: c-rhodes

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80337
2020-05-29 15:01:18 -07:00
Stanislav Mekhanoshin af852d6f36 Revert "Process gep (phi ptr1, ptr2) in SROA"
This reverts commit f66a43c11a.
2020-05-29 13:51:03 -07:00
Stanislav Mekhanoshin f66a43c11a Process gep (phi ptr1, ptr2) in SROA
Differential Revision: https://reviews.llvm.org/D79218
2020-05-29 13:05:51 -07:00
Christopher Tetreault d2befc6633 [SVE] Eliminate calls to default-false VectorType::get() from Vectorize
Reviewers: efriedma, c-rhodes, david-arm, fhahn

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80339
2020-05-29 11:31:24 -07:00
Ehud Katz c710bb44a6 [Local] Prevent `invertCondition` from creating a redundant instruction
Prevent `invertCondition` from creating the inversion instruction, in
case the given value is an argument which has already been inverted.
Note that this approach has already been taken in case the given value
is an instruction (and not an argument).

Differential Revision: https://reviews.llvm.org/D80399
2020-05-29 21:08:22 +03:00
Paul Robinson 8c2d2d971b Preserve DbgLoc when DeadArgumentElimination rewrites a 'ret'.
Fixes PR46002.
2020-05-29 10:00:33 -07:00
Florian Hahn 01f999ae88 [SCCP] Switch to widen at PHIs, stores and call edges.
Currently SCCP does not widen PHIs, stores or along call edges
(arguments/return values), but on operations that directly extend ranges
(like binary operators).

This means PHIs, stores and call edges are not pessimized by widening
currently, while binary operators are. The main reason for widening
operators initially was that opting-out for certain operations was
more straight-forward in the initial implementation (and it did not
matter too much, as range support initially was only implemented for a
very limited set of operations.

During the discussion in D78391, it was suggested to consider flipping
widening to PHIs, stores and along call edges. After adding support for
tracking the number of range extensions in ValueLattice, limiting the
number of range extensions per value is straight forward.

This patch introduces a MaxWidenSteps option to the MergeOptions,
limiting the number of range extensions per value. For PHIs, it seems
natural allow an extension for each (active) incoming value plus 1. For
the other cases, a arbitrary limit of 10 has been chosen initially. It would
potentially make sense to set it depending on the users of a
function/global, but that still needs investigating. This potentially
leads to more state-changes and longer compile-times.

The results look quite promising (MultiSource, SPEC):

Same hash: 179 (filtered out)
Remaining: 58
Metric: sccp.IPNumInstRemoved

Program                                        base    widen-phi diff
 test-suite...ks/Prolangs-C/agrep/agrep.test    58.00   82.00    41.4%
 test-suite...marks/SciMark2-C/scimark2.test    32.00   43.00    34.4%
 test-suite...rks/FreeBench/mason/mason.test     6.00    8.00    33.3%
 test-suite...langs-C/football/football.test   104.00  128.00    23.1%
 test-suite...cations/hexxagon/hexxagon.test    36.00   42.00    16.7%
 test-suite...CFP2000/177.mesa/177.mesa.test   214.00  249.00    16.4%
 test-suite...ngs-C/assembler/assembler.test    14.00   16.00    14.3%
 test-suite...arks/VersaBench/dbms/dbms.test    10.00   11.00    10.0%
 test-suite...oxyApps-C++/miniFE/miniFE.test    43.00   47.00     9.3%
 test-suite...ications/JM/ldecod/ldecod.test   179.00  195.00     8.9%
 test-suite...CFP2006/433.milc/433.milc.test   249.00  265.00     6.4%
 test-suite.../CINT2000/175.vpr/175.vpr.test    98.00  104.00     6.1%
 test-suite...peg2/mpeg2dec/mpeg2decode.test    70.00   74.00     5.7%
 test-suite...CFP2000/188.ammp/188.ammp.test    71.00   75.00     5.6%
 test-suite...ce/Benchmarks/PAQ8p/paq8p.test   111.00  117.00     5.4%
 test-suite...ce/Applications/Burg/burg.test    41.00   43.00     4.9%
 test-suite...000/197.parser/197.parser.test    66.00   69.00     4.5%
 test-suite...tions/lambda-0.1.3/lambda.test    23.00   24.00     4.3%
 test-suite...urce/Applications/lua/lua.test   301.00  313.00     4.0%
 test-suite...TimberWolfMC/timberwolfmc.test    76.00   79.00     3.9%
 test-suite...lications/ClamAV/clamscan.test   991.00  1030.00    3.9%
 test-suite...plications/d/make_dparser.test    53.00   55.00     3.8%
 test-suite...fice-ispell/office-ispell.test    83.00   86.00     3.6%
 test-suite...lications/obsequi/Obsequi.test    28.00   29.00     3.6%
 test-suite.../Prolangs-C/bison/mybison.test    56.00   58.00     3.6%
 test-suite.../CINT2000/254.gap/254.gap.test   170.00  176.00     3.5%
 test-suite.../Applications/lemon/lemon.test    30.00   31.00     3.3%
 test-suite.../CINT2000/176.gcc/176.gcc.test   1202.00 1240.00    3.2%
 test-suite...pplications/treecc/treecc.test    79.00   81.00     2.5%
 test-suite...chmarks/MallocBench/gs/gs.test   357.00  366.00     2.5%
 test-suite...eeBench/analyzer/analyzer.test   103.00  105.00     1.9%
 test-suite...T2006/445.gobmk/445.gobmk.test   1697.00 1724.00    1.6%
 test-suite...006/453.povray/453.povray.test   1812.00 1839.00    1.5%
 test-suite.../Benchmarks/Bullet/bullet.test   337.00  342.00     1.5%
 test-suite.../CINT2000/252.eon/252.eon.test   426.00  432.00     1.4%
 test-suite...T2000/300.twolf/300.twolf.test   214.00  217.00     1.4%
 test-suite...pplications/oggenc/oggenc.test   244.00  247.00     1.2%
 test-suite.../CINT2006/403.gcc/403.gcc.test   4008.00 4055.00    1.2%
 test-suite...T2006/456.hmmer/456.hmmer.test   175.00  177.00     1.1%
 test-suite...nal/skidmarks10/skidmarks.test   430.00  434.00     0.9%
 test-suite.../Applications/sgefa/sgefa.test   115.00  116.00     0.9%
 test-suite...006/447.dealII/447.dealII.test   1082.00 1091.00    0.8%
 test-suite...6/482.sphinx3/482.sphinx3.test   141.00  142.00     0.7%
 test-suite...ocBench/espresso/espresso.test   152.00  153.00     0.7%
 test-suite...3.xalancbmk/483.xalancbmk.test   4003.00 4025.00    0.5%
 test-suite...lications/sqlite3/sqlite3.test   548.00  551.00     0.5%
 test-suite...marks/7zip/7zip-benchmark.test   5522.00 5551.00    0.5%
 test-suite...nsumer-lame/consumer-lame.test   208.00  209.00     0.5%
 test-suite...:: External/Povray/povray.test   1556.00 1563.00    0.4%
 test-suite...000/186.crafty/186.crafty.test   298.00  299.00     0.3%
 test-suite.../Applications/SPASS/SPASS.test   2019.00 2025.00    0.3%
 test-suite...ications/JM/lencod/lencod.test   8427.00 8449.00    0.3%
 test-suite...6/464.h264ref/464.h264ref.test   6797.00 6813.00    0.2%
 test-suite...6/471.omnetpp/471.omnetpp.test   431.00  430.00    -0.2%
 test-suite...006/450.soplex/450.soplex.test   446.00  447.00     0.2%
 test-suite...0.perlbench/400.perlbench.test   1729.00 1727.00   -0.1%
 test-suite...000/255.vortex/255.vortex.test   3815.00 3819.00    0.1%

Reviewers: efriedma, nikic, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D79036
2020-05-29 11:59:17 +01:00
David Sherwood f254f1d94e [SVE] Remove getNumElements() warnings in InstCombiner::visitBitCast
Whilst trying to compile this test to assembly:

  CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c

I discovered some warnings were firing in InstCombiner::visitBitCast
due to calls to getNumElements() for scalable vector types. These
calls only really made sense for fixed width vectors so I have fixed
up the code appropriately.

Differential Revision: https://reviews.llvm.org/D80559
2020-05-29 08:00:08 +01:00
Whitney Tsang 4e74541a92 [LoopUnroll] Fix not-rotated.ll by adding back a limitation was unintentionally
removed in https://reviews.llvm.org/D80477
2020-05-29 03:05:58 +00:00
Whitney Tsang 1bc73b02d6 [LoopUnroll] Support loops with exiting block that is neither header nor
latch.

Summary: Remove the limitation in LoopUnrollPass that exiting block must
be either header or latch.
Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto,
fhahn, efriedma
Reviewed By: etiotto, fhahn, efriedma
Subscribers: efriedma, lkail, xbolva00, hiraditya, zzheng, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80477
2020-05-29 01:18:38 +00:00
Philip Reames a0d2fd4a1f [Statepoint] Sink actual_args and gc_args to GCStatepointInst [NFC]
These are the two operand sets which are expected to survive more than another week or so.  Instead of bothering to update the deopt and gc-transition operands, we'll just wait until those are removed and delete the code.

For those following along, this is likely to be the last (major) change in this sequence for about a week.  I want to wait until all of this has been merged downstream to ensure I haven't introduced any bugs (and migrate some downstream code to the new interfaces).  Once that's done, we should be able to delete Statepoint/ImmutableStatepoint without too much work.
2020-05-28 13:51:59 -07:00
aartbik f719e7d9e7 [llvm] [MatrixIntrinsics] Add row-major support for llvm.matrix.transpose
Summary:
Only column-major was supported so far. This adds row-major support as well.
Note that we probably also want very efficient SIMD implementations for the
various target platforms.

Bug:
https://bugs.llvm.org/show_bug.cgi?id=46085

Reviewers: nicolasvasilache, reidtatge, bkramer, fhahn, ftynse, andydavis1, craig.topper, dcaballe, mehdi_amini, anemet

Reviewed By: fhahn

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80673
2020-05-28 12:13:32 -07:00
Whitney Tsang 47ffc81830 Revert "[LoopUnroll] Support loops with exiting block that is neither header nor"
This reverts commit 2810582265.

Revert until
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/7334
is resolved.
2020-05-28 19:10:27 +00:00
Whitney Tsang 2810582265 [LoopUnroll] Support loops with exiting block that is neither header nor
latch.

Summary: Remove the limitation in LoopUnrollPass that exiting block must
be either header or latch.
Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto,
fhahn, efriedma
Reviewed By: etiotto, fhahn, efriedma
Subscribers: efriedma, lkail, xbolva00, hiraditya, zzheng, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80477
2020-05-28 18:27:09 +00:00
Philip Reames 587fa99cfd Default to generating statepoints with deopt and gc-transition bundles if needed
Continues from D80598.

The key point of the change is to default to using operand bundles instead of the inline length prefix argument lists for statepoint nodes. An important subtlety to note is that the presence of a bundle has semantic meaning, even if it is empty. As such, we need to make a somewhat deeper change to the interface than is first obvious.

Existing code treats statepoint deopt arguments and the deopt bundle operands differently during inlining. The former is ignored (resulting in caller state being dropped), the later is merged.

We can't preserve the old behaviour for calls with deopt fed to RS4GC and then inlining, but we can avoid the no-deopt case changing. At least in internal testing, that seem to be the important one. (I'd argue the "stop merging after RS4GC" behaviour for the former was always "unexpected", but that the behaviour for non-deopt calls actually make sense.)

Differential Revision: https://reviews.llvm.org/D80674
2020-05-28 10:14:23 -07:00
Hiroshi Yamauchi f0c2cfe4d0 [PGO] Guard the memcmp/bcmp size value profiling instrumentation behind flag.
Summary:
Follow up D79751 and put the instrumentation / value collection side (in
addition to the optimization side) behind the flag as well.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80646
2020-05-28 10:07:04 -07:00
Sidharth Baveja 15b6730f07 Create utility function to Merge Adjacent Basic Blocks
Summary: The following code from
/llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp can be used by other
transformations:

while (!MergeBlocks.empty()) {
    BasicBlock *BB = *MergeBlocks.begin();
    BranchInst *Term = dyn_cast<BranchInst>(BB->getTerminator());
    if (Term && Term->isUnconditional() &&
L->contains(Term->getSuccessor(0))) {
      BasicBlock *Dest = Term->getSuccessor(0);
      BasicBlock *Fold = Dest->getUniquePredecessor();
      if (MergeBlockIntoPredecessor(Dest, &DTU, LI)) {
        // Don't remove BB and add Fold as they are the same BB
        assert(Fold == BB);
        (void)Fold;
        MergeBlocks.erase(Dest);
      } else
        MergeBlocks.erase(BB);
    } else
      MergeBlocks.erase(BB);
  }
Hence it should be separated into its own utility function.

Authored By: sidbav
Reviewer: Whitney, Meinersbur, asbirlea, dmgreen, etiotto
Reviewed By: asbirlea
Subscribers: hiraditya, zzheng, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80583
2020-05-28 16:44:37 +00:00
Matt Arsenault d6671ee90c InferAddressSpaces: Handle ptrmask intrinsic
This one is slightly odd since it counts as an address expression,
which previously could never fail. Allow the existing TTI hook to
return the value to use, and re-use it for handling how to handle
ptrmask.

Handles the no-op addrspacecasts for AMDGPU. We could probably do
something better based on analysis of the mask value based on the
address space, but leave that for now.
2020-05-28 10:04:02 -04:00
Kazu Hirata c4990a03c6 [JumpThreading] Use emplace_back instead of push_back (NFC)
Summary: This patch replaces push_back with emplace_back where appropriate.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80688
2020-05-27 22:31:23 -07:00
Philip Reames 87bea912c2 [Statepoint] Replace uses of isX functions with idiomatic isa<X>
Now that all of the statepoint related routines have classes with isa support, let's cleanup.

I'm leaving the (dead) utitilities in tree for a few days so that I can do the same cleanup downstream without breakage.
2020-05-27 18:32:28 -07:00
Layton Kifer 2bf3fe9b6d [TRE] Allow elimination when the returned value is non-constant
Currently we can only eliminate call return pairs that either return the
result of the call or a dynamic constant. This patch removes that
limitation.

Differential Revision: https://reviews.llvm.org/D79660
2020-05-27 16:55:03 -07:00
Mircea Trofin fa3b587196 [llvm]NFC] Simplify ProfileSummaryInfo state transitions
ProfileSummaryInfo is updated seldom, as result of very specific
triggers. This patch clearly demarcates state updates from read-only uses.
This, arguably, improves readability and maintainability.
2020-05-27 11:58:37 -07:00
Rithik Sharma eadf295956 [CodeMoverUtils] Use dominator tree level to decide the direction of
code motion

Summary: Currently isSafeToMoveBefore uses DFS numbering for determining
the relative position of instruction and insert point which is not
always correct. This PR proposes the use of Dominator Tree depth for the
same. If a node is at a higher level than the insert point then it is
safe to say that we want to move in the forward direction.
Authored By: RithikSharma
Reviewer: Whitney, nikic, bmahjour, etiotto, fhahn
Reviewed By: Whitney
Subscribers: fhahn, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80084
2020-05-27 18:02:06 +00:00
David Green 70d4a20299 [UnJ] Update LI for inner nested loops
This makes sure to correctly register the loop info of the children
of unroll and jammed loops. It re-uses some code from the unroller for
registering subloops.

Differential Revision: https://reviews.llvm.org/D80619
2020-05-27 14:36:38 +01:00
Florian Hahn 9b507b2127 [LAA] We only need pointer checks if there are non-zero checks (NFC).
If it turns out that we can do runtime checks, but there are no
runtime-checks to generate, set RtCheck.Need to false.

This can happen if we can prove statically that the pointers passed in
to canCheckPtrAtRT do not alias. This should not change any results, but
allows us to skip some work and assert that runtime checks are
generated, if LAA indicates that runtime checks are required.

Reviewers: anemet, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D79969

Note: This is a recommit of 259abfc7cb,
with some suggested renaming.
2020-05-27 12:47:36 +01:00
Florian Hahn 2d0389821e Revert "[LAA] We only need pointer checks if there are non-zero checks (NFC)."
This reverts commit 259abfc7cb.

Reverting this, as I missed a case where we return without setting
RtCheck.Need.
2020-05-27 12:39:45 +01:00
Florian Hahn 259abfc7cb [LAA] We only need pointer checks if there are non-zero checks (NFC).
If it turns out that we can do runtime checks, but there are no
runtime-checks to generate, set RtCheck.Need to false.

This can happen if we can prove statically that the pointers passed in
to canCheckPtrAtRT do not alias. This should not change any results, but
allows us to skip some work and assert that runtime checks are
generated, if LAA indicates that runtime checks are required.

Reviewers: anemet, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D79969
2020-05-27 12:37:20 +01:00
Daniil Suchkov 706b22e3e4 [SimpleLoopUnswitch] Drop uses of instructions before block deletion
Currently if instructions defined in a block are used in unreachable
blocks and SimpleLoopUnswitch attempts deleting the block, it triggers
assertion "Uses remain when a value is destroyed!".
This patch fixes it by replacing all uses of instructions from BB with
undefs before BB deletion.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D80551
2020-05-27 18:25:18 +07:00
Simon Pilgrim 35963f6d85 VPlanValue.h - reduce unnecessary includes to forward declarations. NFC. 2020-05-27 11:26:14 +01:00
Djordje Todorovic 65030821d4 [NFC][Debugify] Format the CheckModuleDebugify output
This fixes the output of the check-debugify option.
Without the patch an example of running the option:

$ opt -check-debugify test.ll -S -o testDebugify.ll
CheckModuleDebugifySkipping module without debugify metadata

After the patch:

$ opt -check-debugify test.ll -S -o testDebugify.ll
CheckModuleDebugify: Skipping module without debugify metadata

Differential Revision: https://reviews.llvm.org/D80553
2020-05-27 10:32:40 +02:00
Florian Hahn 5cf90d6cf1 [LoopUnroll] Simplify latch/header block handling (NFC).
I think the current code dealing with connecting the unrolled iterations
is a bit more complicated than necessary currently. To connect the
unrolled iterations, we have to update the unrolled latch blocks to
branch to the header of the next unrolled iteration.

We need to do this regardless whether the latch is exiting or not.

Additionally, we try to turn the conditional branch in the exiting block
to an unconditional one. This is an optimization only; alternatively we
could leave the conditional branches in place and rely on other passes
to simplify the conditions.

Logically, this is a separate step from connecting the latches to the
headers, but it is convenient to fold them into the same loop, if the
latch is also exiting. For headers (or other non-latch exiting blocks,
this is done separately).

Hopefully the patch with additional comments makes things a bit clearer.

Reviewers: efriedma, dmgreen, hfinkel, Whitney

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D80544
2020-05-26 21:54:12 +01:00
Stanislav Mekhanoshin 42725aeed8 Process gep (select ptr1, ptr2) in SROA
Differential Revision: https://reviews.llvm.org/D79217
2020-05-26 12:56:02 -07:00
Sanjay Patel 1a2bffaf8b [InstCombine] reassociate sub+add to increase adds and throughput
The -reassociate pass tends to transform this kind of pattern into
something that is worse for vectorization and codegen. See PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953

Follows-up the FP version of the same transform:
rGa0ce2338a083
2020-05-26 14:49:17 -04:00
Hiroshi Yamauchi 106ec64fbc [PGO] Add memcmp/bcmp size value profiling.
Summary: This adds support for memcmp/bcmp to the existing memcpy/memset value profiling.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79751
2020-05-26 10:28:04 -07:00
Sanjay Patel a0ce2338a0 [InstCombine] reassociate fsub+fadd with FMF to increase adds and throughput
The -reassociate pass tends to transform this kind of pattern into
something that is worse for vectorization and codegen. See PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953
2020-05-26 13:17:15 -04:00
Serge Pavlov 4d20e31f73 [FPEnv] Intrinsic llvm.roundeven
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.

Differential Revision: https://reviews.llvm.org/D75670
2020-05-26 19:24:58 +07:00
Yi Kong c1c9eb0ab7 [Transforms] Check validity of profile reader before invoking it
Although an invalid sampling profile would fail the compilation anyway,
this avoids crashing the compiler.
2020-05-26 20:11:24 +08:00
Sam Parker 871556a494 [CostModel] Unify Intrinsic Costs.
Recommitting most of the remaining changes from
259eb619ff, but excluding the call to
getUserCost from getInstructionThroughput. Though there's still no
test changes, I doubt that this is an NFC...

With the two getIntrinsicInstrCosts folded into one, now fold in the
scalar/code-size orientated getIntrinsicCost. The remaining scalar
intrinsics were memcpy, cttz and ctlz which now have special handling
in the BasicTTI implementation.

This had required a change in the AMDGPU backend for fabs as it
should always be 'free'. I've also changed the X86 backend to return
the BaseT implementation when the CostKind isn't RecipThroughput.

Differential Revision: https://reviews.llvm.org/D80012
2020-05-26 09:48:26 +01:00
Florian Hahn 179c80117c [LoopUnroll] Remove dead NextBlocks argument (NFC). 2020-05-25 22:09:11 +01:00
Marek Kurdej bc93c2d72e [Transforms] Fix typos. NFC 2020-05-25 22:34:08 +02:00
Whitney Tsang 5d6c5b463c [LoopUtils] Use llvm::find
Summary: Fixes this build error:

llvm/lib/Transforms/Utils/LoopUtils.cpp:679:26: error: no matching
function for call to 'find'
      Loop::iterator I = find(ParentLoop->begin(), ParentLoop->end(),
L);
                         ^~~~
Authored By: orivej
Reviewer: Whitney
Reviewed By: Whitney
Subscribers: hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80473
2020-05-25 13:34:56 +00:00
Simon Pilgrim 8b4ecafee6 InstructionSimplify.h - remove unnecessary includes. NFC.
Remove unused User.h include.
Replace SetVector.h with forward declaration.
Sort the forward declarations + remove FastMathFlags (defined in Operator.h).
Fix implicit SetVector.h dependency in LowerConstantIntrinsics.cpp.
2020-05-25 13:45:03 +01:00
Ayal Zaks 840450549c [LV] Clamp MaxVF to power of 2.
If a loop has a constant trip count known to be a multiple of MaxVF (times user
UF), LV infers that no tail will be generated for any chosen VF. This relies on
the chosen VF's being powers of 2 bound by MaxVF, and assumes MaxVF is a power
of 2. Make sure the latter holds, in particular when MaxVF is set by a memory
dependence distance which may not be a power of 2.

Differential Revision: https://reviews.llvm.org/D80491
2020-05-25 11:24:33 +03:00
Sanjay Patel 57bb4787d7 [Pass Manager] remove EarlyCSE as clean-up for VectorCombine
EarlyCSE was added with D75145, but the motivating test is
not regressed by removing the extra pass now. That might be
because VectorCombine altered the way it processes instructions,
or it might be from (re)moving VectorCombine in the pipeline.

The extra round of EarlyCSE appears to cost approximately
0.26% in compile-time as discussed in D80236, so we need some
evidence to justify its inclusion here, but we do not have
that (yet).

I suspect that between SLP and VectorCombine, we are creating
patterns that InstCombine and/or codegen are not prepared for,
but we will need to reduce those examples and include them as
PhaseOrdering and/or test-suite benchmarks.
2020-05-24 12:36:21 -04:00
Florian Hahn 0deab8a54f [LV] Either get invariant condition OR vector condition.
Currently we unconditionally get the first lane of the condition
operand, even if we later use the full vector condition. This can result
in some unnecessary instructions being generated.

Suggested as follow-up in D80219.
2020-05-24 17:16:42 +01:00
Sanjay Patel c048a02b5b [InstCombine] fold FP trunc into exact itofp
Similar to D79116 and rGbfd512160fe0 - if the 1st cast
is exact, then we can go directly to the destination
type because there is no double-rounding.
2020-05-24 09:30:19 -04:00
Sanjay Patel 7eed772a27 [PatternMatch] abbreviate vector inst matchers; NFC
Readability is not reduced with these opcodes/match lines,
so reduce odds of awkward wrapping from 80-col limit.
2020-05-24 09:19:47 -04:00
Florian Hahn 15224408f0 [VPlan] Use VPUser for VPWidenSelectRecipe operands (NFC).
VPWidenSelectRecipe already contains a VPUser, but it is not used. This
patch updates the code related to VPWidenSelectRecipe to use VPUser for
its operands.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D80219
2020-05-24 13:58:08 +01:00
Matt Arsenault cdd006eec9 SimplifyCFG: Clean up optforfuzzing implementation
This should function as any other SimplifyCFGOption rather than having
the transform check and specially consider the attribute itself.
2020-05-23 13:49:50 -04:00
Matt Arsenault 27fe841aa6 AMDGPU: Refine rcp/rsq intrinsic folding for modern FP rules
We have to assume undef could be an snan, which would need quieting so
returning qnan is safer than undef. Also consider strictfp, and don't
care if the result rounded.
2020-05-23 13:28:36 -04:00
Michal Paszkowski 335de55fa3 Revert "Added a new IRCanonicalizer pass."
This reverts commit 14d358537f.
2020-05-23 13:51:43 +02:00
Michal Paszkowski 14d358537f Added a new IRCanonicalizer pass.
Summary:
Added a new IRCanonicalizer pass which aims to transform LLVM modules into
a canonical form by reordering and renaming instructions while preserving the
same semantics. The canonicalizer makes it easier to spot semantic differences
when diffing two modules which have undergone different passes.

Presentation: https://www.youtube.com/watch?v=c9WMijSOEUg

Reviewed by: plotfi

Differential Revision: https://reviews.llvm.org/D66029
2020-05-23 12:45:53 +02:00
Craig Topper 7392820f98 [Align] Remove operations on MaybeAlign that asserted that it had a defined value.
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.

I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.

I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.

Differential Revision: https://reviews.llvm.org/D80455
2020-05-22 21:54:28 -07:00
Sanjay Patel 024098ae53 [VectorCombine] set preserve alias analysis
As noted in D80236, moving the pass in the pipeline exposed this
shortcoming. Extra work to recalculate the alias results showed
up as a compile-time slowdown.
2020-05-22 16:25:16 -04:00
Sanjay Patel 6438ea45e0 [VectorCombine] position pass after SLP in the optimization pipeline rather than before
There are 2 known problem patterns shown in the test diffs here:
vector horizontal ops (an x86 specialization) and vector reductions.

SLP has greater ability to match and fold those than vector-combine,
so let SLP have first chance at that.

This is a quick fix while we continue to improve vector-combine and
possibly canonicalize to reduction intrinsics.

In the longer term, we should improve matching of these patterns
because if they were created in the "bad" forms shown here, then we
would miss optimizing them.

I'm not sure what is happening with alias analysis on the addsub test.
The old pass manager now shows an extra line for that, and we see an
improvement that comes from SLP vectorizing a store. I don't know
what's missing with the new pass manager to make that happen.
Strangely, I can't reproduce the behavior if I compile from C++ with
clang and invoke the new PM with "-fexperimental-new-pass-manager".

Differential Revision: https://reviews.llvm.org/D80236
2020-05-22 12:22:44 -04:00
Sanjay Patel 2f7c24fe30 [InstCombine] (A + B) + B --> A + (B << 1)
This eliminates a use of 'B', so it can enable follow-on transforms
as well as improve analysis/codegen.

The PhaseOrdering test was added for D61726, and that shows
the limits of instcombine vs. real reassociation. We would
need to run some form of CSE to collapse that further.

The intermediate variable naming here is intentional because
there's a test at llvm/test/Bitcode/value-with-long-name.ll
that would break with the usual nameless value. I'm not sure
how to improve that test to be more robust.

The naming may also be helpful to debug regressions if this
change exposes weaknesses in the reassociation pass for example.
2020-05-22 11:46:59 -04:00
Anh Tuyen Tran 13bf6039c9 Title: [LV] Handle Fold-Tail of loops with vectorizarion factor equal to 1
Summary:
When handling loops whose VF is 1, fold-tail vectorization sets the
backedge taken count of the original loop with a vector of a single
element. This causes type-mismatch during instruction generartion.

The purpose of this patch is toto address the case of VF==1.

Reviewer: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn), gilr (Gil Rapaport), rengolin (Renato Golin)

Reviewed By: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn)

Subscribers: Ayal (Ayal Zaks), rkruppe (Hanna Kruppe), bmahjour (Bardia Mahjour), rogfer01 (Roger Ferrer Ibanez), vkmr (Vineet Kumar), bollu (Siddharth Bhat), hiraditya (Aditya Kumar), llvm-commits (Mailing List llvm-commits)

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D79976
2020-05-22 13:30:56 +00:00
Sanjay Patel 21f7cf4057 [SLP] fix verification check for valid IR
This is a fix for PR45965 - https://bugs.llvm.org/show_bug.cgi?id=45965 -
which was left out of D80106 because of a test failure.

SLP does its own mini-CSE after potentially creating redundant instructions,
so we need to wait for that to complete before running the verifier.
Otherwise, we will see a test failure for
test/Transforms/SLPVectorizer/X86/crash_vectorizeTree.ll (not changed here)
because a phi temporarily has identical but different incoming values for
the same incoming block.

A related, but independent, test that would have been altered here was
fixed with:
rG880df55

The test was escaping verification in SLP without this change because we
were not running verifyFunction() unless SLP actually changed the IR.

Differential Revision: https://reviews.llvm.org/D80401
2020-05-22 09:15:27 -04:00
Matt Arsenault 88c20fa3d2 InstCombine: Add constant folding/simplify for amdgcn.ldexp intrinsic
This really belongs in InstructionSimplify since it doesn't introduce
new instructions. Put it in instcombine to avoid increasing the number
of passes considering target intrinsics.

I also noticed that we seem to now be interpreting strictfp attributes
on call sites, so try to handle that.
2020-05-22 08:21:38 -04:00
Roman Lebedev cd921accf9
[NFC] InstCombineNegator: use auto where type is obvious from the cast 2020-05-22 11:14:54 +03:00
Max Kazantsev 403810557b [InstCombine] Sink pure instructions down to return and unreachable blocks
If the only user of `Instr` is in a return or unreachable block, we can
sink `Instr` to the`User` safely (unless it reads/writes memory).
Return or unreachable blocks are guaranteed to execute zero
or one time, and `Instr` always dominates `User`, so they either will
be executed together (execution of `User` always implies execution
of `Instr`) or not executed at all.

Differential Revision: https://reviews.llvm.org/D80120
Reviewed By: asbirlea, jdoerfert
2020-05-22 14:33:42 +07:00
Dinar Temirbulatov df3b95bc0a [SLP][NFC] PR45269 getVectorElementSize() is slow
The algorithm inside getVectorElementSize() is almost O(x^2) complexity and
when, for example, we compile MultiSource/Applications/ClamAV/shared_sha256.c
with 1k instructions inside sha256_transform() function that resulted in almost
~800k iterations. The following change improves the algorithm with the map to
a liner complexity.

Differential Revision: https://reviews.llvm.org/D80241
2020-05-21 17:26:50 +02:00
Sam Parker 259eb619ff Revert "[CostModel] Unify Intrinsic Costs."
This reverts commit de71def3f5.

This is causing some very large changes, so I'm first going to break
this patch down and re-commit in parts.
2020-05-21 12:50:24 +01:00
Ehud Katz 111ddc57d3 [FlattenCFG] Fix `MergeIfRegion` in case then-path is empty
In case the then-path of an if-region is empty, then merging with the
else-path should be handled with the inverse of the condition (leading
to that path).

Fix PR37662

Differential Revision: https://reviews.llvm.org/D78881
2020-05-21 14:06:44 +03:00
Roman Lebedev b2df961231
[IndVarSimplify][LoopUtils] Avoid TOCTOU/ordering issues (PR45835)
Summary:
Currently, `rewriteLoopExitValues()`'s logic is roughly as following:
> Loop over each incoming value in each PHI node.
> Query whether the SCEV for that incoming value is high-cost.
> Expand the SCEV.
> Perform sanity check (`isValidRewrite()`, D51582)
> Record the info
> Afterwards, see if we can drop the loop given replacements.
> Maybe perform replacements.

The problem is that we interleave SCEV cost checking and expansion.
This is A Problem, because `isHighCostExpansion()` takes special care
to not bill for the expansions that were already expanded, and we can reuse.

While it makes sense in general - if we know that we will expand some SCEV,
all the other SCEV's costs should account for that, which might cause
some of them to become non-high-cost too, and cause chain reaction.

But that isn't what we are doing here. We expand *all* SCEV's, unconditionally.
So every next SCEV's cost will be affected by the already-performed expansions
for previous SCEV's. Even if we are not planning on keeping
some of the expansions we performed.

Worse yet, this current "bonus" depends on the exact PHI node
incoming value processing order. This is completely wrong.

As an example of an issue, see @dmajor's `pr45835.ll` - if we happen to have
a PHI node with two(!) identical high-cost incoming values for the same basic blocks,
we would decide first time around that it is high-cost, expand it,
and immediately decide that it is not high-cost because we have an expansion
that we could reuse (because we expanded it right before, temporarily),
and replace the second incoming value but not the first one;
thus resulting in a broken PHI.

What we instead should do for now, is not perform any expansions
until after we've queried all the costs.

Later, in particular after `isValidRewrite()` is an assertion (D51582)
we could improve upon that, but in a more coherent fashion.

See [[ https://bugs.llvm.org/show_bug.cgi?id=45835 | PR45835 ]]

Reviewers: dmajor, reames, mkazantsev, fhahn, efriedma

Reviewed By: dmajor, mkazantsev

Subscribers: smeenai, nikic, hiraditya, javed.absar, llvm-commits, dmajor

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79787
2020-05-21 13:05:55 +03:00
Benjamin Kramer 5b0d1f04bf Fix a layering violation by not depending from Transforms/Utils on Transforms/Scalar.
NFC.
2020-05-21 09:51:58 +02:00
Sam Parker de71def3f5 [CostModel] Unify Intrinsic Costs.
With the two getIntrinsicInstrCosts folded into one, now fold in the
scalar/code-size orientated getIntrinsicCost. This involved sinking
cost of the TTIImpl into the base implementation, as it performs no
target checks. The opcodes remaining were memcpy, cttz and ctlz which
now have special handling in the BasicTTI implementation.
getInstructionThroughput can now directly return the result of
getUserCost.

This had required a change in the AMDGPU backend for fabs and its
always 'free'. I've also changed the X86 backend to return '1' for
any intrinsic when the CostKind isn't RecipThroughput.

Though this intended to be a non-functional change, there are many
paths being combined here so I would be very surprised if this didn't
have an effect.

Differential Revision: https://reviews.llvm.org/D80012
2020-05-21 07:38:25 +01:00
Yevgeny Rouban 8138487468 [BrachProbablityInfo] Set edge probabilities at once and fix calcMetadataWeights()
Hide the method that allows setting probability for particular edge
and introduce a public method that sets probabilities for all
outgoing edges at once.
Setting individual edge probability is error prone. More over it is
difficult to check that the total probability is 1.0 because there is
no easy way to know when the user finished setting all
the probabilities.

Related bug is fixed in BranchProbabilityInfo::calcMetadataWeights().
Changing unreachable branch probabilities to raw(1) and distributing
the rest (oldProbability - raw(1)) over the reachable branches could
introduce total probability inaccuracy bigger than 1/numOfBranches.

Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396
2020-05-21 12:52:37 +07:00
Juneyoung Lee d9a4a24413 Add CanonicalizeFreezeInLoops pass
Summary:
If an induction variable is frozen and used, SCEV yields imprecise result
because it doesn't say anything about frozen variables.

Due to this reason, performance degradation happened after
https://reviews.llvm.org/D76483 is merged, causing
SCEV yield imprecise result and preventing LSR to optimize a loop.

The suggested solution here is to add a pass which canonicalizes frozen variables
inside a loop. To be specific, it pushes freezes out of the loop by freezing
the initial value and step values instead & dropping nsw/nuw flags from instructions used by freeze.
This solution was also mentioned at https://reviews.llvm.org/D70623 .

Reviewers: spatel, efriedma, lebedev.ri, fhahn, jdoerfert

Reviewed By: fhahn

Subscribers: nikic, mgorny, hiraditya, javed.absar, llvm-commits, sanwou01, nlopes

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77523
2020-05-21 09:29:29 +09:00
Eli Friedman f26bdb539e Make Value::getPointerAlignment() return an Align, not a MaybeAlign.
If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.

Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer.  This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated.  I
updated clang's handling of C++ references, and added a release note for
this.

Differential Revision: https://reviews.llvm.org/D80072
2020-05-20 16:37:20 -07:00
Roman Lebedev 55430f53f3
[InstCombine] `insertelement` is negatible if both sources are negatible
----------------------------------------
define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %src
  %t1 = sub i4 0, %a
  %t2 = insertelement <2 x i4> %t0, i4 %t1, i32 %x
  %t3 = sub <2 x i4> %b, %t2
  ret <2 x i4> %t3
}
=>
define <2 x i4> @negate_insertelement(<2 x i4> %src, i4 %a, i32 %x, <2 x i4> %b) {
%0:
  %t2.neg = insertelement <2 x i4> %src, i4 %a, i32 %x
  %t3 = add <2 x i4> %t2.neg, %b
  ret <2 x i4> %t3
}
Transformation seems to be correct!
2020-05-20 21:44:31 +03:00
Roman Lebedev ebed96fdbf
[InstCombine] Negator: `extractelement` is negatible if src is negatible
----------------------------------------
define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %x
  call void @use_v2i4(<2 x i4> %t0)
  %t1 = extractelement <2 x i4> %t0, i32 %y
  %t2 = sub i4 %z, %t1
  ret i4 %t2
}
=>
define i4 @negate_extractelement(<2 x i4> %x, i32 %y, i4 %z) {
%0:
  %t0 = sub <2 x i4> { 0, 0 }, %x
  call void @use_v2i4(<2 x i4> %t0)
  %t1.neg = extractelement <2 x i4> %x, i32 %y
  %t2 = add i4 %t1.neg, %z
  ret i4 %t2
}
Transformation seems to be correct!
2020-05-20 21:44:31 +03:00
Arthur Eubanks 8a88755610 Reland [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 11:25:44 -07:00
Arthur Eubanks b8cbff51d3 Revert "[X86] Codegen for preallocated"
This reverts commit 810567dc69.

Some tests are unexpectedly passing
2020-05-20 10:04:55 -07:00
Arthur Eubanks 810567dc69 [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 09:20:38 -07:00
Sam Parker 8cc911fa5b [NFCI][CostModel] Refactor getIntrinsicInstrCost
Combine the two API calls into one by introducing a structure to hold
the relevant data. This has the added benefit of moving the boiler
plate code for arguments and flags, into the constructors. This is
intended to be a non-functional change, but the complicated web of
logic involved here makes it very hard to guarantee.

Differential Revision: https://reviews.llvm.org/D79941
2020-05-20 11:59:08 +01:00
Florian Hahn bcbd26bfe6 [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

This patch was originally committed as b8a3c34eee, but broke the
modules build, as LoopAccessAnalysis was using the Expander.

The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537
2020-05-20 10:53:40 +01:00
Benjamin Kramer 350dadaa8a Give helpers internal linkage. NFC. 2020-05-19 22:16:37 +02:00
Nikita Popov 5fae613a4f [LVI] Don't require DominatorTree in LVI (NFC)
After D76797 the dominator tree is no longer used in LVI, so we
can remove it as a pass dependency, and also get rid of the
dominator tree enabling/disabling logic in JumpThreading.

Apart from cleaning up the code, this also clarifies LVI
cache consistency, in that the LVI cache can no longer
depend on whether the DT was or wasn't enabled due to
pending DT updates at any given time.

Differential Revision: https://reviews.llvm.org/D76985
2020-05-19 20:21:46 +02:00
Florian Hahn 7cefd1b4cd [LV] Remove duplicated return stmt (NFC). 2020-05-19 17:20:50 +01:00
Jay Foad 9bc989a48d [InstCombine] Remove hasNoInfs check for pow(C,y) -> exp2(log2(C)*y)
We already check hasNoNaNs and that x is finite and strictly positive.
That only leaves the following special cases (taken from the Linux man
page for pow):

If x is +1, the result is 1.0 (even if y is a NaN).
If the absolute value of x is less than 1, and y is negative infinity, the result is positive infinity.
If the absolute value of x is greater than 1, and y is negative infinity, the result is +0.
If the absolute value of x is less than 1, and y is positive infinity, the result is +0.
If the absolute value of x is greater than 1, and y is positive infinity, the result is positive infinity.

The first case is handled elsewhere, and this transformation preserves
all the others, so there is no need to limit it to hasNoInfs.

Differential Revision: https://reviews.llvm.org/D79409
2020-05-19 17:06:05 +01:00
Florian Hahn cff9399f6b [VPlan] Fix comment for User in VPWidenSelectRecipe (NFC).
The comment was referring the arguments of the call, but the recipe
widens a select.
2020-05-19 15:31:39 +01:00
Florian Hahn f828d75b46 [VPlan] Add & use VPValue operands for VPReplicateRecipe (NFC).
This patch adds VPValue version of the instruction operands to
VPReplicateRecipe and uses them during code-generation.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D80114
2020-05-19 15:12:17 +01:00
Florian Hahn 66ad107452 [VPlan] Remove unique_ptr from VPBranchOnRecipeMask (NFC).
We can remove a dynamic memory allocation, by checking the number of
operands: no operands = all true, 1 operand = mask.

Reviewers: Ayal, gilr, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D80110
2020-05-19 15:01:37 +01:00
Sameer Sahasrabuddhe 6c84884366 [LoopSimplify] don't separate nested loops with convergent calls
Summary:
When a loop has multiple backedges, loop simplification attempts to
separate them out into nested loops. This results in incorrect control
flow in the presence of some functions like a GPU barrier. This change
skips the transformation when such "convergent" function calls are
present in the loop body.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D80078
2020-05-19 09:22:39 +05:30
Eli Friedman 27b4e6931d [NFC] Replace MaybeAlign with Align in TargetTransformInfo. 2020-05-18 19:25:49 -07:00
Ayal Zaks 682e739638 [LV] Fix FoldTail under user VF and UF
LV considers an internally computed MaxVF to decide if a constant trip-count is
a multiple of any subsequently chosen VF, and conclude that no scalar remainder
iterations (tail) will be left for Fold Tail to handle. If an external VF is
provided via -force-vector-width, it must be considered instead of the internal
MaxVF.
If an external UF is provided via -force-vector-interleave, it too must be
considered in addition to MaxVF or user VF.

Fixes PR45679.

Differential Revision: https://reviews.llvm.org/D80085
2020-05-19 01:32:25 +03:00
Craig Topper c9f63297e2 Fix several places that were calling verifyFunction or verifyModule without checking the return value.
verifyFunction/verifyModule don't assert or error internally. They
also don't print anything if you don't pass a raw_ostream to them.
So the caller needs to check the result and ideally pass a stream
to get the messages. Otherwise they're just really expensive no-ops.

I've filed PR45965 for another instance in SLPVectorizer
that causes a lit test failure.

Differential Revision: https://reviews.llvm.org/D80106
2020-05-18 13:28:46 -07:00
Nikita Popov 47a0e9f49b [Sanitizers] Use getParamByValType() (NFC)
Instead of fetching the pointer element type.
2020-05-18 22:06:18 +02:00
Volkan Keles 63081dc6f6 LoadStoreVectorizer: Match nested adds to prove vectorization is safe
If both OpA and OpB is an add with NSW/NUW and with the same LHS operand,
we can guarantee that the transformation is safe if we can prove that OpA
won't overflow when IdxDiff added to the RHS of OpA.

Review: https://reviews.llvm.org/D79817
2020-05-18 12:13:01 -07:00
Nikita Popov 736db2f710 [Loads] Require Align in isSafeToLoadUnconditionally() (NFC)
Now that load/store have required alignment, accept Align here.
This also avoids uses of getPointerElementType(), which is
incompatible with opaque pointers.
2020-05-18 20:50:35 +02:00