Commit Graph

11485 Commits

Author SHA1 Message Date
zhongyunde b2f5164deb [IVDescriptors] Support FOR where we have multiple sink pointed
Handles the case where Previous doesn't come before LastPrev incorrectly.
Fix https://github.com/llvm/llvm-project/issues/53483

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D118558
2022-02-14 09:30:35 +08:00
Arthur Eubanks a9029a33ff [OpaquePtr][ValueTracking] Check GEP source element type in isPointerOffset()
Fixes a MemCpyOpt miscompile with opaque pointers.
This function can be further cleaned up, but let's just fix the miscompile first.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D119652
2022-02-13 10:35:38 -08:00
Philip Reames c02deae18c [SCEVPredicate] Remove getExpr mechanism [NFC]
This mechanism was used for a couple of purposes, but the primary one was keeping track of which predicates in a union might apply to an expression.  As these sets are small and agressively deduped, this has little value.
2022-02-11 11:35:58 -08:00
Roman Lebedev 97484f46eb
[NFCI][SCEV] `SCEVTraversal`: if search terminated, don't push further ops of nary
Even if the search is marked as terminated after only looking at
the first operand, we'd still look at the remaining operands
before actually ending the search.

This seems pointless and wasteful, let's not do that.
2022-02-11 21:58:19 +03:00
Roman Lebedev 65715ac72a
[SCEV] Generalize umin_seq matching
Since we don't greedily flatten `umin_seq(a, umin(b, c))` into `umin_seq(a, b, c)`,
just looking at the operands of the outer-level `umin` is not sufficient,
and we need to recurse into all same-typed `umin`'s.
2022-02-11 21:58:19 +03:00
Roman Lebedev c234809ff8
[SCEV] Recognize `x == 0 ? 0 : umin_seq(..., x, ...) -> umin_seq(x, umin_seq(...))` 2022-02-11 21:58:19 +03:00
Roman Lebedev 281421693b
[SCEV] Recognize `x == 0 ? 0 : umin(..., x, ...) -> umin_seq(x, umin(...))`
That is the canonical expansion for umin_seq,
so we really should roundtrip it.
2022-02-11 21:58:19 +03:00
Roman Lebedev 4d0c0e6cc2
[SCEV] `createNodeForSelectOrPHIInstWithICmpInstCond()`: generalize eq handling
The current logic was: https://alive2.llvm.org/ce/z/j8muXk
but in reality the offset to the Y in the 'true' hand
does not need to exist: https://alive2.llvm.org/ce/z/MNQ7DZ
https://alive2.llvm.org/ce/z/S2pMQD

To catch that, instead of computing the Y's in both
hands and checking their equality, compute Y and C,
and check that C is 0 or 1.
2022-02-11 21:58:19 +03:00
Roman Lebedev a473c457f6
[NFC][SCEV] `createNodeForSelectOrPHIInstWithICmpInstCond()`: dedup eq/ne pred handling 2022-02-11 21:58:19 +03:00
Philip Reames 3e27fb8590 [PSE] Allow duplicate predicates in debug output
This lets us avoid redundant implication work in the constructor of SCEVUnionPredicate which simplifies an upcoming change.  If we're actually building a predicate via PSE, that goes through addPredicate which does include the implication check.
2022-02-11 10:39:01 -08:00
Nikita Popov e9c0720010 [PHITransAddr] Check GEP source element type
It's not the same GEP if the source element type is different.
2022-02-11 16:22:48 +01:00
Nikita Popov 5d63903465 [SCCP] Check that load/store and global type match
SCCP requires that the load/store type and global type are the
same (it does not support bitcasts of tracked globals). With
typed pointers this was implicitly enforced.
2022-02-11 11:01:18 +01:00
Philip Reames 5ba115031d [PSE] Remove assumption that top level predicate is union from public interface [NFC*]
Note that this doesn't actually cause the top level predicate to become a non-union just yet.

The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.
2022-02-10 16:14:52 -08:00
Philip Reames 01b56b8bdd [SCEVPredicateRewriter] Remove assumption top level predicate is a union [NFC] 2022-02-10 15:51:15 -08:00
Roman Lebedev c94ec7997a
[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: use sub instead of add
For booleans, xor/add/sub are interchangeable:
https://alive2.llvm.org/ce/z/ziav3d

But for larger bitwidths, we'll need sub, so change it now.
2022-02-11 01:21:45 +03:00
Philip Reames e43b1ce4d5 [SCEV] Constify some uses of SCEVUnionPredicate* [NFC]
This exploits the immutability introduced in d334fec.
2022-02-10 12:42:19 -08:00
Nikita Popov 87a0b1bd23 [InstSimplify] Remove zero-index opaque pointer GEP
With opaque pointers, a zero-index GEP is a no-op. It does not
need to be retained for the pointer element type change it may
perform.
2022-02-10 16:01:56 +01:00
Roman Lebedev 580d3a14b2
[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: refactor `i1 cond ? i1 x : i1 y` handling
While that effectively concludes i1 select handling,
that boolean restriction can be lifted later.
2022-02-10 17:42:56 +03:00
Roman Lebedev 9a322e430f
[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: refactor `i1 cond ? i1 C : i1 y` pattern
https://alive2.llvm.org/ce/z/uRvVtN
2022-02-10 17:42:56 +03:00
Roman Lebedev 576a45f20d
[NFC][SCEV] `createNodeForSelectOrPHIViaUMinSeq()`: refactor `i1 cond ? i1 x : i1 C` pattern
https://alive2.llvm.org/ce/z/2Q7Du_
2022-02-10 17:42:55 +03:00
Roman Lebedev 9766a0cca0
[SCEV] Recognize `cond ? i1 0 : i1 y` as `umin_seq ~cond, x`
By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/N6XwV-
2022-02-10 17:42:55 +03:00
Roman Lebedev 418604fd90
[SCEV] Recognize `cond ? i1 x : i1 1` as `~umin_seq cond, ~x`
By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/aqe9GK
2022-02-10 17:42:55 +03:00
Roman Lebedev 49d9acc242
[SCEV] Recognize logical `or` as `not umin_seq (not, not)`
By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/MUfbTL
2022-02-10 17:42:55 +03:00
Roman Lebedev 16bc24e7be
[SCEV] Recognize logical `and` as `umin_seq`
By definition, `umin_seq` has the exact same
poison stopping properties the original `select` had:
https://alive2.llvm.org/ce/z/59KuZZ
2022-02-10 17:42:55 +03:00
Roman Lebedev 1c69444863
[SCEV] `createNodeForSelectOrPHI()`: try constant-folding even if not an Instruction
We'd catch the tautological select pattern later anyways
due to constant folding, so that leaves PHI-like select,
but it does not appear to fire there.
2022-02-10 17:42:55 +03:00
Roman Lebedev 97930f85af
[NFC][SCEV] Prepare `createNodeForSelectOrPHI()` for gaining additional strategy
Currently `createNodeForSelectOrPHI()` takes an Instruction,
and only works on the Cond that is an ICmpInst,
but that can be relaxed somewhat.

For now, simply rename the existing function,
and add a thin wrapper ontop that still does
the same thing as it used to.
2022-02-10 17:42:55 +03:00
Roman Lebedev 73990ff8a7
[SCEV] Recognize binary `xor` as bit-wise `add`
https://alive2.llvm.org/ce/z/ULuZxB

We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `add`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
2022-02-10 17:42:55 +03:00
Roman Lebedev 503541fa93
[SCEV] Recognize binary `and` as bit-wise `umin`
https://alive2.llvm.org/ce/z/aKAr94

We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umin`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
2022-02-10 17:42:54 +03:00
Roman Lebedev e7e0834f07
[SCEV] Recognize binary `or` as bit-wise `umax`
https://alive2.llvm.org/ce/z/SMEaoc

We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umax`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
2022-02-10 17:42:54 +03:00
David Sherwood 1badfbb4fc Fix incorrect TypeSize->uint64_t cast in InductionDescriptor::isInductionPHI
The code was relying upon the implicit conversion of TypeSize to
uint64_t and assuming the type in question was always fixed. However,
I discovered an issue when running the canon-freeze pass with some
IR loops that contains scalable vector types. I've changed the code
to bail out if the size is unknown at compile time, since we cannot
compute whether the step is a multiple of the type size or not.

I added a test here:

  Transforms/CanonicalizeFreezeInLoops/phis.ll

Differential Revision: https://reviews.llvm.org/D118696
2022-02-10 09:39:12 +00:00
Philip Reames d334fec140 [SCEV] Make SCEVUnionPredicate externally immutable [NFC]
This is the last major stepping stone before being able to allocate the node via the folding set allocator.  That will in turn allow more general SCEV predicate expression trees.
2022-02-09 13:47:28 -08:00
Philip Reames e6d9bab558 [SCEV] Remove a direct call to SCEVUnionPredicate::add [NFC] 2022-02-09 13:04:12 -08:00
Philip Reames d39f4ac494 [SCEV] Unwind SCEVUnionPredicate from getPredicatedBackedgeTakenCount [NFC]
For those curious, the whole reason for tracking the predicate set seperately as opposed to just immediately registering the dependencies appears to be allowing the printing code to print a result without changing the PSE state.  It's slightly questionable if this justifies the complexity, but since we can preserve it with local ugliness, I did so.
2022-02-09 12:55:40 -08:00
Philip Reames aa845d7a24 [SCEV] Remove conversion to SCEVUnionPredicate in ExitNotTakenInfo [NFC]
This removes one of the places where we mutate an existing union predicate.
2022-02-09 12:10:23 -08:00
Philip Reames 83f895d952 [SCEV] Add interface for constructing generic SCEVComparePredicate [NFC} 2022-02-09 10:29:04 -08:00
Arthur Eubanks ff31020ee6 [OpaquePtr][LoopAccessAnalysis] Support opaque pointers
Previously we relied on the pointee type to determine what type we need
to do runtime pointer access checks.

With opaque pointers, we can access a pointer with more than one type,
so now we keep track of all the types we're accessing a pointer's
memory with.

Also some other minor getPointerElementType() removals.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D119047
2022-02-09 09:11:27 -08:00
Philip Reames c302f1e677 [SCEV] Generalize SCEVEqualsPredicate to any compare [NFC]
PredicatedScalarEvolution has a predicate type for representing A == B.  This change generalizes it into something which can represent a A <pred> B.

This generality is currently unused, but is motivated by a couple of recent cases which have come up.  In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.
2022-02-08 08:18:09 -08:00
Roman Lebedev ae9414d562
[ValueTracking] Only check for non-undef/poison if already known to be a self-multiply
https://godbolt.org/z/js9fTTG9h
^ we don't care what `isGuaranteedNotToBeUndefOrPoison()` says
unless we already knew that the operands were equal.
2022-02-08 18:35:29 +03:00
Simon Pilgrim 1468202748 [ValueTracking] Add support for X*X self-multiplication
D108992 added KnownBits handling for 'Quadratic Reciprocity' self-multiplication patterns (bit[1] == 0), which can be used for non-undef values (poison is OK).

This patch adds noundef selfmultiply handling to value tracking so demanded bits patterns can make use of it.

Differential Revision: https://reviews.llvm.org/D117995
2022-02-08 13:33:27 +00:00
Simon Pilgrim e2537f6b19 [ValueTracking] Replace dyn_cast with dyn_cast_or_null to account for getTerminator returning null
Noticed while running checks on D117995 - a hexagon regression test was managing to return a block without a terminator
2022-02-08 13:33:26 +00:00
Johannes Doerfert 29c8ebad10 [MemoryBuiltins][FIX] Adjust index type size properly wrt. AS casts
Use existing functionality to strip constant offsets that works well
with AS casts and avoids the code duplication.

Since we strip AS casts during the computation of the offset we also
need to adjust the APInt properly to avoid mismatches in the bit width.
This code ensures the caller of `compute` sees APInts that match the
index type size of the value passed to `compute`, not the value result
of the strip pointer cast.

Fixes #53559.

Differential Revision: https://reviews.llvm.org/D118727
2022-02-07 20:19:19 -06:00
Kazu Hirata 3a3cb929ab [llvm] Use = default (NFC) 2022-02-06 22:18:35 -08:00
Bill Wendling c6f0940d99 [NFC] Remove unnecessary #includes
An attempt to reduce the number of files that are recompiled due to a change.

Differential Revision: https://reviews.llvm.org/D119055
2022-02-04 21:22:41 -08:00
serge-sans-paille ffe8720aa0 Reduce dependencies on llvm/BinaryFormat/Dwarf.h
This header is very large (3M Lines once expended) and was included in location
where dwarf-specific information were not needed.

More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.

This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].

As a consequence of that patch, downstream user may need to manually some extra
files:

llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h

In some situations, codes maybe relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden
dependency now needs to be explicit.

$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after:   10978519
before:  11245451

Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions

Differential Revision: https://reviews.llvm.org/D118781
2022-02-04 11:44:03 +01:00
Augie Fackler b2d091aa5d [NFC] MemoryBuiltins: tease out a getFreeFunctionDataForFunction helper 2022-02-03 08:36:36 -08:00
Augie Fackler bad0301cc5 MemoryBuiltins: simplify isLibFreeFunction [NFC]
This is in anticipation of my next patch, where I need to store more information about free functions than just their argument count. It felt invasive enough on this function that it seemed worthwhile to just extract this as its own commit that makes no functional changes.

Differential Revision: https://reviews.llvm.org/D117350
2022-02-03 08:30:02 -08:00
Serge Pavlov d2f132f0b7 [ConstantFolding] Fold constrained compare intrinsics
The change implements constant folding of ‘llvm.experimental.constrained.fcmp’
and ‘llvm.experimental.constrained.fcmps’ intrinsics.

Differential Revision: https://reviews.llvm.org/D110322
2022-02-03 16:45:56 +07:00
Andrew Litteken 30420bc344 [IRSim] Make sure that commutative intrinsics are treated as function calls without commutativity
Created to fix: https://github.com/llvm/llvm-project/issues/53537

Some intrinsics functions are considered commutative since they are performing operations like addition or multiplication. Some of these have extra parameters to provide extra information that are not part of the operation itself and are not commutative. This makes sure that if an instruction that is an intrinsic takes the non commutative path to handle this case.

Reviewer: paquette

Closes Issue #53537

Differential Revision: https://reviews.llvm.org/D118807
2022-02-02 13:24:56 -06:00
Malhar Jajoo 778b455dd6 [LAA] Add Memory dependence remarks.
Adds new optimization remarks when vectorization fails.

More specifically, new remarks are added for following 4 cases:

- Backward dependency
- Backward dependency that prevents Store-to-load forwarding
- Forward dependency that prevents Store-to-load forwarding
- Unknown dependency

It is important to note that only one of the sources
of failures (to vectorize) is reported by the remarks.
This source of failure may not be first in program order.

A regression test has been added to test the following cases:

a) Loop can be vectorized: No optimization remark is emitted
b) Loop can not be vectorized: In this case an optimization
remark will be emitted for one source of failure.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D108371
2022-02-02 12:07:51 +00:00
William S. Moses 8cb9c73609 [LoopIdiom] Keep TBAA when creating memcpy/memmove
When upgrading a loop of load/store to a memcpy, the existing pass does not keep existing aliasing information. This patch allows existing aliasing information to be kept.

Reviewed By: jeroen.dobbelaere

Differential Revision: https://reviews.llvm.org/D108221
2022-01-31 16:28:13 -05:00
Eli Friedman b2837bf2f2 [ScalarEvolution] Add bailout to avoid zext of pointer.
The RHS of an isImpliedCond call can be a pointer even if the LHS is
not. This is similar to bfa2a81e.

Not going to include a testcase; an IR testcase would be extremely
complicated and fragile.

Fixes https://github.com/llvm/llvm-project/issues/51936 .

Differential Revision: https://reviews.llvm.org/D114555
2022-01-31 11:41:39 -08:00
Kazu Hirata cda7b6aaf3 [Analysis] Drop an unnecessary const from a return type (NFC)
Identified with readability-const-return-type.
2022-01-30 16:04:58 -08:00
Kazu Hirata 49fdee13c1 [Analysis] Use != to compare strings (NFC)
Identified with readability-string-compare.
2022-01-30 12:32:57 -08:00
Nuno Lopes 0dc20e321c [InstSimplify] fold 'xor X, poison' and 'div/rem X, poison' to poison 2022-01-30 10:46:54 +00:00
William S. Moses 99d2582164 [ScalarEvolution] Handle <= and >= in non infinite loops
Extend scalar evolution to handle >= and <= if a loop is known to be finite and the induction variable guards the condition. Specifically, with these assumptions lhs <= rhs is equivalent to lhs < rhs + 1 and lhs >= rhs to lhs > rhs -1.

In the case of lhs <= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmax, which would have resulted in an infinite loop.

In the case of lhs >= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmin, which would again have resulted in an infinite loop.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D118090
2022-01-28 17:41:08 -05:00
Andrew Litteken 3785c1d055 [IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.

This also adds an additional command line flag debug option to disable outlining intrinsics.

Recommit of: 8de76bd569
Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls.

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D109450
2022-01-28 13:52:21 -06:00
William S. Moses 0d04c77856 [ScalarEvolution] Mark a loop as finite if in a willreturn function
A limited version of (https://reviews.llvm.org/D118090) that only marks a loop as finite if in a willreturn function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D118429
2022-01-28 14:17:05 -05:00
Nikita Popov 8a4293f3ef [Loads] Require Align in isDereferenceableAndAlignedPointer() (NFC)
Now that loads always have an alignment, we should not perform an
ABI alignment fallback here.
2022-01-28 16:23:32 +01:00
Evgeniy Brevnov d7424939a6 [BasicAA] Add support for memmove intrinsic
Currently, basic AA has special support for llvm.memcpy.* intrinsics. This change extends this support for any memory trancsfer opration and in particular llvm.memmove.* intrinsic.

Reviewed By: reames, nikic

Differential Revision: https://reviews.llvm.org/D117095
2022-01-28 18:19:36 +07:00
Florian Hahn 1ca02bddb4
[ConstraintSystem] Mark function as const (NFC). 2022-01-27 13:44:47 +00:00
Congzhe Cao f3e1f44340 [IVDescriptor] Get the exact FP instruction that does not allow reordering
This is a bugfix in IVDescriptor.cpp.

The helper function `RecurrenceDescriptor::getExactFPMathInst()`
is supposed to return the 1st FP instruction that does not allow
reordering. However, when constructing the RecurrenceDescriptor,
we trace the use-def chain staring from a PHI node and for each
instruction in the use-def chain, its descriptor overrides the
previous one. Therefore in the final RecurrenceDescriptor we
constructed, we lose previous FP instructions that does not allow
reordering.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D118073
2022-01-27 00:33:46 -05:00
Nikita Popov 44cfc3a816 [LICM] Generalize unwinding check during scalar promotion
This extract a common isNotVisibleOnUnwind() helper into
AliasAnalysis, which handles allocas, byval arguments and noalias
calls. After D116998 this could also handle sret arguments. We
have similar logic in DSE and MemCpyOpt, which will be switched
to use this helper as well.

The noalias call case is a bit different from the others, because
it also requires that the object is not captured. The caller is
responsible for doing the appropriate check.

Differential Revision: https://reviews.llvm.org/D117000
2022-01-26 11:15:03 +01:00
Andrew Litteken ba79295c48 [NFC][IROutliner] fix namespace and unused variable 2022-01-25 18:41:30 -06:00
Andrew Litteken e8f4e41b6b [IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region.
We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow for non entry blocks within the extracted region to have predecessors, so there are not conflicts to handle with respect to predecessors no longer contained in the function.

Recommit of 515eec3553

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106997
2022-01-25 18:25:50 -06:00
Andrew Litteken e50b217b4e Revert "[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region."
This reverts commit 515eec3553.

By mistake, commit message was not complete.
2022-01-25 18:24:19 -06:00
Andrew Litteken 515eec3553 [IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region. 2022-01-25 18:20:10 -06:00
Andrew Litteken 9c2daf648c Revert "[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions"
This reverts commit 8de76bd569.

Reverting due to failure of different-intrinsics.ll on lld-x86_64-win buildbot.
2022-01-25 18:19:33 -06:00
Andrew Litteken 8de76bd569 [IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.

This also adds an additional command line flag debug option to disable outlining intrinsics.

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D109450
2022-01-25 17:06:09 -06:00
Andrew Litteken f5f377d1fc [IRSim][IROutliner] Adding support for recognizing and outlining indirect function calls, and function calls with different names, but the same type
The outliner currently requires that function calls not be indirect calls, and have that the function name, and function type must match, as well as other attributes such as calling conventions. This patch treats called functions as values, and just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or input and output into the outlined functions.

There are also debugging flags added to enforce the old behaviors where indirect calls not be allowed, and to enforce the old rule that function calls names must also match.

Reviewers: paquette, jroelofs

Differential Revision: https://reviews.llvm.org/D109448
2022-01-25 15:19:28 -06:00
Nikita Popov 3e2ae92d3f [SCEV] Remove an unnecessary GEP type check
The code already checked that the addrec step size and type alloc
size are the same. The actual pointer element type is irrelevant
here.
2022-01-25 12:56:46 +01:00
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Max Kazantsev c913dccfde [SCEV] Use lshr in implications
This patch adds support for implication inference logic for the
following pattern:
```
  lhs < (y >> z) <= y, y <= rhs --> lhs < rhs
```
We should be able to use the fact that value shifted to right is
not greater than the original value (provided it is non-negative).

Differential Revision: https://reviews.llvm.org/D116150
Reviewed-By: apilipenko
2022-01-25 13:25:19 +07:00
Ahmed Bougacha e7298464c5 [ObjCARC] Use "UnsafeClaimRV" to refer to unsafeClaim in enums. NFC.
This matches the actual runtime function more closely.
I considered also renaming both RetainRV/UnsafeClaimRV to end with
"ARV", for AutoreleasedReturnValue, but there's less potential
for confusion there.
2022-01-24 19:37:01 -08:00
Evgeniy Brevnov 0e55d4fab0 [AA] Refine ModRefInfo for llvm.memcpy.* in presence of operand bundles
Presence of operand bundles changes semantics in respect to ModRef. In particular, spec says: "From the compilers perspective, deoptimization operand bundles make the call sites theyre attached to at least readonly. They read through all of their pointer typed operands (even if theyre not otherwise escaped) and the entire visible heap. Deoptimization operand bundles do not capture their operands except during deoptimization, in which case control will not be returned to the compiled frame". Fix handling of llvm.memcpy.* according to the spec.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D118033
2022-01-25 10:15:23 +07:00
Mircea Trofin b1af01fe6a [NFC][MLGO] Simplify conditional compilation
Most of the code that's shared between 'release' and 'development'
modes doesn't depend on anything special.
2022-01-24 11:19:04 -08:00
Kazu Hirata b752eb887f [Analysis] Use default member initialization (NFC)
Identified with modernize-use-default-member-init.
2022-01-23 20:32:56 -08:00
Kazu Hirata 448d0dfab7 [Analysis] Remove a redundant const from a return type (NFC)
Identified with readability-const-return-type.
2022-01-23 14:00:03 -08:00
Sanjay Patel 2e26633af0 [IR] document and update ctlz/cttz intrinsics to optionally return poison rather than undef
The behavior in Analysis (knownbits) implements poison semantics already,
and we expect the transforms (for example, in instcombine) derived from
those semantics, so this patch changes the LangRef and remaining code to
be consistent. This is one more step in removing "undef" from LLVM.

Without this, I think https://github.com/llvm/llvm-project/issues/53330
has a legitimate complaint because that report wants to allow subsequent
code to mask off bits, and that is allowed with undef values. The clang
builtins are not actually documented anywhere AFAICT, but we might want
to add that to remove more uncertainty.

Differential Revision: https://reviews.llvm.org/D117912
2022-01-23 11:22:48 -05:00
Nikita Popov b4900296e4 [ConstantFold] Allow all float types in reinterpret load folding
Rather than hardcoding just half, float and double, allow all
floating point types.
2022-01-21 09:26:51 +01:00
Nikita Popov 6a19cb837c [ConstantFold] Support pointers in reinterpret load folding
Peculiarly, the necessary code to handle pointers (including the
check for non-integral address spaces) is already in place,
because we were already allowing vectors of pointers here, just
not plain pointers.
2022-01-21 09:13:37 +01:00
Nikita Popov 05cd9a0596 [ConstantFold] Simplify type check in reinterpret load folding (NFC)
Keep a list of allowed types, but then always construct the map
type the same way. We need an integer with the same width as the
original type.
2022-01-21 09:06:35 +01:00
Mircea Trofin f29256a64a [MLGO] Improved support for AOT cross-targeting scenarios
The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run, but clang can be otherwise
built.

To simplify such scenarios given we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case by case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.

This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.

The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.

Differential Revision: https://reviews.llvm.org/D117752
2022-01-20 07:05:39 -08:00
Mircea Trofin e67430cca4 [MLGO] ML Regalloc Eviction Advisor
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information (currently after the Virtual Register
Rewriter) and then produce the log file.

This patch also introduces the score injection pass, 'Register
Allocation Pass Scoring', which is trivially just logging the score in
development mode.

Differential Revision: https://reviews.llvm.org/D117147
2022-01-19 11:00:32 -08:00
Nikita Popov d8bff13a8a [NFC] Add missing <map> includes
These were relying on a transitive include.
2022-01-19 12:29:03 +01:00
Philip Reames 215bd46905 [MemoryBuiltins] Demote isMallocLikeFn to implementation routine since last use has been removed
Try 2, this time including the test.
2022-01-18 15:24:52 -08:00
Philip Reames fcab2d1309 Revert "[MemoryBuiltins] Demote isMallocLikeFn to implementation routine since last use has been removed"
This reverts commit 167af7bbfe. Buildbot breaks since I forgot to remove a unit test.
2022-01-18 15:16:12 -08:00
Philip Reames 167af7bbfe [MemoryBuiltins] Demote isMallocLikeFn to implementation routine since last use has been removed 2022-01-18 15:12:07 -08:00
Mircea Trofin 3e8553aab4 [mlgo][inline] Improve global state tracking
The global state refers to the number of the nodes currently in the
module, and the number of direct calls between nodes, across the
module.

Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.

This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable - they do not get deleted while cgscc passes are
run over the module; and cgscc pass manager invariants.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D115847
2022-01-18 17:45:34 +00:00
Jan Svoboda 5f4ae56457 [llvm] Remove uses of `std::vector<bool>`
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.

This patch does just that for llvm.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D117121
2022-01-18 18:20:45 +01:00
Nikita Popov 3ec7f46e99 [LVI] Handle implication from icmp of trunc (PR51867)
Similar to the existing urem code, if we have (trunc X) >= C,
then also X >= C.

Proof: https://alive2.llvm.org/ce/z/RF4YR2

Fixes https://github.com/llvm/llvm-project/issues/51867.
2022-01-18 11:24:11 +01:00
Nikita Popov 9e68557e64 [LVI] Handle commuted SPF min/max operands
We need to check that the operands of the min/max are the operands
of the select, but we don't care which order they are in.
2022-01-18 10:43:00 +01:00
Nikita Popov d15823e300 [LVI] Compute SPF range even if one operands is overdefined
If we have a constant range for one operand but not the other,
we can generally still compute a useful results for SPF min/max.
2022-01-18 10:40:49 +01:00
Nikita Popov 202d590a01 [LVI] Consistently intersect assumes
Integrate intersection with assumes into getBlockValue(), to ensure
that it is consistently performed.

We were doing it in nearly all places, but for example missed it
for select inputs.
2022-01-18 10:15:31 +01:00
Nikita Popov f104cc38f4 [ConstantFold] Don't fold load from non-byte-sized vector
Following up on 1470f94d71 (r63981173):

The result here (probably) depends on endianness. Don't bother
trying to handle this exotic case, just bail out.
2022-01-17 17:01:47 +01:00
Nikita Popov af12a3f4a9 [ValueTracking] Remove ComputeMultiple() function
This function is no longer used since
499f1ca79f.
2022-01-17 10:28:31 +01:00
Bryce Wilson dd13744bfb
Revert "[BasicAliasAnalysis] Remove isMallocOrCallocLikeFn"
This reverts commit 1f2cfc4fdc.
2022-01-14 14:42:53 -08:00
Roman Lebedev 650fc40b6d
[NFC][SCEV] Introduce `getCastExpr()` QoL helper 2022-01-15 00:52:22 +03:00
Bryce Wilson 1f2cfc4fdc
[BasicAliasAnalysis] Remove isMallocOrCallocLikeFn
Allocation functions should be marked with onlyAccessesInaccessibleMemory (when that is correct for the given function) which is checked elsewhere so this check is no longer needed.

Differential Revision: https://reviews.llvm.org/D117180
2022-01-14 12:22:01 -08:00
Philip Reames dac82b53e2 Revert "[MemoryBuiltins] [NFC] Add missing section comments"
This reverts commit 83338d5032.  Comments in source are non-idiomatic and naming choice in head is unclear.
2022-01-14 08:34:21 -08:00
Roman Lebedev b32077234b
[NFCI][SCEV] `computeExitLimitFromCondFromBinOp()`: rely on `getSequentialMinMaxExpr()` constant relaxation
`getSequentialMinMaxExpr()` has been taught to perform this relaxation,
so rely on that now. Not sure this can be tested.
2022-01-14 17:07:48 +03:00
Roman Lebedev 8dcba20674
[SCEV] `getSequentialMinMaxExpr()`: relax 2-op umin_seq w/ constant to umin
Currently, `computeExitLimitFromCondFromBinOp()` does that directly.
2022-01-14 17:07:48 +03:00
Roman Lebedev c86a982d7d
[SCEV] `getSequentialMinMaxExpr()`: rewrite deduplication to be fully recursive
Since we don't merge/expand non-sequential umin exprs into umin_seq exprs,
we may have umin_seq(umin(umin_seq())) chain, and the innermost umin_seq
can have duplicate operands still.
2022-01-14 15:42:26 +03:00
Florian Hahn 1ef9bfa013
[InstSimplify] Pass pointer and indices separately to SimplifyGEPInst.
This doesn't require callers to put the pointer operand and the indices
in a container like a vector when calling the function. This is not
really an issue with the existing callers. But when using it from
IRBuilder the inputs are available as separate pointer value and indices
ArrayRef.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D117038
2022-01-14 09:59:52 +00:00
Nikita Popov 20d9c51dc0 [ConstantFold] Check for uniform value before reinterpret load
The reinterpret load code will convert undef values into zero.
Check the uniform value case before it to produce a better result
for all-undef initializers.

However, the uniform value handling will return the uniform value
even if the access is out of bounds, while the reinterpret load
code will return undef. Add an explicit check to retain the
previous result in this case.
2022-01-14 10:18:02 +01:00
Bryce Wilson 83338d5032
[MemoryBuiltins] [NFC] Add missing section comments 2022-01-13 17:43:43 -08:00
Philip Reames ee02cf0797 [MemoryBuiltins] Demote isCallocLikeFn and isAlignedAllocLikeFn to local helpers after removal of last external use [NFC] 2022-01-13 15:51:17 -08:00
Philip Reames cf66f01ec1 [Attributor] Share code for abstract interpretation of allocation sizes with getObjectSize [NFC-ish]
The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during it's abstract interpretation, and allows us to have one copy of the code.

Note this is not NFC for two reasons:
* The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space.
* I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test wise contradicts that.

Differential Revision: https://reviews.llvm.org/D117241
2022-01-13 15:33:24 -08:00
Bryce Wilson 68874d8b5f
[MemoryBuiltins] [NFC] Remove unused overload of isAlignedAllocLikeFn
Differential Revision: https://reviews.llvm.org/D117245
2022-01-13 15:19:04 -08:00
Arthur Eubanks 757e044dce [Inliner] Don't removeDeadConstantUsers() when checking if a function is dead
If a function has many uses, this can take a good chunk of compile times.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D117236
2022-01-13 14:29:45 -08:00
Philip Reames cd36b29ec7 [MemoryBuiltins] (Slightly) clean up abuse of MallocLike bitmask [NFC] 2022-01-13 12:39:22 -08:00
Nikita Popov aba7c3c033 [ConstantFold] Check uniform value in ConstantFoldLoadFromConst()
This case is automatically handled if ConstantFoldLoadFromConstPtr()
is used. Make sure that ConstantFoldLoadFromConst() also handles it.
2022-01-13 14:40:19 +01:00
Hans Wennborg 2bc57d85eb Don't override __attribute__((no_stack_protector)) by inlining (PR52886)
Since 26c6a3e736, LLVM's inliner will "upgrade" the caller's stack protector
attribute based on the callee. This lead to surprising results with Clang's
no_stack_protector attribute added in 4fbf84c173 (D46300). Consider the
following code compiled with clang -fstack-protector-strong -Os
(https://godbolt.org/z/7s3rW7a1q).

  extern void h(int* p);

  inline __attribute__((always_inline)) int g() {
    return 0;
  }

  int __attribute__((__no_stack_protector__)) f() {
    int a[1];
    h(a);
    return g();
  }

LLVM will inline g() into f(), and f() would get a stack protector, against the
users explicit wishes, potentially breaking the program e.g. if h() changes the
value of the stack cookie. That's a miscompile.

More recently, bc044a88ee (D91816) addressed this problem by preventing
inlining when the stack protector is disabled in the caller and enabled in the
callee or vice versa. However, the problem remained if the callee is marked
always_inline as in the example above. This affected users, see e.g.
http://crbug.com/1274129 and http://llvm.org/pr52886.

One way to fix this would be to prevent inlining also in the always_inline
case. Despite the name, always_inline does not guarantee inlining, so this
would be legal but potentially surprising to users.

However, I think the better fix is to not enable the stack protector in a
caller based on the callee. The motivation for the old behaviour is unclear, it
seems counter-intuitive, and causes real problems as we've seen.

This commit implements that fix, which means in the example above, g() gets
inlined into f() (also without always_inline), and f() is emitted without stack
protector. I think that matches most developers' expectations, and that's also
what GCC does.

Another effect of this change is that a no_stack_protector function can now be
inlined into a stack protected function, e.g. (https://godbolt.org/z/hafP6W856):

  extern void h(int* p);

  inline int __attribute__((__no_stack_protector__)) __attribute__((always_inline)) g() {
    return 0;
  }

  int f() {
    int a[1];
    h(a);
    return g();
  }

I think that's fine. Such code would be unusual since no_stack_protector is
normally applied to a program entry point which sets up the stack canary. And
even if such code exists, inlining doesn't change the semantics: there is still
no stack cookie setup/check around entry/exit of the g() code region, but there
may be in the surrounding context, as there was before inlining. This also
matches GCC.

See also the discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722

Differential revision: https://reviews.llvm.org/D116589
2022-01-13 12:04:49 +01:00
Sanjay Patel 6bd127b079 [InstSimplify] use knownbits to fold more udiv/urem
We could use knownbits on both operands for even more folds (and there are
already tests in place for that), but this is enough to recover the example
from:
https://github.com/llvm/llvm-project/issues/51934
(the tests are derived from the code in that example)

I am assuming no noticeable compile-time impact from this because udiv/urem
are rare opcodes.

Differential Revision: https://reviews.llvm.org/D116616
2022-01-12 14:59:43 -05:00
Rosie Sumpter 552eb372cb [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter
This is required to query the legality more precisely in the LoopVectorizer.

This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.

Differential Revision: https://reviews.llvm.org/D115329
2022-01-12 13:34:12 +00:00
Mircea Trofin 248d55af3e [NFC][MLGO] Use LazyCallGraph::Node to track functions.
This avoids the InlineAdvisor carrying the responsibility of deleting
Function objects. We use LazyCallGraph::Node objects instead, which are
stable in memory for the duration of the Module-wide performance of CGSCC
passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which
is the case here)

Differential Revision: https://reviews.llvm.org/D116964
2022-01-11 19:23:47 -08:00
Mircea Trofin 1f5dceb1d0 [MLGO] Add support for multiple training traces per module
This happens in e.g. regalloc, where we trace decisions per function,
but wouldn't want to spew N log files (i.e. one per function). So we
output a key-value association, where the key is an ID for the
sub-module object, and the value is the tensorflow::SequenceExample.

The current relation with protobuf is tenuous, so we're avoiding a
custom message type in favor of using the `Struct` message, but that
requires the values be wire-able strings, hence base64 encoding.

We plan on resolving the protobuf situation shortly, and improve the
encoding of such logs, but this is sufficient for now for setting up
regalloc training.

Differential Revision: https://reviews.llvm.org/D116985
2022-01-11 16:13:31 -08:00
Mircea Trofin a81b0c978f [NFC][MLGO] Remove the word "inliner" in a generic error message. 2022-01-11 12:39:16 -08:00
Arthur Eubanks bf52210e25 [NFC][LazyCallGraph] Remove check in removeDeadFunction() if graph is empty
If we're in removeDeadFunction(), we should have already constructed the call graph.

Differential Revision: https://reviews.llvm.org/D115676
2022-01-11 10:17:13 -08:00
Florian Hahn f0ef1ea6dd
[IRBuilder] Introduce folder using inst-simplify, use for Or fold.
Alternative to D116817.

This introduces a new value-based folding interface for Or (FoldOr),
which takes 2 values and returns an existing Value or a constant if the
Or can be simplified. Otherwise nullptr is returned. This replaces the
more restrictive CreateOr which takes 2 constants.

This is the used to implement a folder that uses InstructionSimplify.
The logic to simplify `Or` instructions is moved there. Subsequent
patches are going to transition other CreateXXX to the more general
FoldXXX interface.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D116935
2022-01-11 17:30:48 +00:00
Philip Reames 8f553da492 [instsimplify] Add a comment and test for a highly confusing case 2022-01-11 09:24:10 -08:00
Philip Reames e838949bee [GlobalsModRef] Apply indirect-global rule to all globals initialized from noalias calls
Extend the existing malloc-family specific optimization to all noalias calls.  This allows us to handle allocation wrappers, and removes a dependency on a lib-func check in favor of generic attribute usage.

Differential Revision: https://reviews.llvm.org/D116980
2022-01-11 08:44:31 -08:00
Florian Hahn 8a469e2050
[InstSimplify] Fold inbounds GEP to poison if base is undef.
D92270 updated constant expression folding to fold inbounds GEP to
poison if the base is undef. Apply the same logic to SimplifyGEPInst.

The justification is that we can choose an out-of-bounds pointer as base
pointer.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D117015
2022-01-11 16:11:22 +00:00
Roman Lebedev 5ceb070bbb
[SCEV] `getSequentialMinMaxExpr()`: look into `umin` when deduplicating operands
We could just merge all umin into umin_seq, but that is likely
a pessimization, so don't do that, but pretend that we did
for the purpose of deduplication.
2022-01-11 18:51:57 +03:00
Roman Lebedev 5e16650792
[SCEV] `getSequentialMinMaxExpr()`: keep only the first instance of an operand
Having the same operand more than once doesn't change the outcome here,
neither reduction-wise nor poison-wise.
We must keep the first instance specifically though.
2022-01-11 16:51:53 +03:00
Roman Lebedev 76a0abbc13
[SCEV] Reenable umin_seq support and fix the `computeSCEVAtScope()`
This reverts commit f62f47f5e1.
2022-01-11 16:03:35 +03:00
Nikita Popov 3946095b88 [MemoryBuiltins] Remove unused isOpNewLikeFn() (NFC)
This function is no longer used since
2cafbcb560.
2022-01-11 12:27:23 +01:00
Nikita Popov b56f6f1913 [MemoryBuiltins] Remove unused isStrdupLikeFn() function (NFC)
This function is no longer used after
dcbc91f40c.
2022-01-11 12:26:20 +01:00
Philip Reames f62f47f5e1 Partial revert of 82fb4f4
Two crashes have been reported.  This change disables the new logic while leaving the new node in tree.  Hopefully, that's enough to allow investigation without breakage while avoiding massive churn.
2022-01-10 18:18:34 -08:00
Philip Reames 5265ac72c6 [MemoryBuiltin] Add an API for checking if an unused allocation can be removed [NFC]
Not all allocation functions are removable if unused.  An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++.  An example of a removable one - at least according to historical practice - would be malloc.
2022-01-10 15:43:39 -08:00
Roman Lebedev 82fb4f4b22
[SCEV] Sequential/in-order `UMin` expression
As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692,
SCEV is forbidden from reasoning about 'backedge taken count'
if the branch condition is a poison-safe logical operation,
which is conservatively correct, but is severely limiting.

Instead, we should have a way to express those
poison blocking properties in SCEV expressions.

The proposed semantics is:
```
Sequential/in-order min/max SCEV expressions are non-commutative variants
of commutative min/max SCEV expressions. If none of their operands
are poison, then they are functionally equivalent, otherwise,
if the operand that represents the saturation point* of given expression,
comes before the first poison operand, then the whole expression is not poison,
but is said saturation point.
```
* saturation point - the maximal/minimal possible integer value for the given type

The lowering is straight-forward:
```
compare each operand to the saturation point,
perform sequential in-order logical-or (poison-safe!) ordered reduction
over those checks, and if reduction returned true then return
saturation point else return the naive min/max reduction over the operands
```
https://alive2.llvm.org/ce/z/Q7jxvH (2 ops)
https://alive2.llvm.org/ce/z/QCRrhk (3 ops)
Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS
Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97

That allows us to handle the patterns in question.

Reviewed By: nikic, reames

Differential Revision: https://reviews.llvm.org/D116766
2022-01-10 20:51:26 +03:00
Philip Reames 1d127315e7 Minor style tweaks following fb93659 2022-01-10 09:32:29 -08:00
Bryce Wilson 7febd60a90 [instcombine] Add align return attributes for operator new(..., align_val)
(Split from original patch to separate non-NFC part and add coverage.  I typoed when adding the new test, so this change includes the typo fix to let libfunc recongize the signature.  Didn't figure it was worth another separate commit.)

Differential Revision: https://reviews.llvm.org/D116851 (part 2 of 2)
2022-01-10 09:15:20 -08:00
Bryce Wilson fb936595fa [MemoryBuiltins] Add field for alignment argument [NFC]
There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds an getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically.

This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack.  The former will follow shortly - I split Bryce's patch for purpose of having the large change be NFC.  The later will be reviewed separately.

Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)
2022-01-10 09:15:20 -08:00
Simon Pilgrim fd1094f318 [ConstantFolding] Clean up Intrinsics::abs undef handling
Match cttz/ctlz handling by assuming C1 == 0 if C1 != 1 - I've added an assertion as well.

Fixes static analyzer nullptr dereference warnings.
2022-01-10 17:04:03 +00:00
Nikita Popov 92d55e7336 [MemoryBuiltins] Remove isNoAliasFn() in favor of isNoAliasCall()
We currently have two similar implementations of this concept:
isNoAliasCall() only checks for the noalias return attribute.
isNoAliasFn() also checks for allocation functions.

We should switch to only checking the attribute. SLC is responsible
for inferring the noalias return attribute for non-new allocation
functions (with a missing case fixed in
348bc76e35).
For new, clang is responsible for setting the attribute,
if -fno-assume-sane-operator-new is not passed.

Differential Revision: https://reviews.llvm.org/D116800
2022-01-10 09:18:15 +01:00
Simon Pilgrim be7dbd674c [DivergenceAnalysis] Simplify inRegion test based on whether the RegionLoop pointer is null or not
More closely matches the documentation

Requested by @nikic
2022-01-08 14:30:10 +00:00
Simon Pilgrim b3f193a980 [DivergenceAnalysis] Fix static analyzer warning about dereference of nullptr
We're testing that the RegionLoop pointer is null in the first part of the check, so we need to check that its non-null before dereferencing it in a later part of the check.
2022-01-08 13:57:33 +00:00
Kazu Hirata b932bdf59f [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-07 17:45:09 -08:00
Philip Reames f38873537b [MemoryBuiltin] Cleanup stale todo comments [NFC]
strdup/strndup are already partially implemented, move remaining comment to relevant place.  Remaining named routines are copy routines and mostly handled via intrinsics already - they do not allocate new memory.
2022-01-07 13:57:20 -08:00
Roman Lebedev 32300375f5
[NFCI] `ScalarEvolution::getRangeRef()`: collapse `SCEVMinMaxExpr` handling 2022-01-08 00:23:08 +03:00
Arthur Eubanks d51e3474e0 [LazyCallGraph] Ignore empty RefSCCs rather than shift RefSCCs when removing dead functions
This is in preparation for D115545 which attempts to delete discardable functions if they are unused. With that change, shifting RefSCCs becomes noticeable in compile time. This change makes the LCG update negligible again.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D116776
2022-01-07 09:42:23 -08:00
Philip Reames 6b0ff0969d Extract utility function for checking initial value of allocation [NFC, try 2]
This is a reoccuring pattern, we can consolidate three copies into one.  The main motivation is to reduce usages of isMallocLike.

The original commit (which was quickly reverted) didn't account for the allocation function could be an invoke, test coverage for that case added in this commit.
2022-01-07 08:44:08 -08:00
Roman Lebedev a5a6960d1c
[NFCI][IR] MinMaxIntrinsic: add some more helper methods, and use them 2022-01-07 13:02:11 +03:00
Philip Reames c6a0c1585a Revert "Extract utility function for checking initial value of allocation [NFC]"
This reverts commit 9ce30fe86f.  Appears to be causing a problem on a buildbot, revert while investigating.

https://green.lab.llvm.org/green//job/clang-stage1-RA/26818/consoleFull#-1502953973d489585b-5106-414a-ac11-3ff90657619c
2022-01-06 19:05:51 -08:00
Philip Reames 9ce30fe86f Extract utility function for checking initial value of allocation [NFC]
This is a reoccuring pattern, we can consolidate three copies into one.  The main motivation is to reduce usages of isMallocLike.
2022-01-06 18:02:14 -08:00
Philip Reames 5d1cfd4348 Remove unused LookThroughBitCast param in isXAllocLike functions [NFC]
This parameter took the non-default value exactly twice, and neither had semantic effect.
2022-01-06 18:02:13 -08:00
Philip Reames 7052670e96 Move getMallocAllocatedType and getMallocArraySize to GlobalOpt [NFC]
These are implementation details of the global-opt transform and not easily reuseable, so remove them from the analysis header.
2022-01-06 18:02:13 -08:00
Philip Reames 67a3331e4f Inline extractMallocCall to sole use and delete [NFC] 2022-01-06 18:02:13 -08:00
Philip Reames 4b0fc924a9 Delete unused extractCallocCall routine [NFC] 2022-01-06 18:02:13 -08:00
Philip Reames cffd268316 Demote getMallocType to implementation routine in MemoryBuiltins [NFC] 2022-01-06 18:02:13 -08:00
Daniil Suchkov 524abc68f2 Introduce NewPM .dot printers for DomTree
This patch adds a couple of NewPM function passes (dot-dom and
dot-dom-only) that dump DomTree into .dot files.

Reviewed-By: aeubanks
Differential Revision: https://reviews.llvm.org/D116629
2022-01-05 23:25:40 +00:00
Nico Weber 085f078307 Revert "Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`.""
This reverts commit 859ebca744.
The change contained many unrelated changes and e.g. restored
unit test failes for the old lld port.
2022-01-05 13:10:25 -05:00
David Salinas 859ebca744 Revert D109159 "[amdgpu] Enable selection of `s_cselect_b64`."
This reverts commit 640beb38e7.

That commit caused performance degradtion in Quicksilver test QS:sGPU and a functional test failure in (rocPRIM rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup

Change-Id: Ibf8e397df94001f248fba609f072088a46abae08

Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D115960

Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
2022-01-05 17:57:32 +00:00
Philip Reames c16fd6a376 Rename doesNotReadMemory to onlyWritesMemory globally [NFC]
The naming has come up as a source of confusion in several recent reviews.  onlyWritesMemory is consist with onlyReadsMemory which we use for the corresponding readonly case as well.
2022-01-05 08:52:55 -08:00
Nikita Popov 3dc1907d06 [ConstantFold] Use ConstantFoldLoadFromUniformValue() in more places
In particular, this also preserves undef when loading from padding,
rather than converting it to zero through a different codepath.

This is the remaining part of D115924.
2022-01-05 12:47:50 +01:00
Nikita Popov 99c6b12b92 [ConstantFolding] Unify handling of load from uniform value
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case b) it bypasses casts that might not be legal generally but
do work with uniform values.

We had multiple implementations of this, with a different set of
supported values each time. This replaces two usages with a more
complete helper. Other usages will be replaced separately, because
they have larger impact.

This is part of D115924.
2022-01-05 12:30:46 +01:00
Mircea Trofin a120fdd337 [NFC][MLGO]Add RTTI support for MLModelRunner and simplify runner setup 2022-01-04 19:46:14 -08:00
Sanjay Patel 1e50d06466 [Analysis] fix swapped operands to computeConstantRange
This was noted in post-commit review for D116322 / 0edf99950e .

I am not seeing how to expose the bug in a test though because
we don't pass an assumption cache into this analysis from there.
2022-01-04 13:13:50 -05:00
Philip Reames b061d86c69 [SCEV] Compute exit count from overflow check expressed w/ x.with.overflow intrinsics
This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them.

Motivation comes from compiling a simple example with -ftrapv.

Differential Revision: https://reviews.llvm.org/D116499
2022-01-04 09:44:23 -08:00
Florian Hahn d8276208be
[LAA] Remove overeager assertion for aggregate types.
0a00d64 turned an early exit here into an assertion, but the assertion
can be triggered, as PR52920 shows.

The later code is agnostic to the accessed type, so just drop the
assert. The patch also adds tests for LAA directly and
loop-load-elimination to show the behavior is sane.
2022-01-04 15:20:35 +00:00
Nikita Popov 71b2c4a3cf [ConstantFolding] Remove unused ConstantFoldLoadThroughGEPConstantExpr()
This API is no longer used since bbeaf2aac6.
2022-01-04 12:37:12 +01:00
Rosie Sumpter 961f51fdf0 [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions without loads/stores
For loops that contain in-loop reductions but no loads or stores, large
VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes
has no element types to check through and so returns the default widths
(-1U for the smallest and 8 for the widest). This results in the widest
VF being chosen for the following example,

float s = 0;
for (int i = 0; i < N; ++i)
  s += (float) i*i;

which, for more computationally intensive loops, leads to large loop
sizes when the operations end up being scalarized.

In this patch, for the case where ElementTypesInLoop is empty, the widest
type is determined by finding the smallest type used by recurrences in
the loop instead of falling back to a default value of 8 bits. This
results in the cost model choosing a more sensible VF for loops like
the one above.

Differential Revision: https://reviews.llvm.org/D113973
2022-01-04 10:12:57 +00:00
Craig Topper cbcbbd6ac8 [ValueTracking][SelectionDAG] Rename ComputeMinSignedBits->ComputeMaxSignificantBits. NFC
This function returns an upper bound on the number of bits needed
to represent the signed value. Use "Max" to match similar functions
in KnownBits like countMaxActiveBits.

Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old
name around to keep this patch size down. Will do a bulk rename as
follow up.

Rename KnownBits::countMaxSignedBits->countMaxSignificantBits.

Reviewed By: lebedev.ri, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D116522
2022-01-03 11:33:30 -08:00
Kazu Hirata e5947760c2 Revert "[llvm] Remove redundant member initialization (NFC)"
This reverts commit fd4808887e.

This patch causes gcc to issue a lot of warnings like:

  warning: base class ‘class llvm::MCParsedAsmOperand’ should be
  explicitly initialized in the copy constructor [-Wextra]
2022-01-03 11:28:47 -08:00
Nikita Popov 5afbfe33e7 [ConstantFold] Make icmp of gep fold offset based
We can fold an equality or unsigned icmp between base+offset1 and
base+offset2 with inbounds offsets by comparing the offsets directly.

This replaces a pair of specialized folds that tried to reason
based on the GEP structure instead. One of those folds was plain
wrong (because it does not account for negative offsets), while
the other is unnecessarily complicated and limited (e.g. it will
fail with bitcasts involved).

The disadvantage of this change is that it requires data layout,
so the fold is no longer performed by datalayout-independent
constant folding. I don't think this is a loss in practice, but
it does regress the ConstantExprFold.ll test, which checks folding
without running any passes.

Differential Revision: https://reviews.llvm.org/D116332
2022-01-03 09:41:37 +01:00
Philip Reames 890e685492 [SCEV] Drop unused param from new version of computeExitLimitFromICmp [NFC] 2022-01-02 10:15:17 -08:00
Philip Reames f19a95bbed [SCEV] Split computeExitLimitFromICmp into two versions [NFC]
This is in advance of a following change which needs to the non-icmp API.
2022-01-02 09:58:32 -08:00
Kazu Hirata fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
Sanjay Patel c054402170 [InstSimplify] fold or-nand-xor
~(A & B) | (A ^ B) --> ~(A & B)

https://alive2.llvm.org/ce/z/hXQucg
2021-12-31 15:11:13 -05:00
Nuno Lopes 64af9f61c3 [InstSimplify] add 'x + poison -> poison' (needed for NewGVN) 2021-12-30 11:52:42 +00:00
Fangrui Song b69fe48ccf [IROutliner] Move global namespace cl::opt inside llvm:: 2021-12-30 01:12:55 -08:00
Sanjay Patel 0edf99950e [Analysis] allow caller to choose signed/unsigned when computing constant range
We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the other.

This patch is modeled on a similar construct that was added with
D59386.

I don't think it is possible to expose a problem with an unsigned
compare because of the way this was coded (nuw is handled first).

InstCombine has an assert that fires with the example from:
https://github.com/llvm/llvm-project/issues/52884
...because it was expecting InstSimplify to handle this kind of
pattern with an smax.

Fixes #52884

Differential Revision: https://reviews.llvm.org/D116322
2021-12-28 09:45:37 -05:00
Sanjay Patel 773ab3c665 [Analysis] remove unneeded casts; NFC
The callee does the casting too; this matches a plain call later in the same function for 'shl'.
2021-12-27 13:41:50 -05:00
Nikita Popov ae64c5a0fd [DSE][MemLoc] Handle intrinsics more generically
Remove the special casing for intrinsics in MemoryLocation::getForDest()
and handle them through the general attribute based code. On the DSE
side, this means that isRemovable() now needs to handle more than a
hardcoded list of intrinsics. We consider everything apart from
volatile memory intrinsics and lifetime markers to be removable.

This allows us to perform DSE on intrinsics that DSE has not been
specially taught about, using a matrix store as an example here.

There is an interesting test change for invariant.start, but I
believe that optimization is correct. It only looks a bit odd
because the code is immediate UB anyway.

Differential Revision: https://reviews.llvm.org/D116210
2021-12-24 09:29:57 +01:00
Mehrnoosh Heidarpour 0ff20f2f44 [InstSimplify] Fold logic AND to zero
Adding following fold opportunity:
((A | B) ^ A) & ((A | B) ^ B) --> 0

Reviewed By: spatel, rampitec

Differential Revision: https://reviews.llvm.org/D115755
2021-12-23 10:06:26 -05:00
Mircea Trofin edf8e3ea5e [NFC][mlgo]Make the test model generator inlining-specific
When looking at building the generator for regalloc, we realized we'd
need quite a bit of custom logic, and that perhaps it'd be easier to
just have each usecase (each kind of mlgo policy) have it's own
stand-alone test generator.

This patch just consolidates the old `config.py` and
`generate_mock_model.py` into one file, and does away with
subdirectories under Analysis/models.
2021-12-22 13:38:45 -08:00
Nikita Popov 8a0e35f3a7 [MemoryLocation] Don't require nocapture in getForDest()
As reames mentioned on related reviews, we don't need the nocapture
requirement here. First of all, from an API perspective, this is
not something that MemoryLocation::getForDest() should be checking
in the first place, because it does not affect which memory this
particular call can access; it's an orthogonal concern that should
be handled by the caller if necessary.

However, for both of the motivating users in DSE and InstCombine,
we don't need the nocapture requirement, because the capture can
either be purely local to the call (a pointer identity check that
is irrelevant to us), be part of the return value (which we check
is unused), or be written in the dest location, which we have
determined to be dead.

This allows us to remove the special handling for libcalls as well.

Differential Revision: https://reviews.llvm.org/D116148
2021-12-22 12:20:13 +01:00
Nikita Popov f5ac23b5ae [ArgPromotion][TTI] Pass types to ABI compatibility hook
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.

This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.

This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).

Differential Revision: https://reviews.llvm.org/D116031
2021-12-22 09:37:51 +01:00
Serge Pavlov 77b923d0db [ConstantFolding] Do not remove side effect from constrained functions
According to the discussion in https://reviews.llvm.org/D110322 the code
that removes side effect from replaced function call is deleted.

Differential Revision: https://reviews.llvm.org/D115870
2021-12-22 13:45:49 +07:00
Nikita Popov 2926d6d335 [ConstantFold][GlobalOpt] Don't create x86_mmx null value
This fixes the assertion failure reported at
https://reviews.llvm.org/D114889#3198921 with a straightforward
check, until the cleaner fix in D115924 can be reapplied.
2021-12-21 09:11:41 +01:00
Kazu Hirata 500c4b68dc [llvm] Construct SmallVector with iterator ranges (NFC) 2021-12-20 23:43:24 -08:00
Philip Reames 44d23d5345 [DSE] Remove calls with known writes to dead memory
This is a reapply of a8a51fe5, which was reverted in 1ba99e due to a failing compiler-rt test.   That test was a false positive because it was checking asan failures not accounting for the fact the call could be validly optimized out.  I hopefully managed to stablize that test in 9b955f.  (That's a speculative fix due to disk consumption needed to build compiler-rt tests locally being absurd.)

Original commit message follows..

The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.

Merging the code in this was unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).

In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.

Differential Revision: https://reviews.llvm.org/D115904
2021-12-20 18:10:23 -08:00
Sanjay Patel a56803b8f8 [Analysis] fix cast in ValueTracking to allow constant expression
The test would crash because a non-instruction negate op made it in here.

Fixes #51506
2021-12-20 17:16:47 -05:00
Sander de Smalen b1ff20fd35 [LV] Enable scalable vectorization by default for SVE cores.
The availability of SVE should be sufficient to enable scalable
auto-vectorization.

This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.

Differential Revision: https://reviews.llvm.org/D115651
2021-12-20 16:23:29 +00:00
Nikita Popov aeb36ae0f4 Revert "[ConstantFolding] Unify handling of load from uniform value"
This reverts commit 9fd4f80e33.

This breaks SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c
in test-suite. Either the test is incorrect, or clang is generating
incorrect union initialization code. I've submitted
https://reviews.llvm.org/D115994 to fix the test, assuming my
interpretation is correct. Reverting this in the meantime as it
may take some time to resolve.
2021-12-18 20:46:52 +01:00
Ricky Zhou 9927a06f74 [AA] Handle callbr instructions in alias analysis
Before this change, AAResults::getModRefInfo() was missing a case for
callbr instructions (asm goto), which may read/write memory. In PR52735,
this led to a miscompile where a load was incorrect eliminated.

Add this missing case, as well as an assert verifying that all
memory-accessing instructions are handled properly.

Fixes #52735.

Differential Revision: https://reviews.llvm.org/D115992
2021-12-18 18:49:17 +01:00
Nikita Popov 1ba99eaf70 Revert "[DSE] Remove calls with known writes to dead memory"
This reverts commit a8a51fe556.

This breaks the strncpy-overflow.cpp test case.
2021-12-18 09:23:41 +01:00
Philip Reames a8a51fe556 [DSE] Remove calls with known writes to dead memory
The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.

Merging the code in this was unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).

In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.

Differential Revision: https://reviews.llvm.org/D115904
2021-12-17 13:42:36 -08:00
Philip Reames 793c0da89e [capturetracking] Explicitly check for callee operand [NFC]
Pull out an explicit check rather than relying on the fact that the callee operand is not a data operand.  The only real value is it gives us a clear place to move the comment, and makes the code slightly more understandable.
2021-12-17 09:21:35 -08:00
Nikita Popov 9fd4f80e33 [ConstantFolding] Unify handling of load from uniform value
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case and b) it bypasses casts that might not be legal generally
but do work with uniform values.

We had multiple implementations of this, with a different set of
supported values each time, as well as incomplete type checks in
some cases. In particular, this fixes the assertion reported in
https://reviews.llvm.org/D114889#3198921, as well as a similar
assertion that could be triggered via constant folding.

Differential Revision: https://reviews.llvm.org/D115924
2021-12-17 17:05:06 +01:00
Momchil Velikov 6192c312cf [AA] Correctly maintain the sign of PartiaAlias offset
Preserve the invariant that offset reported in the case of a
`PartialAlias` between `Loc1` and `Loc2`, is such that
`Loc1 + Offset = Loc2`, where `Loc1` and `Loc2` are the first and
the second argument, respectively, in alias queries.

Differential Revision: https://reviews.llvm.org/D115927
2021-12-17 15:45:26 +00:00
Florian Hahn f5f421e0ee
[SCEV] Apply loop guards in reverse order.
This patch updates applyLoopGuards to first collect all conditions and
then applies them in reverse order. This ensures the SCEVs with the
shortest dependency chains are constructed first, limiting the required
stack size.

This fixes a crash reported in D113578.

Note that the order conditions are applied can impact the accuracy of
the result, mostly due to missing min/max simplifications when
constructing SCEVs.

The changed test highlights the impact of the evaluation order. I will
follow up with a SCEV patch to improve min/max simplifications to get
the same results for both orders.
2021-12-16 10:52:37 +00:00
Nikita Popov a8c2ba105d [Inline] Disable deferred inlining
After the switch to the new pass manager, we have observed multiple
instances of catastrophic inlining, where the inliner produces huge
functions with many hundreds of thousands of instructions from small
input IR. We were forced to back out the switch to the new pass
manager for this reason. This patch fixes at least one of the root
cause issues.

LLVM uses a bottom-up inliner, and the fact that functions are processed
bottom-up is not just a question of optimality -- it is an imporant
requirement to prevent runaway inlining. The premise of the current
inlining approach and cost model is that after all calls inside a function
have been inlined, it may get large enough that inlining it into its
callers is no longer considered profitable. This safeguard does not
exist if inlining doesn't happen bottom-up, as inlining the callees,
and their callees, and their callees etc. will always seem individually
profitable, and the inliner can easily flatten the whole call tree.

There are instances where we necessarily have to deviate from bottom-up
inlining: When inlining in an SCC there is no natural "bottom", so
inlining effectively happens top-down. This requires special care,
and the inliner avoids exponential blowup by ensuring that functions
in the SCC grow in a balanced way and will eventually hit the threshold.

However, there is one instance where the inlining advisor explicitly
violates the bottom-up principle: Deferred inlining tries to "defer"
inlining a call if it determines that inlining the caller into all
its call-sites would be more profitable. Something very important to
understand about deferred inlining is that it doesn't make one inlining
choice in place of another -- it effectively chooses to do both. If we
have a call chain A -> B -> C and cost modelling tells us that inlining
B -> C is profitable, but we defer this and instead inline A -> B first,
then we'll now have a call A -> C, and the cost model will (a few special
cases notwithstanding) still tell us that this is profitable. So the end
result is that we inlined *both* B and C, even though under the usual
cost model function B would have been too large to further inline after
C has been integrated into it.

Because deferred inlining violates the bottom-up invariant of the inliner,
it can result in exponential inlining. The exponential-deferred-inlining.ll
test case illustrates this on a simple example (see
https://gist.github.com/nikic/1262b5f7d27278e1b34a190ae10947f5 for a
much more catastrophic case with about 5000x size blowup). If the call
chain A -> B -> C is not a chain but a tree of calls, then we end up
deferring inlining across the tree and end up flattening everything into
the root node.

This patch proposes to address this by disabling deferred inlining
entirely (currently still behind an option). Beyond the issue of
exponential inlining, I don't think that the whole concept makes sense,
at least as long as deferred inlining still ends up inlining both call
edges.

I believe the motivation for having deferred inlining in the first place
is that you might have a small wrapper function with local linkage that
could be eliminated if inlined. This would automatically happen if there
was a single caller, due to the large "last call to local" bonus. However,
this bonus is not extended if there are multiple callers, even if we
would eventually end up inlining into all of them (if the bonus were
extended).

Now, unlike the normal inlining cost model, the deferred inlining cost
model does look at all callers, and will extend the "last call to local"
bonus if it determines that we could inline all of them as long as we
defer the current inlining decision. This makes very little sense.
The "last call to local" bonus doesn't really cost model anything.
It's basically an "infinite" bonus that ensures we always inline the
last call to a local. The fact that it's not literally infinite just
prevents inlining of huge functions, which can easily result in
scalability issues. I very much doubt that it was an intentional
cost-modelling choice to say that getting rid of a small local function
is worth adding 15000 instructions elsewhere, yet this is exactly how
this value is getting used here.

The main alternative I see to complete removal is to change deferred
inlining to an actual either/or decision. That is, to mark deferred
calls as noinline so we're actually trading off one inlining decision
against another, and not just adding a side-channel to the cost model
to do both.

Apart from fixing the catastrophic inlining case, the effect on rustc
is a modest compile-time improvement on average (up to 8% for a
parsing-type crate, where tree-like calls are expected) and pretty
neutral where run-time performance is concerned (mix of small wins
and losses, usually in the sub-1% category).

Differential Revision: https://reviews.llvm.org/D115497
2021-12-16 09:59:50 +01:00
Mircea Trofin db5aceb979 [NFC] Expose the ReleaseModeModelRunner
The type was pretty much generic, just needed a bit of parameterization.

Differential Revision: https://reviews.llvm.org/D115764
2021-12-15 23:21:58 -08:00
Fangrui Song cf9e61a9bb [LTO][WPD] Simplify mustBeUnreachableFunction and test after D115492
An well-formed IR function definition must have an entry basic block and
a well-formed IR basic block must have one terminator so the emptiness
check can be simplified.
Also simplify the test a bit.

Reviewed By: luna

Differential Revision: https://reviews.llvm.org/D115780
2021-12-15 15:43:35 -08:00
Arthur Eubanks 5a81a60391 [NFC] Remove more calls to getAlignment()
These are deprecated and should be replaced with getAlign().

Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.
2021-12-15 14:40:57 -08:00
Mingming Liu 09a704c5ef [LTO] Ignore unreachable virtual functions in WPD in hybrid LTO.
Differential Revision: https://reviews.llvm.org/D115492
2021-12-14 20:18:04 +00:00
Philip Reames 423f19680a Add FMF to hasPoisonGeneratingFlags/dropPoisonGeneratingFlags
These flags are documented as generating poison values for particular input values. As such, we should really be consistent about their handling with how we handle nsw/nuw/exact/inbounds.

Differential Revision: https://reviews.llvm.org/D115460
2021-12-14 08:43:00 -08:00
Florian Hahn ddfac0759c
Revert "[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest."
This reverts commit ac60263ad1.

It looks like the test fails on certain non-Darwin system, even though
the triple is explicitly set to macos. Revert while I investigate.
2021-12-14 14:48:47 +00:00
Nikita Popov 7abf299fed [InlineAdvisor] Add option to control deferred inlining (NFC)
This change is split out from D115497 to add the option
independently from the switch of the default value.
2021-12-14 15:46:11 +01:00
Florian Hahn ac60263ad1
[MemoryLocation] Handle memset_pattern{4,8,16} in getForDest.
memset_pattern{4,8,16} writes to the first argument. Use getForDest
to return the corresponding MemoryLocation.

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114906
2021-12-14 14:41:28 +00:00
Kazu Hirata d2377f24e1 Ensure newlines at the end of files (NFC) 2021-12-12 11:04:44 -08:00
Nikita Popov 9932d4db0d [SCEV] Fix unused variable warning (NFC) 2021-12-11 21:03:54 +01:00
Mircea Trofin 04f2712ef4 [NFC][MLGO] Factor ModelUnderTrainingRunner for reuse
This is so we may reuse it. It was very non-inliner specific already.

Differential Revision: https://reviews.llvm.org/D115465
2021-12-10 11:24:15 -08:00
Nikita Popov 65bec04295 [ConstantFold] Handle same type in ConstantFoldLoadThroughBitcast
Usually the case where the types are the same ends up being handled
fine because it's legal to do a trivial bitcast to the same type.
However, this is not true for aggregate types. Short-circuit the
whole code if the types match exactly to account for this.
2021-12-10 16:39:50 +01:00
Sameer Sahasrabuddhe 1d0244aed7 Reapply CycleInfo: Introduce cycles as a generalization of loops
Reverts 02940d6d22. Fixes breakage in the modules build.

LLVM loops cannot represent irreducible structures in the CFG. This
change introduce the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.

The cycle analysis is implemented as a generic template and then
instatiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.

This review is a restart of an older review request:
https://reviews.llvm.org/D83094

Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>

Differential Revision: https://reviews.llvm.org/D112696
2021-12-10 14:36:43 +05:30
Hasyimi Bahrudin c1cd698a52 [InstSimplify] Simplify bool icmp with not in LHS
Refer to https://llvm.org/PR52546.

Simplifies the following cases:
    not(X) == 0 -> X != 0 -> X
    not(X) <=u 0 -> X >u 0 -> X
    not(X) >=s 0 -> X <s 0 -> X
    not(X) != 1 -> X == 1 -> X
    not(X) <=u 1 -> X >=u 1 -> X
    not(X) >s 1 -> X <=s -1 -> X

Differential Revision: https://reviews.llvm.org/D114666
2021-12-09 16:26:46 -05:00
Arthur Eubanks 1172712f46 [NFC] Replace some deprecated getAlignment() calls with getAlign()
Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D115370
2021-12-09 08:43:19 -08:00
Nikita Popov 3beafecedf [InlineAdvisor] Remove outdated comment (NFC)
This just returns None nowadays, so this comment doesn't apply
anymore.
2021-12-09 15:11:56 +01:00
Florian Hahn d74a8a78ad
[LV] Mark various functions as const (NFC).
Make sure various accessors do not modify any state, in preparation for
D115111.
2021-12-09 10:51:29 +00:00
Mircea Trofin 059e03476c [NFC][mlgo] Generalize model runner interface
This prepares it for the regalloc work. Part of it is making model
evaluation accross 'development' and 'release' scenarios more reusable.
This patch:
- extends support to tensors of any shape (not just scalars, like we had
in the inliner -Oz case). While the tensor shape can be anything, we
assume row-major layout and expose the tensor as a buffer.
- exposes the NoInferenceModelRunner, which we use in the 'development'
mode to keep the evaluation code path consistent and simplify logging,
as we'll want to reuse it in the regalloc case.

Differential Revision: https://reviews.llvm.org/D115306
2021-12-08 20:10:58 -08:00
Florian Hahn 3c55acc4a6
[MemoryLocation] Support memset_pattern{4,8} in getForArgument.
memset_pattern{4,8} behave as memset_pattern16, with the only difference
being the size of the pattern location.

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114905
2021-12-08 19:39:45 +00:00
Jolanta Jensen 77b2bb5567 [LAA] Use type sizes when determining dependence.
In the isDependence function the code does not try hard enough
to determine the dependence between types. If the types are
different it simply gives up, whereas in fact what we really
care about are the type sizes. I've changed the code to compare
sizes instead of types.

Reviewed By: fhahn, sdesmalen

Differential Revision: https://reviews.llvm.org/D108763
2021-12-08 15:00:58 +00:00
James Farrell 219672b8dd Revert "Revert "Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible.""
This reverts commit 63a6348cad.

Differential Revision: https://reviews.llvm.org/D115254
2021-12-07 23:15:21 +00:00
Jonas Devlieghere 02940d6d22 Revert "CycleInfo: Introduce cycles as a generalization of loops"
This reverts commit 0fe61ecc2c because it
breaks the modules build.

https://green.lab.llvm.org/green/job/clang-stage2-rthinlto/4858/
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39112/
2021-12-07 13:06:34 -08:00
Sanjay Patel 8a69b04478 [InstSimplify] add logic fold for 'or' with 'xor'+'and'
This replaces the 'or' from 4b30076f16 with an 'and'.
We have to guard against propagating undef elements from
vector 'not' values:
https://alive2.llvm.org/ce/z/irMwRc
2021-12-07 11:08:26 -05:00
Cullen Rhodes 0395e01583 [IR] Split vscale_range interface
Interface is split from:

  std::pair<unsigned, unsigned> getVScaleRangeArgs()

into separate functions for min/max:

  unsigned getVScaleRangeMin();
  Optional<unsigned> getVScaleRangeMax();

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D114075
2021-12-07 10:38:26 +00:00
Sameer Sahasrabuddhe 0fe61ecc2c CycleInfo: Introduce cycles as a generalization of loops
LLVM loops cannot represent irreducible structures in the CFG. This
change introduce the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.

The cycle analysis is implemented as a generic template and then
instatiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.

This review is a restart of an older review request:
https://reviews.llvm.org/D83094

Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>

Differential Revision: https://reviews.llvm.org/D112696
2021-12-07 12:02:34 +05:30
James Farrell 63a6348cad Revert "Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible."
This reverts commit 5032467034.
2021-12-06 17:35:26 +00:00
Bardia Mahjour dfcfd14070 [VP] getVPMemoryOpCost interface
Added TTI queries for the cost of a VP Memory operation, and added Opcode,
DataType and Alignment to the hasActiveVectorLength() interface.

Reviewed By: Roland Froese

Differential Revision: https://reviews.llvm.org/D109416
2021-12-06 11:27:07 -05:00
James Farrell 5032467034 Use VersionTuple for parsing versions in Triple, fixing issues that caused the original change to be reverted. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible.
This reverts commit 40d5eeac6c.

Differential Revision: https://reviews.llvm.org/D114885
2021-12-06 14:57:47 +00:00
Kazu Hirata 1457e78352 [llvm] Use range-based for loops (NFC) 2021-12-05 08:33:02 -08:00
Sanjay Patel c65e651e60 [InstSimplify] fix logic fold of 'or' for vectors
Reduce code duplication for commutative pattern matching
and fix a miscompile.

We can't safely propagate an undef element in this transform:
https://alive2.llvm.org/ce/z/s5xy55
2021-12-05 09:57:07 -05:00
Florian Hahn 203f29b40c
[MemoryLocation] Use getForArgument in getForSource/getForDest. (NFC)
getForArgument already knows how to extract a memory location for all
memory intrinsics. Use it instead of duplicating the logic.
2021-12-05 11:13:14 +00:00
Florian Hahn a9125792b3
[MemoryLocation] Support missing atomic intrinsics in getForArg.
getForArgument is missing support for atomic memory transfer
intrinsics. In terms of accessed locations they behave like regular
memory transfer intrinsics and we already support them as such in
getForSource/getForDest.
2021-12-04 22:18:39 +00:00
Mehrnoosh Heidarpour e94134052f [InstSimplify] Add logic 'or' fold to -1
Adding the following folding opportunity:
(~A | B) | (A ^ B) --> -1

https://alive2.llvm.org/ce/z/PMtdYB

Differential revision: https://reviews.llvm.org/D114996
2021-12-04 15:04:18 -05:00
Florian Hahn ead3979a92
[MemoryLocation] Move DSE intrinsic handling to MemoryLocation. (NFC)
Suggested in D114872.
2021-12-03 16:00:39 +00:00
David Green ab0c5cea0b [ARM] Use v2i1 for MVE and CDE intrinsics
This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal
type, to use a <2 x i1> as opposed to emulating the predicate with a
<4 x i1>. The v4i1 workarounds have been removed leaving the natural
v2i1 types, notably in vctp64 which now generates a v2i1 type.

AutoUpgrade code has been added to upgrade old IR, which needs to
convert the old v4i1 to a v2i1 be converting it back and forth to an
integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be
optimized away in the final assembly.

Differential Revision: https://reviews.llvm.org/D114455
2021-12-03 15:27:58 +00:00
Florian Hahn af86aa7980
[MemoryLocation] Use None instead of {}. (NFC) 2021-12-03 13:19:00 +00:00
Florian Hahn f078536f46
[MemoryLocation] Move DSE's logic to new MemLoc::getForDest helper (NFC).
DSE has some extra logic to determine the write location of library
calls like str*cpy and str*cat. This patch moves the logic to a new
MemoryLocation:getForDest variant, which takes a call and TLI.

This patch should be NFC, because no other places take advantage of the
new helper yet.

Suggested by @reames post-commit 7eec832def.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114872
2021-12-03 09:12:01 +00:00
Nikita Popov 49d040ac97 [SCEV] Fix ValuesAtScopesUsers consistency
Fixes verification failure reported at:
https://reviews.llvm.org/rGc9f9be0381d1

The issue is that getSCEVAtScope() might compute a result without
inserting it in the ValuesAtScopes map in degenerate cases,
specifically if the ValuesAtScopes entry is invalidated during the
calculation. Arguably we should still insert the result if no
existing placeholder is found, but for now just tweak the logic
to only update ValuesAtScopesUsers if ValuesAtScopes is updated.
2021-12-03 10:03:10 +01:00
Florian Hahn 829b29b619
[MemoryLocation] strcat/strncat/strcpy read/write after their args.
strcpy/strcat/strncat access memory starting from the passed in
pointers. Construct memory locations for their args using getAfter.

Discussed in D114872.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D114969
2021-12-03 08:48:23 +00:00
Florian Hahn 639a78a4bf
[MemoryLocation] Support strncpy in getForArgument.
The size argument of strncpy can be used as bound for the size of
its pointer arguments.

strncpy is guaranteed to write N bytes and reads up to N bytes.

Reviewed By: xbolva00

Differential Revision: https://reviews.llvm.org/D114871
2021-12-02 14:18:05 +00:00
Sanjay Patel 97e921c81f [PatternMatch] create and use matcher for 'not' that excludes undef elements
We needed a stricter version of m_Not for D114462, but I wasn't
sure if that was going to be required anywhere else, so I didn't bother
to make that reusable.

It turns out we have one more existing simplification that needs
this (currently miscompiles):
https://alive2.llvm.org/ce/z/9-nTKi

And there's at least one more fold in that family that we could add.

Differential Revision: https://reviews.llvm.org/D114882
2021-12-02 08:51:13 -05:00
Florian Hahn 9f9e8ba114
[MemoryLocation] Support memset_chk in getForArgument.
The size argument for memset_chk is an upper bound for the size of the
pointer argument. memset_chk may write less than the specified length,
if it exceeds the specified max size and aborts.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114870
2021-12-02 13:45:58 +00:00
Florian Hahn ad88a37cea
[TLI] Add memset_pattern4, memset_pattern8 lib functions.
Similar to memset_pattern16, memset_pattern4, memset_pattern8 are
available on Darwin platforms.

https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/memset_pattern4.3.html

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114881
2021-12-01 21:18:19 +00:00
Nikita Popov 67704801c6 [SCEV] Track backedge taken count users (NFCI)
Track which SCEVs are used as ExactNotTaken counts in
BackedgeTakenInfo structures, so we can directly determine which
loops need to be invalidated, rather than iterating over all BECounts.

This gives a small compile-time improvement on average, but the
motivation here is more to ensure there are no degenerate cases,
if the number of backedge taken counts is large.

Differential Revision: https://reviews.llvm.org/D114784
2021-12-01 10:16:47 +01:00
Nikita Popov c9f9be0381 [SCEV] Verify integrity of ValuesAtScopes and users (NFC)
Make sure that ValuesAtScopes and ValuesAtScopesUsers are
consistent during SCEV verification.
2021-11-30 21:08:40 +01:00
Sanjay Patel 4b30076f16 [InstSimplify] add logic fold for 'or'
https://alive2.llvm.org/ce/z/4PaPDy

There's a related fold where the inner 'or' is replaced by 'and',
but that needs to be more careful about matching a 'not'.
2021-11-30 14:08:54 -05:00
Sanjay Patel c49ef1448d [InstSimplify] reduce code duplication for 'or' logic folds; NFC 2021-11-30 14:08:54 -05:00
Sanjay Patel 7a7c059d86 [InstSimplify] reduce code duplication for 'or' logic fold; NFC 2021-11-30 12:55:37 -05:00
Sanjay Patel 8dec0b23da [InstSimplify] refactor 'or' logic folds; NFC
Reduce duplication for handling the top-level commuted operands.
There are several other folds that should be moved in here, but
we need to make sure there's good test coverage.
2021-11-30 12:55:36 -05:00
Nikita Popov 40d5eeac6c Revert "Use VersionTuple for parsing versions in Triple. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible."
This reverts commit 1e82864670.

llvm/test/Transforms/LoopStrengthReduce/X86/2009-11-10-LSRCrash.ll fails
with assertion failure:

llc: /home/nikic/llvm-project/llvm/include/llvm/ADT/Optional.h:196: T& llvm::optional_detail::OptionalStorage<T, true>::getValue() & [with T = unsigned int]: Assertion `hasVal' failed.
...
 #8 0x00005633843af5cb llvm::MCStreamer::emitVersionForTarget(llvm::Triple const&, llvm::VersionTuple const&)
 #9 0x0000563383b47f14 llvm::AsmPrinter::doInitialization(llvm::Module&)
2021-11-30 18:36:32 +01:00
kpyzhov a356dae74c [RegionPass] Added check for -filter-print-funcs option to the region IR dumps.
Differential Revision: https://reviews.llvm.org/D114310
2021-11-30 12:30:15 -05:00
Nikita Popov 37d72991c1 [SCEV] Track and invalidate ValuesAtScopes users
ValuesAtScopes maps a SCEV and a Loop to another SCEV. While we
invalidate entries if the left-hand SCEV is invalidated, we
currently don't do this for the right-hand SCEV. Fix this by
tracking users in a reverse map and using it for invalidation.

This is conceptually the same change as D114738, but using the
reverse map to avoid performance issues.

Differential Revision: https://reviews.llvm.org/D114788
2021-11-30 18:21:14 +01:00
James Farrell 1e82864670 Use VersionTuple for parsing versions in Triple. This makes it possible to distinguish between "16" and "16.0" after parsing, which previously was not possible.
See also https://github.com/android/ndk/issues/1455.

Differential Revision: https://reviews.llvm.org/D114163
2021-11-30 15:44:23 +00:00
Nikita Popov 77dd579827 [SCEV] Remove incorrect assert
Fix assertion failure reported on D113349 by removing the assert.
While the produced expression should be equivalent, it may not
be strictly the same, e.g. due to lazy nowrap flag updates. Similar
to what the main createSCEV() code does, simply retain the old
value map entry if one already exists.
2021-11-29 17:09:12 +01:00
Florian Hahn 7b75110fac
[SCEV] Turn validity check in getExistingSCEV into assert (NFC).
Now that we track users of SCEV expressions, we should be able to always
invalidate containing expressions.

With that, I think the case where a value gets removed but
SCEVs containing references to it should not be possible any longer.
Turn check into an assert.

This slightly reduces compile-time:

NewPM-O3: -0.27%
NewPM-ReleaseThinLTO: -0.21%
NewPM-ReleaseLTO-g: -0.26%

http://llvm-compile-time-tracker.com/compare.php?from=c3dc6b081da6ba503e67d260033f81f61eb38ea3&to=95a4a028b1f1dd0bc3d221435953b7d2c031b3d5&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114633
2021-11-28 12:16:55 +00:00
Nikita Popov f492a414ba [SCEV] Simplify forgetSymbolicName() (NFCI)
With the recently introduced tracking as well as D113349, we can
greatly simplify forgetSymbolicName(). In fact, we can simply
replace it with forgetMemoizedResults().

What forgetSymbolicName() used to do is to walk the IR use-def
chain to find all SCEVs that mention the SymbolicName. However,
thanks to use tracking, we can now determine the relevant SCEVs
in a more direct way. D113349 is needed to also clear out the
actual IR to SCEV mapping in ValueExprMap.

Differential Revision: https://reviews.llvm.org/D114263
2021-11-27 16:42:38 +01:00
Nikita Popov c2550e3427 [SCEV] Simplify invalidation after BE count calculation (NFCI)
After backedge taken counts have been calculated, we want to
invalidate all addrecs and dependent expressions in the loop,
because we might compute better results with the newly available
backedge taken counts. Previously this was done with a forgetLoop()
style use-def walk. With recent improvements to SCEV invalidation,
we can instead directly invalidate any SCEVs using addrecs in this
loop. This requires a great deal less subtlety to avoid invalidating
more than necessary, and in particular gets rid of the hack from
D113349. The change is similar to D114263 in spirit.
2021-11-27 16:35:06 +01:00