Summary:
InlineResult is used both in APIs assessing whether a call site is
inlinable (e.g. llvm::isInlineViable) as well as in the function
inlining utility (llvm::InlineFunction). It means slightly different
things (can/should inlining happen, vs did it happen), and the
implicit casting may introduce ambiguity (casting from 'false' in
InlineFunction will default a message about hight costs,
which is incorrect here).
The change renames the type to a more generic name, and disables
implicit constructors.
Reviewers: eraman, davidxl
Reviewed By: davidxl
Subscribers: kerbowa, arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72744
This patch imports constant variables even when they can't be internalized
(which results in promotion). This offers some extra constant folding
opportunities.
Differential revision: https://reviews.llvm.org/D70404
If addrecexpr has nuw flag, the value should never be less than its
start value and start value does not required to be SCEVConstant.
Reviewed By: nikic, sanjoy
Differential Revision: https://reviews.llvm.org/D71690
Summary:
The goal is to simplify experimentation on the cost model. Today,
CallAnalyzer decides 2 things: legality, and benefit. The refactoring
keeps legality assessment in CallAnalyzer, and factors benefit
evaluation out, as an extension.
Reviewers: davidxl, eraman
Reviewed By: davidxl
Subscribers: kamleshbhalui, fedor.sergeev, hiraditya, baloghadamsoftware, haicheng, a.sidorin, Szelethus, donat.nagy, dkrupp, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71733
Teach SCEV about the @loop.decrement.reg intrinsic, which has exactly the same
semantics as a sub expression. This allows us to query hardware-loops, which
contain this @loop.decrement.reg intrinsic, so that we can calculate iteration
counts, exit values, etc. of hardwareloops.
This "int_loop_decrement_reg" intrinsic is defined as "IntrNoDuplicate". Thus,
while hardware-loops and tripcounts now become analysable by SCEV, this
prevents the usual loop transformations from applying transformations on
hardware-loops, which is what we want at this point, for which I have added
test cases for loopunrolling and IndVarSimplify and LFTR.
Differential Revision: https://reviews.llvm.org/D71563
This is step 1 of damage control assuming that we need to remove several
over-reaching folds for select-of-booleans because they can cause
miscompiles as shown in D72396.
The scalar case seems obviously safe:
https://rise4fun.com/Alive/jSj
And I don't think there's any danger for vectors either - if the
condition is poisoned, then the select must be poisoned too, so undef
elements don't make any difference.
Differential Revision: https://reviews.llvm.org/D72412
Summary:
The goal is to simplify experimentation on the cost model. Today,
CallAnalyzer decides 2 things: legality, and benefit. The refactoring
keeps legality assessment in CallAnalyzer, and factors benefit
evaluation out, as an extension.
Reviewers: davidxl, eraman
Subscribers: kamleshbhalui, fedor.sergeev, hiraditya, baloghadamsoftware, haicheng, a.sidorin, Szelethus, donat.nagy, dkrupp, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71733
Our out-of-tree custom aliasing solution for the HPC# Burst compiler
here at Unity makes use of the `ExternalAAwrapperPass` infrastructure to
insert our custom aliasing resolution into the core of LLVM. This is
great for all cases except for function inlining, where because
`createLegacyPMAAResults` does not make use of `ExternalAAWrapperPass`,
when we have a definite no-alias result within a function it won't be
propagated to the calling function during inlining.
This commit just rectifies this oversight by adding the missing
dependency.
Differential Revision: https://reviews.llvm.org/D71348
Summary:
Make `AAMDNodes`' `getAAMetadata()` and `setAAMetadata()` to take `!tbaa.struct`
into account as well as `!tbaa`. This impacts llvm.org/pr42022.
This is a temprorary fix needed to keep `!tbaa.struct` tag by SROA pass.
New field `TBAAStruct` should be deleted when `!tbaa` tag replaces `!tbaa.struct`.
Merging two `!tbaa.struct`'s to one is conservatively considered to be `nullptr`
(giving `MayAlias`) -- this could be enhanced, but relying on the said future
replacement.
Reviewers: RKSimon, spatel, vporpo
Subscribers: hiraditya, kosarev, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70924
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
Summary:
[DA] Move common code in checkSrcSubscript and checkDstSubscript to a
new function checkSubscript. This avoids duplicate code and possible
out of sync in the future.
Reviewers: sebpop, jmolloy, reames
Reviewed By: sebpop
Subscribers: bmahjour, hiraditya, llvm-commits, amehsan
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71087
Patch by zhongduo.
Summary:
All the use cases of CallAnalyzer use the same call site parameter to
both construct the CallAnalyzer, and then pass to the analysis member.
This change removes this duplication.
Reviewers: davidxl, eraman, Jim
Reviewed By: davidxl
Subscribers: Jim, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71645
Summary:
It is pretty common to assume that something is not zero.
Even optimizer itself sometimes emits such assumptions
(e.g. `addAssumeNonNull()` in `PromoteMemoryToRegister.cpp`).
But we currently don't deal with such assumptions :)
The only way `isKnownNonZero()` handles assumptions is
by calling `computeKnownBits()` which calls `computeKnownBitsFromAssume()`.
But `x != 0` does not tell us anything about set bits,
it only says that there are *some* set bits.
So naturally, `KnownBits` does not get populated,
and we fail to make use of this assumption.
I propose to deal with this special case by special-casing it
via adding a `isKnownNonZeroFromAssume()` that returns boolean
when there is an applicable assumption.
While there, we also deal with other predicates,
mainly if the comparison is with constant.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43267 | PR43267 ]].
Differential Revision: https://reviews.llvm.org/D71660
This is a pretty rare case, when CxtI and assume are
in the same basic block, with assume being located later.
We were already checking that assumption was guaranteed to be
executed, but we omitted CxtI itself from consideration,
and as the test (miscompile) shows, that is incorrect.
As noted in D71660 review by @nikic.
Summary:
This patch associates ordinal numbers to the DDG Nodes allowing
the builder to order nodes within a pi-block in program order. The
algorithm works by simply assuming the order in which the BBList
is fed into the builder. The builder already relies on the blocks being
in program order so that it can compute the dependencies correctly.
Similarly the order of instructions in their parent basic blocks
determine their program order.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70986
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.
Differential Revision: https://reviews.llvm.org/D71610
Summary:
Follow-on to D66428 and D71193, to build the TLI per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
With D71193, the -fno-builtin* flags are converted to function
attributes, so we can now set this information per-function on the TLI.
In this patch, the TLI constructor is changed to take a Function, which
can be used to override the available builtins. The TLI is augmented
with an array that can be used to specify which builtins are not
available for the corresponding function. The available function checks
are changed to consult this override before checking the underlying
module level baseline TLII. New code is added to set this override
array based on the attributes.
I also removed the code that sets availability in the TLII in clang from
the options, which is no longer needed.
I removed a per-Triple caching of TLII objects in the analysis object,
as it is based on the Module's Triple which is the same for all
functions in any case. Is there a case where we would be compiling
multiple Modules with different Triples in one compilation?
Finally, I have changed the legacy analysis wrapper to create and use
the new PM analysis class (TargetLibraryAnalysis) in getTLI. This is
consistent with the behavior of getTTI for the legacy
TargetTransformInfo analysis. This change means that getTLI now creates
a new TLI on each call (although that should be very cheap as we cache
the module level TLII, and computing the per-function
attribute based availability should also be reasonably efficient).
I measured the compile time for a large C++ file with tens of thousands
of functions and as expected there was no increase.
Reviewers: chandlerc, hfinkel, gchatelet
Subscribers: mehdi_amini, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67923
shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>
This is another missing shuffle fold pattern uncovered by the
shuffle correctness fix from D70246.
The problem was visible in the post-commit thread example, but
we managed to overcome the limitation for that particular case
with D71220.
This is something like the inverse of the previous fix - there
we didn't demand the inserted scalar, and here we are only
demanding an inserted scalar.
Differential Revision: https://reviews.llvm.org/D71488
This reverts commit 0be81968a2.
The VFDatabase needs some rework to be able to handle vectorization
and subsequent scalarization of intrinsics in out-of-tree versions of
the compiler. For more details, see the discussion in
https://reviews.llvm.org/D67572.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.
This fixes the buildbot failures.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
This is a rebase of the change over D70376, which fixes an LVI cache
invalidation issue that also affected this patch.
-----
Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.
The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.
This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this change not fully NFC, because we can now fold
terminator icmps based on the dereferencability information in the
same block. This is the reason why I changed one JumpThreading test
(it would optimize the condition away without the change).
Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.
I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.
Differential Revision: https://reviews.llvm.org/D69914
Removed code duplication in ThreadCmpOverSelect and broke it
into several smaller functions for reusing them.
Differential Revision: https://reviews.llvm.org/D71158
This patch renames the LoopInfo::isRotated() method to LoopInfo::isRotatedForm()
to make it clear that the method checks whether the loop is in rotated form, not
whether the loop has been rotated by the LoopRotation pass.
In order to use assumptions, computeKnownBits needs a context
instruction. We can use the GEP, if it is an instruction. We already
pass the assumption cache, but it cannot be used without a context
instruction.
Reviewers: anemet, asbirlea, hfinkel, spatel
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D71264
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.
Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.
Split off from D71320
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D71381
If the pointer was loaded/stored before the null check, the check
is redundant and can be removed. For now the optimizers do not
remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG.
The patch allows to use more nonnull constraints. Also, it found
one more optimization in some PowerPC test. This is my first llvm
review, I am free to any comments.
Differential Revision: https://reviews.llvm.org/D71177
Summary:
The current da printer shows the dependence without indicating
which instructions are being considered as the src vs dst. It
also silently ignores call instructions, despite the fact that
they create confused dependence edges to other memory
instructions. This patch addresses these two issues plus a
couple of minor non-functional improvements.
Authored By: bmahjour
Reviewer: dmgreen, fhahn, philip.pfaffe, chandlerc
Reviewed By: dmgreen, fhahn
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71088
Summary:
This patch adds a method to determine if a loop is in rotated form (the latch is
an exiting block). It also modifies the getLoopGuardBranch method to use this
new method. This method can also be used in Loopfusion. Once this patch lands I
will make the corresponding changes there.
Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65958
This patch introduced the VFDatabase, the framework proposed in
http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [*]
In this patch the VFDatabase is used to bridge the TargetLibraryInfo
(TLI) calls that were previously used to query for the availability of
vector counterparts of scalar functions.
The VFISAKind field `ISA` of VFShape have been moved into into VFInfo,
under the assumption that different vector ISAs may provide the same
vector signature. At the moment, the vectorizer accepts any of the
available ISAs as long as the signature provided by the VFDatabase
matches the one expected in the vectorization process. For example,
when targeting AVX or AVX2, which both have 256-bit registers, the IR
signature of the two vector functions associated to the two ISAs is
the same. The `getVectorizedFunction` method at the moment returns the
first available match. We will need to add more heuristics to the
search system to decide which of the available version (TLI, AVX,
AVX2, ...) the system should prefer, when multiple versions with the
same VFShape are present.
Some of the code in this patch is based on the work done by Sumedh
Arani in https://reviews.llvm.org/D66025.
[*] Notice that in the proposal the VFDatabase was called SVFS. The
name VFDatabase is more in line with LLVM recommendations for
naming classes and variables.
Differential Revision: https://reviews.llvm.org/D67572
Summary:
Same as D60846 and D69571 but with a fix for the problem encountered
after them. Both times it was a missing context adjustment in the
handling of PHI nodes.
The reproducers created from the bugs that caused the old commits to be
reverted are included.
Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam, hans
Subscribers: hiraditya, bollu, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71181
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
This is another transform suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
The backend for some targets already manages to get
this if it converts copysign to bitwise logic.
Summary:
This patch introduces an API to build and modify vector shapes.
The validity of a VFShape can be checked with the
`hasValidParameterList` method, which is also run in an assertion each
time a VFShape is modified.
The field VFISAKind has been moved to VFInfo under the assumption that
different ISAs can map to the same VFShape (as it can be in the case
of vector extensions with the same registers size, for example AVX and
AVX2).
Reviewers: sdesmalen, jdoerfert, simoll, hsaito
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70513
Variant on D70103. The caching is switched to always use a BB to
cache entry map, which then contains per-value caches. A separate
set contains value handles with a deletion callback. This allows us
to properly invalidate overdefined values.
A possible alternative would be to always cache by value first and
have per-BB maps/sets in the each cache entry. In that case we could
use a ValueMap and would avoid the separate value handle set. I went
with the BB indexing at the top level to make it easier to integrate
D69914, but possibly that's not the right choice.
Differential Revision: https://reviews.llvm.org/D70376