This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a leftover from the
initial implementation, which predates the bf16 IR type).
The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.
This patch makes the generated IR cleaner (fewer useless bitcasts are
produced), but it does not affect the final assembly.
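A minimal sketch of emitting the updated intrinsic via IRBuilder (the
helper name is illustrative, not part of the patch): after this change the
operands are real bf16/float vectors, so no i8 bitcasts are needed.
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAArch64.h"

// Acc is <4 x float>, A and B are <8 x bfloat>.
llvm::Value *emitBfmmla(llvm::IRBuilder<> &Builder, llvm::Value *Acc,
                        llvm::Value *A, llvm::Value *B) {
  return Builder.CreateIntrinsic(llvm::Intrinsic::aarch64_neon_bfmmla,
                                 /*Types=*/{}, {Acc, A, B});
}
```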
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D86146
Apparently, we don't do this in EarlyCSE, InstSimplify,
or (old) GVN, but we do in NewGVN and, of all places, SimplifyCFG.
While I could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks; the same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what I wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.
So I would think this is pretty uncontroversial.
On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name | baseline | proposed | Δ | % | \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE | 0 | 23779 | 23779 | 0.00% | 0.00% |
| asm-printer.EmittedInsts | 7942328 | 7942392 | 64 | 0.00% | 0.00% |
| assembler.ObjectBytes | 273069192 | 273084704 | 15512 | 0.01% | 0.01% |
| correlated-value-propagation.NumPhis | 18412 | 18539 | 127 | 0.69% | 0.69% |
| early-cse.NumCSE | 2183283 | 2183227 | -56 | 0.00% | 0.00% |
| early-cse.NumSimplify | 550105 | 542090 | -8015 | -1.46% | 1.46% |
| instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined | 3640264 | 3664769 | 24505 | 0.67% | 0.67% |
| instcombine.NumDeadInst | 1778193 | 1783183 | 4990 | 0.28% | 0.28% |
| instcount.NumCallInst | 1758401 | 1758799 | 398 | 0.02% | 0.02% |
| instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% |
| instcount.NumPHIInst | 330557 | 330533 | -24 | -0.01% | 0.01% |
| instcount.TotalInsts | 8831952 | 8832286 | 334 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% |
| simplifycfg.NumSimpl | 1019808 | 999607 | -20201 | -1.98% | 1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.
That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI-CSE'd in, of all places, SimplifyCFG.
I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node,
so if we just naively compare them, we may incorrectly conclude that
the nodes are not equal when the only difference is operand order,
which is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.
Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.
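A minimal sketch of the same-BB comparison (helper name hypothetical; the
real change lives in Instruction::isIdenticalToWhenDefined): operand order
in a PHI is not meaningful, so match incoming values by block.
```
#include "llvm/IR/Instructions.h"

static bool samePHIValue(const llvm::PHINode *L, const llvm::PHINode *R) {
  if (L->getParent() != R->getParent() ||
      L->getNumIncomingValues() != R->getNumIncomingValues())
    return false;
  for (unsigned I = 0, E = L->getNumIncomingValues(); I != E; ++I)
    if (R->getIncomingValueForBlock(L->getIncomingBlock(I)) !=
        L->getIncomingValue(I))
      return false;
  return true;
}
```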
As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).
This reverts commit 8d5f64c4ed.
Thanks to Eli Friedman for pointing out that this check is not appropriate
here; it will be moved to the Lint pass.
As the FIXME said, they really should be checking for a single user,
not a single use, so let's do that. It is not *that* unusual to have
the same value as an incoming value in a PHI node, not unlike
how a PHI may have the same incoming basic block more than once.
There isn't a nice way to do that: Value::users() isn't uniquified,
and Value only tracks its uses, not its Users, so the check is
potentially costly, since it may indeed involve
traversing the entire use list of a value.
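A sketch of the check (helper name hypothetical): Value::users() may yield
the same user repeatedly, one entry per use, so collapse duplicates on the
fly and bail out on the first distinct second user.
```
#include "llvm/IR/Value.h"

static bool hasSingleUser(const llvm::Value &V) {
  const llvm::User *OnlyUser = nullptr;
  for (const llvm::User *U : V.users()) {
    if (OnlyUser && OnlyUser != U)
      return false; // a second distinct user: bail out
    OnlyUser = U;
  }
  return OnlyUser != nullptr;
}
```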
This adapts the verifier checks for the get.active.lane.mask intrinsic to its
new semantics as described in D86147. I.e., the second argument %n, which
corresponds to the loop trip count, must be greater than 0 if it is a constant,
so check that.
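A standalone sketch of the new rule (the real code lives in the Verifier;
the helper name is illustrative): the second operand must be strictly
positive whenever it is a constant.
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/InstrTypes.h"

static bool isValidTripCountOperand(const llvm::CallBase &Call) {
  if (auto *TC = llvm::dyn_cast<llvm::ConstantInt>(Call.getArgOperand(1)))
    return TC->getValue().isStrictlyPositive();
  return true; // non-constant trip counts are not checked here
}
```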
Differential Revision: https://reviews.llvm.org/D86301
Changes:
* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <= 1` is replaced with `VF.isZero() || VF.isScalar()` (see the sketch
  after this list).
* Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with
`llvm::SmallSetVector<ElementCount, ...>`. This avoids the need of an
ordering function for the `ElementCount` class.
* Bits and pieces around printing the `ElementCount` to string streams.
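A minimal sketch of the predicate mapping above, assuming the ElementCount
API of this era (public Min/Scalable fields); the function name is
illustrative only.
```
#include "llvm/Support/TypeSize.h"

bool shouldWiden(llvm::ElementCount VF) {
  if (VF.isZero() || VF.isScalar()) // old: VF <= 1
    return false;
  return VF.isVector();             // old: VF > 1 / VF >= 2
}
```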
To guarantee that this change is an NFC, `VF.Min` and asserts are used
in the following places:
1. When it doesn't make sense to deal with the scalable property, for
example:
a. When computing unrolling factors.
b. When shuffle masks are built for fixed width vector types
In these cases, an
assert(!VF.Scalable && "<msg>") has been added to make sure we don't
enter code paths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operations _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` would require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
6. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that might require changes at the command line.
Note that in some places the idiom
```
unsigned VF = ...
auto VTy = FixedVectorType::get(ScalarTy, VF)
```
has been replaced with
```
ElementCount VF = ...
assert(!VF.Scalable && ...);
auto VTy = VectorType::get(ScalarTy, VF)
```
The assertion guarantees that the new code is (at least in debug mode)
functionally equivalent to the old version. Notice that this change had been
possible because none of the methods that are specific to `FixedVectorType`
were used after the instantiation of `VTy`.
Reviewed By: rengolin, ctetreau
Differential Revision: https://reviews.llvm.org/D85794
Changes:
* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >=2` replaced with `VF.isVector()`.
* `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`.
* Add `<` operator to `ElementCount` to be able to use
`llvm::SmallSetVector<ElementCount, ...>`.
* Bits and pieces around printing the ElementCount to string streams.
* Added a static method to `ElementCount` to represent a scalar.
To guarantee that this change is an NFC, `VF.Min` and asserts are used
in the following places:
1. When it doesn't make sense to deal with the scalable property, for
example:
a. When computing unrolling factors.
b. When shuffle masks are built for fixed width vector types
In these cases, an
assert(!VF.Scalable && "<msg>") has been added to make sure we don't
enter code paths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operations _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` would require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
6. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that might require changes at the command line.
Differential Revision: https://reviews.llvm.org/D85794
This patch adds support for representing Fortran `character(n)`.
The patch is primarily based on D54114, with appropriate modifications.
The test case IR is generated using our downstream classic-flang. We're in the
process of upstreaming flang PRs, but classic-flang has dependencies on LLVM,
so this has to land first.
The patch includes functional test cases for both the IR and the corresponding
DWARF; furthermore, it has been tested manually using GDB.
Source snippet:
```
program assumedLength
call sub('Hello')
call sub('Goodbye')
contains
subroutine sub(string)
implicit none
character(len=*), intent(in) :: string
print *, string
end subroutine sub
end program assumedLength
```
GDB:
```
(gdb) ptype string
type = character (5)
(gdb) p string
$1 = 'Hello'
```
Reviewed By: aprantl, schweitz
Differential Revision: https://reviews.llvm.org/D86305
Extend the `applyUpdates` in DominatorTree to allow a post CFG view,
different from the current CFG.
This patch implements the functionality of updating an already up to
date DT, to the desired PostCFGView.
Combining a set of updates towards an up to date DT and a PostCFGView is
not yet supported.
Differential Revision: https://reviews.llvm.org/D85472
If some gc live values are not used in any gc.relocate, we can remove them
from the gc-live bundle of the statepoint instruction.
The CL also removes duplicated Values in the gc-live bundle.
Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85959
Currently ConstantExpr::getWithOperands does not handle FNeg and
subsequently treats FNeg as a binary operator, leading to an assertion
failure or a segmentation fault if built without assertions.
Originally I reproduced this with llvm-dis on a bitcode file, which I
unfortunately cannot share and also cannot really reduce.
But PR45426 describes the same issue and has a reproducer with Clang, so
I'll go with that.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86274
Both the AfterPass and AfterPassInvalidated pass instrumentation
callbacks get an additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.
Reviewers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D81555
We don't need a std::string for a literal string, we can use a
StringRef.
Adding StringRefs produces a Twine on which we can just call
str(), without converting to a SmallString ourselves. Twine will
do that internally.
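For illustration, the resulting idiom (names illustrative):
```
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"
#include <string>

// Concatenate via Twine and materialize the result once; Twine handles
// the temporary buffer internally.
std::string makeName(llvm::StringRef Base) {
  return (llvm::Twine(Base) + ".suffix").str();
}
```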
Existing implementation always aborts on syntax errors in a DataLayout
description. While this is meaningful for consuming textual IR modules, it is
inconvenient for users that may need fine-grained control over the layout from,
e.g., command-line options. Propagate errors through the parsing functions and
only abort in the top-level parsing function instead.
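A sketch of the consumer-side usage, assuming the Expected-returning
factory this patch describes: a malformed layout string becomes a
recoverable error instead of an abort.
```
#include "llvm/IR/DataLayout.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/raw_ostream.h"

bool tryParseLayout(llvm::StringRef Desc) {
  llvm::Expected<llvm::DataLayout> DL = llvm::DataLayout::parse(Desc);
  if (!DL) {
    llvm::logAllUnhandledErrors(DL.takeError(), llvm::errs(), "bad layout: ");
    return false;
  }
  return true;
}
```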
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D85650
This code becomes dead for valid IR after 48f4312 and a96fc46. The reason for the test change is that the verifier reports the first verification error encountered, in some non-specified visit order. By removing the verification code in gc.relocates for a statepoint with inline gc operands, I change the error the verifier reports. And in one case, the checked for error is no longer possible with the bundle representation, so I simply delete the file.
The "gc-live" operand bundles were recently added, and all tests have been updated to use that format. A migration period was provided, though it's worth noting these intrinsics are experimental, so formally there is no compatibile requirement.
This is an extension to a96fc46. "gc-live" hadn't been implemented at the point that patch was initially posted.
(Forgot to land this a couple of weeks back.)
In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, the code supports either form, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero-length argument sets in the intrinsic.
The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in-tree uses and generation have already been updated to use the new operand bundle based forms; the only folks broken by the change will be those with frontends generating statepoints directly, and the updates should be easy.
Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones.
Differential Revision: https://reviews.llvm.org/D80892
This avoid GUID lookup in Index.findSummaryInModule.
Follow up for D81242.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D85269
This patch restricts the behaviour of referencing via .Lfoo$local
local aliases, introduced in https://reviews.llvm.org/D73230, to
STV_DEFAULT globals only.
Hidden symbols via -fvisibility=hidden (https://gcc.gnu.org/wiki/Visibility)
are an important scenario.
Benefits:
- Improves the size of object files by using fewer STT_SECTION symbols.
- The code reads a bit better (it was not obvious to me without going
back to the code reviews why the canBenefitFromLocalAlias function
currently doesn't consider visibility).
- There is also a side benefit in restoring the effectiveness of the
--wrap linker option and making the behavior of --wrap consistent
between LTO and normal builds for references within a translation-unit.
Note: this --wrap behavior (which is specific to LLD) should not be
considered reliable. See comments on https://reviews.llvm.org/D73230
for more.
Differential Revision: https://reviews.llvm.org/D85782
Previously ConstantFoldExtractElementInstruction() would only work with
insertelement instructions, not constants. This properly handles
insertelement constants as well.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85865
This recommits the following patches now that D85684 has landed
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
Introduce a helper on Instruction which can be used to update the debug
location after hoisting.
Use this in GVN and LICM, where we were mistakenly introducing new line
0 locations after hoisting (the docs recommend dropping the location in
this case).
For more context, see the discussion in https://reviews.llvm.org/D60913.
Differential Revision: https://reviews.llvm.org/D85670
When we use mask compare intrinsics under the strict FP option, the masked
elements shouldn't raise any exception. So we can't replace the
intrinsic with a full compare + "and" operation.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D85385
Replace the `ident_t` handling in Clang with the methods offered by the
OMPIRBuilder. This cuts down on the clang code as well as the
differences between the two, making further transitions easier. Tests
have changed but there should not be a real functional change. The most
interesting difference is probably that we stop generating local ident_t
allocations for now and just use globals. Given that this happens only
with debug info, the location part of the `ident_t` is probably bigger
than the rest anyway. As the location part is already a global, we can
avoid the allocation, memcpy, and store in favor of a constant global
that is slightly bigger. This can be revisited if there are
complications.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D80735
The history of dropTriviallyDeadConstantArrays is like this. Because the appending linkage uses too much memory (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150105/251381.html), dropTriviallyDeadConstantArrays was introduced (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to release unused constant arrays. Recently, dropTriviallyDeadConstantArrays was improved (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to reduce its quadratic cost.
Our recent LTO profiling shows that when a target is large, 15-20% of time cost is from the SetVector::insert called by dropTriviallyDeadConstantArrays.
A large application has hundreds or thousands of modules; each module calls dropTriviallyDeadConstantArrays once to clean up the tens of thousands of ConstantArrays a module has. Of those ConstantArrays, usually only around 5 can be deleted; only very few deleted ConstantArrays reference other ConstantArrays: fewer than 10 out of millions.
Given this, the cost of SetVector::insert is mainly from the construction of WorkList from ArrayConstants. This motivated the fix that iterates ArrayConstants directly, and uses WorkList only when necessary.
Our evaluation shows that
1) The cumulative time percentage of dropTriviallyDeadConstantArrays is reduced from 15-17% to 4-6%.
2) For targets with LTO time > 20min, the time reduction is about 20%.
3) No observable performance impact for build without using LTO.
{F12506218}
{F12506221}
Reviewed By: mehdi_amini, tejohnson, jdoerfert
Differential Revision: https://reviews.llvm.org/D85379
As noted on PR46885, the number of mask elements should always be a power of 2, so to fix the static analyzer warning we are better off replacing the condition with <= 4, and I've added a pow2 assertion as well.
`DenseMapAPIntKeyInfo` was previously located in `lib/IR/LLVMContextImpl.h`.
Move it into `include/llvm/ADT/DenseMapInfo.h` so it can be reused.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85131
As discussed on D81500, this adds a more general ElementCount variant of the build helper and converts the (non-scalable) unsigned NumElts variant to use it internally.
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83283
I found that propagateAttributes was ~23% of a thin link's run time
(almost 4x higher than the second hottest function). The main reason is
that it re-examines a global var each time it is referenced. This
becomes unnecessary once it is marked both non-read-only and non-write-
only. I added a set to avoid doing redundant work, which dropped the
runtime of that thin link by almost 15%.
I made a smaller efficiency improvement (no measurable impact) to skip
all summaries for a VI if the first copy is dead. I added an assert to
ensure that all copies are dead if any is. The code in
computeDeadSymbols marks all summaries for a VI as live. There is one
corner case where it was skipping marking an alias as live, that I
fixed. However, since the code earlier marked all copies of a preserved
GUID's VI as live, and each 'visit' marks all copies live, the only case
where this could make a difference is summaries that were marked live
when they were built initially, and that is only a few special compiler
generated symbols and inline assembly symbols, so it likely is never
provoked in practice.
Differential Revision: https://reviews.llvm.org/D84985
Pass the abs poison flag to the underlying ConstantRange
implementation, allowing CVP to simplify based on it.
Importantly, this recognizes that abs with poison flag is actually
non-negative...
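For illustration, a sketch of the effect: with the poison flag, abs() over
a full signed 8-bit range is known non-negative, because abs(INT_MIN)
would be poison rather than wrapping back to INT_MIN.
```
#include "llvm/IR/ConstantRange.h"

llvm::ConstantRange absOfFullRange() {
  llvm::ConstantRange Full = llvm::ConstantRange::getFull(/*BitWidth=*/8);
  return Full.abs(/*IntMinIsPoison=*/true); // [0, 128) instead of full set
}
```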
Problem:
Right now, our "Running pass" output is not accurate when passes are wrapped in an adaptor, because the adaptor is never skipped even when the wrapped pass is. The other problem is that the "Running pass" line for an adaptor is printed before any "Running pass" lines of the passes/analyses it depends on (for example, FunctionToLoopPassAdaptor), so the printing order is not the actual execution order.
Solution:
Doing things like PassManager::DebugLogging is very intrusive because we need to specify DebugLogging whenever an adaptor is created. (Actually, right now we're not specifying DebugLogging for some sub-PassManagers; check PassBuilder.)
This patch moves debug logging for passes into a PassInstrumentation callback. That way we can be sure that all running passes are logged, and in the correct order.
This could also be used to implement hierarchical pass logging in the legacy PM. We could also move the pass manager's logging to this if we want.
The test fixes look messy. They include these changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to DebugLogging not being passed down)
Reviewed By: asbirlea, aeubanks
Differential Revision: https://reviews.llvm.org/D84774
This adds a common API for compute constant ranges of intrinsics.
The intention here is that
a) we can reuse the same code across different passes that handle
constant ranges, i.e. this can be reused in SCCP
b) we only have to add knowledge about supported intrinsics to
ConstantRange, not any consumers.
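A sketch of the intended consumer-side usage (helper name hypothetical;
the static ConstantRange entry points are as described above):
```
#include "llvm/ADT/Optional.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/Intrinsics.h"

llvm::Optional<llvm::ConstantRange>
rangeFor(llvm::Intrinsic::ID ID, llvm::ArrayRef<llvm::ConstantRange> Ops) {
  // A pass first asks ConstantRange whether it knows the intrinsic, then
  // queries the result range from the operand ranges.
  if (!llvm::ConstantRange::isIntrinsicSupported(ID))
    return llvm::None;
  return llvm::ConstantRange::intrinsic(ID, Ops);
}
```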
Differential Revision: https://reviews.llvm.org/D84587
All analyses are preserved if there's no local change, and thanks to
3667d87a33 this property is enforced for all
passes.
Skipping the dependency computation improves the performance when there are a
lot of small functions, where only a few changes happen.
Thanks to Nikita Popov, who provided these numbers (extract below):
https://llvm-compile-time-tracker.com/compare.php?from=183342c0a9850e60dd7a004b651c83dfb3a7d25e&to=f2f91e6a2743070471cc9471e4e8c646e50c653c&stat=instructions
O3: (number of instructions)
Benchmark Old New
kimwitu++ 60783M 59968M (-1.34%)
sqlite3 73200M 73083M (-0.16%)
consumer-typeset 52776M 52712M (-0.12%)
Bullet 133709M 132940M (-0.58%)
tramp3d-v4 123864M 123186M (-0.55%)
mafft 55534M 55477M (-0.10%)
ClamAV 76292M 76164M (-0.17%)
lencod 103190M 103061M (-0.13%)
SPASS 64068M 63713M (-0.55%)
7zip 197332M 196308M (-0.52%)
geomean 85750M 85389M (-0.42%)
Differential Revision: https://reviews.llvm.org/D80707
This is the part of the patch that moves the Updates to a CFGDiff
object. It is split off from the clean-up work of merging the two branches when BUI is null.
Differential Revision: https://reviews.llvm.org/D77341
This is the first of two patches to address PR46753. We basically allow
mem2reg to promote allocas that are used in droppable instructions, for
now that means `llvm.assume`. The uses of the alloca (or a bitcast or
zero-offset GEP from there) are replaced by `undef` in the droppable
instructions.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D83976
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows moving about 3000 lines out of InstCombine into the targets.
Differential Revision: https://reviews.llvm.org/D81728
The new implementation makes it clear that there are exactly two
conditional stores (after the initial no-op optimization). By contrast
the old implementation had seven conditionals, some hidden inside other
functions.
This commit can change the order of operands in operand lists, hence the
tweak to one test case.
Differential Revision: https://reviews.llvm.org/D80116
Summary:
This support is needed for the Fortran array variables with pointer/allocatable
attribute. This support enables debugger to identify the status of variable
whether that is currently allocated/associated.
For a pointer array (before allocation/association):
without DW_AT_associated
```
(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size
```
with DW_AT_associated
```
(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>
```
For an allocatable array (before allocation):
without DW_AT_allocated
```
(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size
```
with DW_AT_allocated
```
(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>
```
Testing
- unit test cases added
- check-llvm
- check-debuginfo
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83544
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended to avoid some optimization issues with the current
handling of aggregate arguments, as well as to fix inflexibility in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.
It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These arguments are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's no real advantage to an alignment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
between arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate arguments for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.
As requested by Andrew Kaylor, rewrite this code in a way that does
not warn on old MSVC versions.
Avoid the buggy constexpr warning by just not using constexpr and
removing the static_assert that depends on it.
When the byref attribute is added, there will need to be two similar
functions for the existing cases, which have an associated value copy,
and byref, which does not. Most, but not all, of the existing uses will
use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
This reverts most of the following patches due to reports of miscompiles.
I've left the added test cases with comments updated to be FIXMEs.
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
If no alignment is specified, we try to find the datalayout by using the insert position to get the module. But if those are null, then we dereference a null pointer.
This patch adds asserts to make the failure a little more obvious than just seg faulting.
Differential Revision: https://reviews.llvm.org/D83829
Summary:
Ignore callback uses when adding a callback function
in the CallGraph. Callback functions are typically
created when outlining, e.g. for OpenMP, so they have
internal scope and linkage. They should not be added
to the ExternalCallingNode since they are only callable
by the specified caller function at creation time.
A CGSCC pass, such as OpenMPOpt, may need to update
the CallGraph by adding a new outlined callback function.
Without ignoring callback uses, the addition breaks CGSCC
pass restrictions and results in a broken CallGraph.
Reviewers: jdoerfert
Subscribers: hiraditya, sstefan1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83370
This restores commit 80d0a137a5, and the
follow on fix in 873c0d0786, with a new
fix for test failures after a 2-stage clang bootstrap, and a more robust
fix for the Chromium build failure that an earlier version partially
fixed. See also discussion on D75201.
Reviewers: evgeny777
Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73242
This changes the matrix load/store intrinsic definitions to load/store from/to
a pointer, and not from/to a pointer to a vector, as discussed in D83477.
This also includes the recommit of "[Matrix] Tighten LangRef definitions and
Verifier checks" which adds improved language reference descriptions of the
matrix intrinsics and verifier checks.
Differential Revision: https://reviews.llvm.org/D83785
The approach is simple: if a pass reports that it's not modifying a
Function/Module, compute a loose hash of that Function/Module and compare it
with the original one. If we report no change but there's a hash change, then we
have an error.
This approach misses a lot of changes, but it's not super intrusive and can
detect most of the simple mistakes.
Differential Revision: https://reviews.llvm.org/D80916
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
This tightens the matrix intrinsic definitions in the LLVM LangRef and adds
corresponding checks to the IR Verifier.
Differential Revision: https://reviews.llvm.org/D83477
PassInfoMixin should be used for all NPM passes, rather than a custom
`name()`.
This caused ambiguous references in LegacyPassManager.cpp, so had to
remove "using namespace llvm::legacy" and move some things around.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D83498
PassInfoMixin should be used for all NPM passes, rather than a custom
`name()`.
This caused ambiguous references in LegacyPassManager.cpp, so had to
remove "using namespace llvm::legacy" and move some things around.
The passes had to be moved to the llvm namespace, or else they would get
printed as "(anonymous namespace)::FooPass".
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D83498
Summary:
Ignore callback uses when adding a callback function
in the CallGraph. Callback functions are typically
created when outlining, e.g. for OpenMP, so they have
internal scope and linkage. They should not be added
to the ExternalCallingNode since they are only callable
by the specified caller function at creation time.
A CGSCC pass, such as OpenMPOpt, may need to update
the CallGraph by adding a new outlined callback function.
Without ignoring callback uses, the addition breaks CGSCC
pass restrictions and results in a broken CallGraph.
Reviewers: jdoerfert
Subscribers: hiraditya, sstefan1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83370
Since none of these users really care about the actual type, hide the
type under a new size-getting attribute to go along with
hasPassPointeeByValueAttr. This will work better for the future byref
attribute, which may end up only tracking the byte size and not the IR
type.
We currently have 3 parameter attributes that should carry the type
(technically inalloca does not yet). The APIs are somewhat awkward
since preallocated/inalloca piggyback on byval in some places, but in
others are treated as distinct attributes. Since these are all
mutually exclusive, we should probably just merge all the attribute
infrastructure treating these as totally distinct attributes.
The `noundef` attribute indicates an argument or return value which
may never have an undef value representation.
This patch allows LLVM to parse the attribute.
Differential Revision: https://reviews.llvm.org/D83412
This cleans up the stack allocated by a @llvm.call.preallocated.setup.
Should either call the teardown or the preallocated call to clean up the
stack. Calling both is UB.
Add LangRef.
Add verifier check that the token argument is a @llvm.call.preallocated.setup.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D83354
The approach is simple: if a pass reports that it's not modifying a
Function/Module, compute a loose hash of that Function/Module and compare it
with the original one. If we report no change but there's a hash change, then we
have an error.
This approach misses a lot of changes, but it's not super intrusive and can
detect most of the simple mistakes.
Differential Revision: https://reviews.llvm.org/D80916
Summary:
Make Constant::getSplatValue recognize scalable vector splats of the
form created by ConstantVector::getSplat. Add unit test to verify that
C == ConstantVector::getSplat(C)->getSplatValue() for fixed width and
scalable vector splats
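A sketch of the round-trip property the unit test checks, spelled with the
ElementCount API of this era (public Min/Scalable constructor); names are
illustrative.
```
#include <cassert>
#include "llvm/IR/Constants.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Type.h"

void checkScalableSplat(llvm::LLVMContext &Ctx) {
  llvm::Constant *C =
      llvm::ConstantInt::get(llvm::Type::getInt32Ty(Ctx), 42);
  llvm::ElementCount EC(/*Min=*/4, /*Scalable=*/true);
  llvm::Constant *Splat = llvm::ConstantVector::getSplat(EC, C);
  assert(Splat->getSplatValue() == C);
  (void)Splat;
}
```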
Reviewers: efriedma, spatel, fpetrogalli, c-rhodes
Reviewed By: efriedma
Subscribers: sdesmalen, tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82416
Summary:
It is reasonably common to want to clone some call with different bundles.
Let's actually provide an interface to do that.
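A sketch of the new interface (as described here; the wrapper name is
hypothetical): clone a call but with a different bundle list, inserting the
clone right before the original.
```
#include "llvm/IR/InstrTypes.h"

llvm::CallBase *cloneWithoutBundles(llvm::CallBase *CB) {
  // An empty bundle list drops all operand bundles from the clone.
  return llvm::CallBase::Create(CB, /*Bundles=*/{}, /*InsertPt=*/CB);
}
```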
Reviewers: chandlerc, jdoerfert, dblaikie, nickdesaulniers
Reviewed By: nickdesaulniers
Subscribers: llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83248
An assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that this isn't the
case, and happily crashes otherwise.
Minimal reduced reproducer: run `opt -alignment-from-assumptions` on
```
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }
; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
  call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
  ret i32 0
}
; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1
attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }
```
This is what we'd have with -mllvm -enable-knowledge-retention
This reverts commit c95ffadb24.
Summary:
D80831 changed part of the prefix usage for AIX.
But there are other places getting prefix from DataLayout.
This patch intends to make prefix usage consistent on AIX.
Reviewed by: hubert.reinterpretcast, daltenty
Differential Revision: https://reviews.llvm.org/D81270
As per `MaxAlignmentExponent` (llvm/include/llvm/IR/Value.h), alignment is not allowed to be more than 2^29.
Encoded as Log2, this means that storing alignment uses 5 bits.
This patch makes sure all instructions store their alignment in a consistent way, encoded as Log2 and using 5 bits.
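For illustration (helper name hypothetical; assumes the input is a nonzero
power-of-two byte alignment):
```
#include <cassert>
#include <cstdint>
#include "llvm/Support/MathExtras.h"

// An alignment of at most 2^29 encodes as a Log2 value of at most 29,
// which fits comfortably in a 5-bit field.
unsigned encodeAlign(uint64_t AlignInBytes) {
  unsigned Log2 = llvm::Log2_64(AlignInBytes);
  assert(Log2 <= 29 && "alignment above 2^29 is not allowed");
  return Log2;
}
```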
Differential Revision: https://reviews.llvm.org/D83119
It's messy to pattern-match, and completely unnecessary: scalar indexes
work equally well.
See also discussion on D81620 and D82061.
Differential Revision: https://reviews.llvm.org/D82430
In most cases, this doesn't have much impact: the destructors just call
the base class destructor anyway. A few subclasses of ConstantExpr
actually store non-trivial data, though. Make sure we clean up
appropriately.
This is sort of ugly, but I don't see a good alternative given the
constraints.
Issue found by asan buildbots running the testcase for D80330.
Differential Revision: https://reviews.llvm.org/D82509
This is a followup on D78403.
I'm unsure about `getAtomicOpAlign` overloads that take `AtomicRMWInst` and `AtomicCmpXchgInst`, shouldn't `getAlign` provide the correct answer already?
Differential Revision: https://reviews.llvm.org/D81369
I noticed that for some benchmarks we spend quite a bit of time
inside AttributeList::hasAttrSomewhere(), mainly when checking
for the "returned" attribute. Most of the time the attribute will
not be present, in which case this function has to walk through
the whole attribute list and check for the attribute at each index.
This patch adds a cache of all "available somewhere" attributes
inside AttributeListImpl. This makes the structure 12 bytes larger,
but I don't think that's problematic, as attribute lists are uniqued.
Compile-time in terms of instructions retired improves by 0.4% on
average, but >1% for sqlite.
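The hot query in question looks like this (helper name illustrative); with
the cache, the common "no" answer no longer walks every attribute set:
```
#include "llvm/IR/Attributes.h"
#include "llvm/IR/InstrTypes.h"

bool hasReturnedArg(const llvm::CallBase &CB) {
  unsigned ArgNo = 0;
  return CB.getAttributes().hasAttrSomewhere(llvm::Attribute::Returned,
                                             &ArgNo);
}
```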
Differential Revision: https://reviews.llvm.org/D81867
When calling on-the-fly passes from the legacy pass manager, the modification
status is not reported, which is a problem in case we depend on an actual
transformation pass, and not only an analysis.
Update the legacy PM API to optionally report the changed status, and assert
if a change is detected but that status is lost.
Related to https://reviews.llvm.org/D80916
Differential Revision: https://reviews.llvm.org/D81236
Fixed an issue in DataLayout::getIntPtrType where we were assuming
the input type was always a fixed vector type, which isn't true.
Added a test that exposed the problem to:
Transforms/InstCombine/vector_gep1.ll
Differential Revision: https://reviews.llvm.org/D82294
This is cleaning up comments (mostly in the bitcode handling) about
removing some backward compatibility aspect in the 4.0 release.
Historically, "4.0" was used during the development of the 3.x
versions as "this future major breaking change version". At the time
the major number was used to indicate the compatibility. When we
reached 3.9 we decided to change the numbering, instead of going to
3.10 we went to 4.0 but after changing the meaning of the major
number to not mean anything anymore with respect to bitcode backward
compatibility.
The current policy
(https://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility)
indicates only now:
The current LLVM version supports loading any bitcode since version 3.0.
Differential Revision: https://reviews.llvm.org/D82514
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
This patch fixes a compiler crash that was hit when trying to simplify
the following code:
getelementptr [2 x i64], [2 x i64]* null, i64 0, <vscale x 2 x i64> zeroinitializer
For the case where we have a null pointer value like above, we just
need to ensure we don't assume the indices are always fixed width.
Differential Revision: https://reviews.llvm.org/D82183
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
Differential Revision: https://reviews.llvm.org/D80368
This has two advantages: one, it's simpler, and two, it doesn't require
heroic pattern matching with scalable vectors.
Also includes a small fix to DataLayout to allow the scalable vector
testcase to work correctly.
Differential Revision: https://reviews.llvm.org/D82061
Summary:
this reduces significantly the number of assumes generated without aftecting too much
the information that is preserved. this improves the compile-time cost
of enable-knowledge-retention significantly.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79650
This functionality is very similar to Function's compatibility with
AnnotationWriter. This change allows us to use AnnotationWriter with
BasicBlock through the BB.print() method.
Reviewed-By: apilipenko
Differential Revision: https://reviews.llvm.org/D81321
This patch adjusts the load/store matrix intrinsics, formerly known as
llvm.matrix.columnwise.load/store, to improve the naming and allow
passing of extra information (volatile).
The patch performs the following changes:
* Rename columnwise.load/store to column.major.load/store. This is more
expressive and also more in line with the naming in Clang.
* Changes the stride arguments from i32 to i64. The stride can be
larger than i32 and this makes things more uniform with the way
things are handled in Clang.
* A new boolean argument is added to indicate whether the load/store
is volatile. The lowering respects that when emitting vector
load/store instructions.
* MatrixBuilder is updated to require both Alignment and IsVolatile
arguments, which are passed through to the generated intrinsic. The
alignment is set using the `align` attribute.
The changes are grouped together in a single patch, to have a single
commit that breaks the compatibility. We probably should be fine with
updating the intrinsics, as we did not yet officially support them in
the last stable release. If there are any concerns, we can add
auto-upgrade rules for the columnwise intrinsics though.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache, rjmccall, ftynse
Reviewed By: anemet, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D81472
Summary:
Assume all usages of this function are explicitly fixed-width operations
and cast to FixedVectorType
Reviewers: efriedma, sdesmalen, c-rhodes, majnemer, dblaikie
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80262
Summary:
Fix invalid usages of getNumElements identified by test case
LLVM.Transforms/InstCombine::vscale_extractelement.ll.
changesLength: Since the length of the llvm::SmallVector shuffle mask
is related to the minimum number of elements in a scalable vector, it is
fine to just get the Min field of the ElementCount.
isIdentityWithExtract: Since it is not possible to express the mask
needed for this pattern for scalable vectors, we can just bail before
calling getNumElements()
Reviewers: efriedma, sdesmalen, fpetrogalli, gchatelet, yrouban, craig.topper
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81969
Summary:
Previously, GlobalAlias::copyAttributesFrom did not preserve ThreadLocalMode,
causing incorrect IR generation in IR linking flows. This patch pushes the code
responsible for copying this attribute from GlobalVariable::copyAttributesFrom
down to GlobalValue::copyAttributesFrom so that it is shared by GlobalAlias.
Fixes PR46297.
Reviewers: tejohnson, pcc, hans
Reviewed By: tejohnson, hans
Subscribers: hiraditya, ibookstein, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81605
Summary:
Fix attempts to call getNumElements on scalable vectors, identified by test
LLVM.Other::scalable-vectors-core-ir.ll. Since these checks are all
attempting to find if two vectors are the same size, calling
getElementCount will only increase safety.
Reviewers: efriedma, aprantl, reames, kmclaughlin, sdesmalen
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81895
Summary:
this reduces significantly the number of assumes generated without aftecting too much
the information that is preserved. this improves the compile-time cost
of enable-knowledge-retention significantly.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79650
In preparation for a patch that will enforce new rules for the usage of
the strictfp attribute, this patch introduces auto-upgrade behavior that
will replace the strictfp attribute on callsites with nobuiltin if the
enclosing function declaration doesn't also have the strictfp attribute.
This auto-upgrade isn't being performed on .ll files because that would
prevent us from writing a test for the forthcoming verifier behavior.
Differential Revision: https://reviews.llvm.org/D70096
When checking for an enum function attribute, use hasFnAttribute()
rather than hasAttribute() at FunctionIndex, because it is
significantly faster (and more concise to boot).
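For illustration, the faster form and the one it replaces:
```
#include "llvm/IR/Function.h"

bool isNoInline(const llvm::Function &F) {
  // Preferred: fast, concise enum-attribute query.
  return F.hasFnAttribute(llvm::Attribute::NoInline);
  // Slower equivalent being replaced:
  // F.getAttributes().hasAttribute(llvm::AttributeList::FunctionIndex,
  //                                llvm::Attribute::NoInline);
}
```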
Change BasicBlock::removePredecessor to optionally return a vector of
instructions which might be dead. Use this in ConstantFoldTerminator to
delete them if they are dead.
Reapply with a bug fix: don't drop the "!KeepOneInputPHIs" argument when
removePredecessor calls PHINode::removeIncomingValue.
Differential Revision: https://reviews.llvm.org/D80206
Change BasicBlock::removePredecessor to optionally return a vector of
instructions which might be dead. Use this in ConstantFoldTerminator to
delete them if they are dead.
Differential Revision: https://reviews.llvm.org/D80206
Summary:
This patch adds an optional field to the function summary and
implements asm and bitcode serialization. YAML
serialization is omitted and can be added later if
needed.
This patch includes this information in the summary only
if the module contains at least one sanitize_memtag function.
In the near future MTE will be the user of the analysis.
Later, if needed, we can provide more direct control
over when the information is included in the summary.
Reviewers: eugenis
Subscribers: hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80908
Commit d77ae1552f ("[DebugInfo] Support to emit debugInfo
for extern variables") added support for emitting debuginfo
for extern variables. Currently, only the BPF target enables
emitting debuginfo for extern variables.
But if the extern variable has "void" type, the compilation will
fail.
```
-bash-4.4$ cat t.c
extern void bla;
void *test() {
  void *x = &bla;
  return x;
}
-bash-4.4$ clang -target bpf -g -O2 -S t.c
missing global variable type
!1 = distinct !DIGlobalVariable(name: "bla", scope: !2, file: !3, line: 1,
                                isLocal: false, isDefinition: false)
...
fatal error: error in backend: Broken module found, compilation aborted!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace,
preprocessed source, and associated run script.
Stack dump:
...
```
The IR requires that a DIGlobalVariable have a valid type, and the
"void" type does not generate any type, hence the above fatal error.
Note that if the extern variable is defined as "const void", the
compilation will succeed.
```
-bash-4.4$ cat t.c
extern const void bla;
const void *test() {
  const void *x = &bla;
  return x;
}
-bash-4.4$ clang -target bpf -g -O2 -S t.c
-bash-4.4$ cat t.ll
...
!1 = distinct !DIGlobalVariable(name: "bla", scope: !2, file: !3, line: 1,
                                type: !6, isLocal: false, isDefinition: false)
!6 = !DIDerivedType(tag: DW_TAG_const_type, baseType: null)
...
```
Since "const void extern_var" is currently supported by the
debug info, it is natural that "void extern_var" should also
be supported. This patch disables the assertion for "void extern_var"
in the IR verifier and adds proper guarding when emitting a potentially
null debug info type to DWARF types.
Differential Revision: https://reviews.llvm.org/D81131
Replace getNumElements() with getElementCount() when asserting that
two types have the same element counts.
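A sketch of the scalable-safe assertion (helper name illustrative;
ElementCount compares the minimum count and the scalable flag together):
```
#include <cassert>
#include "llvm/IR/DerivedTypes.h"

void checkSameCount(llvm::VectorType *A, llvm::VectorType *B) {
  assert(A->getElementCount() == B->getElementCount() &&
         "vectors must have the same element count");
  (void)A; (void)B;
}
```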
Differential Revision: https://reviews.llvm.org/D81371
Now that we have an operand based form for the GC arguments to a statepoint intrinsic, update RS4GC to use it and update tests to reflect. This is pretty straight forward. I nearly landed without review, but figured a second set of eyes didn't hurt.
Differential Revision: https://reviews.llvm.org/D81121
This patch fixes VPIntrinsic::canIgnoreVectorLength when used on a
VPIntrinsic with scalable vector types. Also includes new unittest cases
for the '<vscale x 1 x whatever>' and '%evl == vscale' corner cases.
Allow InvokeInst to have the second optional prof branch weight for
its unwind branch. InvokeInst is a terminator with two successors.
It might have its unwind branch taken many times. If so
the BranchProbabilityInfo unwind branch heuristic can be inaccurate.
This patch allows a higher accuracy calculated with both branch
weights set.
Changes:
- A new section about InvokeInst is added to
the BranchWeightMetadata page. It states the old information that
was missing in the doc and adds new information about the second branch weight.
- Verifier is changed to allow either 1 or 2 branch weights
for InvokeInst.
- A new test is written for BranchProbabilityInfo to demonstrate
the main improvement of the simple fix in calcMetadataWeights().
- Several new testcases are created for Inliner. Those check that
both weights are accounted for in invoke instruction weight
calculation.
- PGOUseFunc::setBranchWeights() is fixed to be applicable to
InvokeInst.
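For illustration, a sketch of attaching both weights to an invoke (the
concrete weight values are made up):
```
#include "llvm/IR/Instructions.h"
#include "llvm/IR/MDBuilder.h"

// One weight for the normal destination, one for the unwind destination.
void setInvokeWeights(llvm::InvokeInst *II) {
  llvm::MDBuilder MDB(II->getContext());
  II->setMetadata(llvm::LLVMContext::MD_prof,
                  MDB.createBranchWeights({/*normal*/ 9000, /*unwind*/ 100}));
}
```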
Reviewers: davidxl, reames, xur, yamauchi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80618
Remove the function Instruction::setProfWeight() and make
use of Instruction::copyMetadata(.., {LLVMContext::MD_prof}).
This is correct for all use cases of setProfWeight() as it
is applied to CallBase instructions only.
This change results in prof metadata copied intact even if
the source has "VP". The old pair of calls
extractProfTotalWeight() + setProfWeight() resulted in
setting branch_weights if the source had "VP" data.
Reviewers: yamauchi, davidxl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80987
In an earlier patch I removed the need for
IITDescriptor::ScalableVecArgument, which involved changing
DecodeIITType to pull out the last IIT_Info from the list. However,
it turns out this is unsafe and causes ubsan failures. I've tried to
fix this a different way by simply passing the last IIT_Info as an
additional argument to DecodeIITType.
Differential Revision: https://reviews.llvm.org/D81057
We introduced the GCStatepointInst class and have migrated almost all users of Statepoint/ImmutableStatepoint to the new API. Given downstream consumers have had a week to migrate, remove code which is now dead.
As noted in a comment on D80937, all of these are specified as unsigned values, but the verifier code was using signed. Given the practical values involved, the difference in range didn't matter, but we might as well clean it up.
Currently, gc.relocates are defined in terms of indices into the statepoint's operand list. Given the gc args are at the end of a variable length list of operands, this makes interpreting their indices by hand a tad challenging. We can simplify the statepoint sequence and improve readability quite a bit by pulling these new operands into their own named operand bundle.
This patch defines a new operand bundle tag "gc-live". The semantics of the bundle are the same as the existing gc arguments of a statepoint. This patch simply introduces the definition and codegen for the bundle, future patches will migrate RS4GC to emitting the new form.
Interestingly, with this done and the recent migration to using deopt and gc-transition bundles, we really don't have much left in the statepoint itself. It really looks like the existing ID and flags fields are redundant; we have (existing!) attributes for all of them. I think we'll be able to reduce the gc.statepoint signature to simply a wrapped call (e.g. actual target and actual arguments).
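A hedged sketch of the bundle form; the callee, types, and indices are illustrative, not lifted from the patch:
```
declare void @foo()
declare token @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32, void ()*, i32, i32, ...)
declare i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token, i32, i32)

define i8 addrspace(1)* @test(i8 addrspace(1)* %obj) gc "statepoint-example" {
  ; gc pointers live across the call ride in the "gc-live" bundle, and the
  ; gc.relocate indices refer into that bundle
  %tok = call token (i64, i32, void ()*, i32, i32, ...)
      @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0)
      [ "gc-live"(i8 addrspace(1)* %obj) ]
  %obj.rel = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %tok, i32 0, i32 0)
  ret i8 addrspace(1)* %obj.rel
}
```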
Differential Revision: https://reviews.llvm.org/D80937
Summary:
The working set size heuristics (ProfileSummaryInfo::hasHugeWorkingSetSize)
under the partial sample PGO may not be accurate because the profile is partial
and the number of hot profile counters in the ProfileSummary may not reflect the
actual working set size of the program being compiled.
To improve this, the (approximated) ratio of the number of profile counters
of the program being compiled to the number of profile counters in the partial
sample profile is computed (the partial profile ratio), and the working set
size of the profile is scaled by this ratio to reflect the working set size
of the program being compiled, which is then used for the working set size
heuristics.
The partial profile ratio is approximated based on the number of basic
blocks in the program and the NumCounts field in the ProfileSummary, and is
computed during thin LTO indexing. This has the limitation that the
scaled working set size is available to the thin LTO post-link passes only.
Reviewers: davidxl
Subscribers: mgorny, eraman, hiraditya, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79831
Replace calls to getNumElements() with getElementCount() in order
to avoid warnings for scalable vectors. The warnings were discovered
by this existing test:
test/CodeGen/AArch64/sve-gep.ll
Differential revision: https://reviews.llvm.org/D80782
This is split off from D79100 and:
- adds an intrinsic description/definition for @llvm.get.active.lane.mask(), and
- describe its semantics in LangRef.
As described (in more detail) in its LangRef section, it is semantically
equivalent to an icmp with the vector induction variable and the back-edge
taken count, and generates a mask of active/inactive vector lanes.
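A hedged usage sketch; the vector width, names, and the masked-load pairing are illustrative:
```
declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)
declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

define <4 x i32> @masked_body(i32 %index, i32 %btc, <4 x i32>* %p) {
  ; lanes stay active while the element index is within the loop bounds
  %mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %btc)
  %v = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 4, <4 x i1> %mask, <4 x i32> undef)
  ret <4 x i32> %v
}
```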
It will have several use cases. First, it will be used by the
ExpandVectorPredication pass for the VP intrinsics, to expand VP intrinsics for
scalable vectors on targets that do not support the `%evl` parameter, see
D78203.
Also, this is part of, and essential for our ARM MVE tail-predication story:
- this intrinsic will be emitted by the LoopVectorizer in D79100, when
the scalar epilogue is tail-folded into the vector body. This new intrinsic
will generate the predicate for the masked loads/stores, and it takes the
back-edge taken count as an argument. The back-edge taken count represents the
number of elements processed by the loop, which we need to set up MVE
tail-predication.
- Emitting the intrinsic is controlled by a new TTI hook, see D80597.
- We pick up this new intrinsic in an ARM MVETailPredication backend pass, see
D79175, and convert it to an MVE target-specific intrinsic/instruction to
create a tail-predicated loop.
Differential Revision: https://reviews.llvm.org/D80596
I'd apparently only grepped in the lib directories and missed a few uses in the Statepoint header itself. Beyond the simple mechanical cleanup, I changed the type of one routine to reflect the fact that it also returns a statepoint.
00940fb854 changed this code to
construct a set for the B metadata. However, it still performs a
linear is_contained query, rather than making use of the set
structure.
Summary:
Count the per-module number of basic blocks when the module summary is computed
and sum them up during Thin LTO indexing.
This is used to estimate the working set size under the partial sample PGO.
This is split off of D79831.
Reviewers: davidxl, espindola
Subscribers: emaste, inglorion, hiraditya, MaskRay, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80403
Continues from D80598.
The key point of the change is to default to using operand bundles instead of the inline length-prefixed argument lists for statepoint nodes. An important subtlety to note is that the presence of a bundle has semantic meaning, even if it is empty. As such, we need to make a somewhat deeper change to the interface than is first obvious.
Existing code treats statepoint deopt arguments and the deopt bundle operands differently during inlining. The former is ignored (resulting in caller state being dropped), the latter is merged.
We can't preserve the old behaviour for calls with deopt fed to RS4GC and then inlining, but we can avoid changing the no-deopt case. At least in internal testing, that seems to be the important one. (I'd argue the "stop merging after RS4GC" behaviour for the former was always "unexpected", but the behaviour for non-deopt calls actually makes sense.)
Differential Revision: https://reviews.llvm.org/D80674
This patch upgrades DISubrange to support Fortran requirements.
Summary:
Below are the updates/addition of fields.
lowerBound - Now accepts a signed integer, DIVariable, or DIExpression;
earlier it accepted only a signed integer.
upperBound - This field is now added and accepts a signed integer,
DIVariable, or DIExpression.
stride - This field is now added and accepts a signed integer, DIVariable,
or DIExpression.
This is required to describe bounds of arrays which are only known at runtime.
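For illustration, a hedged sketch of the extended metadata; the field values and node numbering are made up:
```
; an upper bound only known at runtime, referencing a DIVariable,
; and a stride given by a DIExpression
!5 = !DILocalVariable(name: "ubound", scope: !2, file: !3, type: !7)
!6 = !DISubrange(lowerBound: 1, upperBound: !5, stride: !8)
!8 = !DIExpression(DW_OP_constu, 4)
```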
Testing:
unit test cases added (hand-written)
check clang
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D80197
Now that all of the statepoint related routines have classes with isa support, let's cleanup.
I'm leaving the (dead) utilities in tree for a few days so that I can do the same cleanup downstream without breakage.
Back when we had CallSite, we implemented the current Statepoint/ImmutableStatepoint structure in an analogous manner. Now that CallSite has been removed, the structure used for statepoints looks decidedly out of place. gc.statepoint is one of the small handful of intrinsics which are invokable. Because of this, it can't subclass IntrinsicInst as is idiomatic.
This change simply introduces the GCStatepointInst class, restructures the existing Statepoint/ImmutableStatepoint types to wrap it. I will be landing a series of changes to sink functionality into GCStatepointInst and updating callers to be more idiomatic.
- This allows us to specify the (minimal) alignment on an intrinsic's
arguments and, more importantly, the return value.
Differential Revision: https://reviews.llvm.org/D80422
In the current statepoint design, we have four distinct groups of operands to the call: call args, gc transition args, deopt args, and gc args. This format prexisted the support in IR for operand bundles and was in fact one of the inspirations for the extension. However, we never went back and rearchitected statepoints to fully leverage bundles.
This change is the first in a small sequence to do so. All this does is extend the SelectionDAG lowering code to allow deopt and gc transition operands to be specified in either inline argument bundles or operand bundles.
Differential Revision: https://reviews.llvm.org/D8059
Summary:
preallocated and musttail can work together, but we don't want to call
@llvm.call.preallocated.setup() to modify the stack in musttail calls.
So we shouldn't have the "preallocated" operand bundle when a
preallocated call is musttail.
Also disallow use of preallocated on calls without preallocated.
Codegen not yet implemented.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80581
-fno-semantic-interposition is currently the CC1 default. (The opposite
disables some interprocedural optimizations.) However, it does not infer
dso_local: on most targets accesses to ExternalLinkage functions/variables
defined in the current module still need PLT/GOT.
This patch makes explicit -fno-semantic-interposition infer dso_local,
so that PLT/GOT can be eliminated if targets implement local aliases
for AsmPrinter::getSymbolPreferLocal (currently only x86).
Currently we check whether the module flag "SemanticInterposition" is 0.
If yes, infer dso_local. In the future, we can infer dso_local unless
"SemanticInterposition" is 1: frontends other than clang will also
benefit from the optimization if they don't bother setting the flag.
(There will be risks if they do want ELF interposition: they need to set
"SemanticInterposition" to 1.)
If the caller needs to be responsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.
I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of comparing two MaybeAligns, where one is defined and the other
isn't, should be. So make the caller responsible for defining the behavior.
I left the ==/!= operators from Optional, but that exposed a weird
quirk: ==/!= between Align and MaybeAlign previously required the
MaybeAlign to be defined. Now we use the operator== from
Optional that takes an Optional and the value.
Differential Revision: https://reviews.llvm.org/D80455
Summary:
Replace any extant metadata uses of a dying instruction with undef to
preserve debug info accuracy. Some alternatives include:
- Treat Instruction like any other Value, and point its extant metadata
uses to an empty ValueAsMetadata node. This makes extant dbg.value uses
trivially dead (i.e. fair game for deletion in many passes), leading to
stale dbg.values being in effect for too long.
- Call salvageDebugInfoOrMarkUndef. Not needed to make instruction removal
correct. OTOH results in wasted work in some common cases (e.g. when all
instructions in a BasicBlock are deleted).
This came up while discussing some basic cases in
https://reviews.llvm.org/D80052.
Reviewers: jmorse, TWeaver, aprantl, dexonsmith, jdoerfert
Subscribers: jholewinski, qcolombet, hiraditya, jfb, sstefan1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80264
Summary:
Module::setProfileSummary currently calls addModuleFlag. This prevents
updating the ProfileSummary metadata in the module and results in a second
ProfileSummary being added instead of replacing the existing one. I don't
think this is the expected behavior: it prevents updating the ProfileSummary,
and it does not make sense to have more than one. To address this, add
Module::setModuleFlag and use it from setProfileSummary.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79902
Summary:
PartialProfileRatio approximately represents the ratio of the number of profile
counters of the program being built to the number of profile counters in the
partial sample profile. It is used to scale the working set size under the
partial sample profile to reflect the size of the program being built and to
improve the working set size heuristics.
This is a split from D79831.
Reviewers: davidxl
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79951
I have refactored the code so that we no longer need the
ScalableVecArgument descriptor - the scalable property of vectors is
now encoded using the ElementCount class in IITDescriptor. This means
that when matching intrinsics we know precisely how to match the
arguments and return values.
Differential Revision: https://reviews.llvm.org/D80107
If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.
Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer. This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated. I
updated clang's handling of C++ references, and added a release note for
this.
Differential Revision: https://reviews.llvm.org/D80072
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer of the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
Summary:
Rename 'i' to 'I'.
Factor out the optional field handling to getOptionalVal().
Split out of D79951.
Reviewers: davidxl
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80230
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.
In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.
This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.
The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.
The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer of the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.
Force any function containing a preallocated call to use the frame
pointer.
Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.
Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).
Aside from the tests added here, I checked that this codegen produces
correct code for something like
```
struct A {
A();
A(A&&);
~A();
};
void bar() {
foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```
by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77689
This is the second attempt at landing this patch, after fixing the
KeepOneInputPHIs behaviour to also keep zero input PHIs.
Differential Revision: https://reviews.llvm.org/D80141
r119493 protected against PHINode::hasConstantValue returning the PHI
node itself, but a later fix in r159687 means that can never happen, so
the workarounds are no longer required.
Summary:
Currently they are not supported together. Supporting them will require
a LangRef change. See discussion in https://reviews.llvm.org/D77689.
Reviewers: rnk, efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80132
r2694 fixed a bug where removePredecessor could create IR with a use not
dominated by its def in a self loop. But this could only happen in an
unreachable loop, and since that time the rules have been relaxed so
that defs don't have to dominate uses in unreachable code, so the fix is
unnecessary. The regression test added in r2691 still stands.
Differential Revision: https://reviews.llvm.org/D80128
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
This is D77454, except for stores. All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.
Differential Revision: https://reviews.llvm.org/D79968
The "null-pointer-is-valid" attribute needs to be checked by many
pointer-related combines. To make the check more efficient, convert
it from a string into an enum attribute.
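A hedged before/after sketch of the conversion:
```
; before: a string attribute, requiring a full scan of the attribute list
define void @f_old(i8* %p) "null-pointer-is-valid"="true" {
  ret void
}
; after: an enum attribute, cheap for pointer combines to query
define void @f_new(i8* %p) null_pointer_is_valid {
  ret void
}
```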
In the future, this attribute may be replaced with data layout
properties.
Differential Revision: https://reviews.llvm.org/D78862
Remove Use::setPrev. It provided no value because it had the same
accessibility as the underlying field Prev, and there was no
corresponding setNext anyway.
Simplify Use::removeFromList.
Summary:
The BFloat IR type is introduced to provide support for, initially, the BFloat16
datatype introduced with the Armv8.6 architecture (optional from Armv8.2
onwards). It has an 8-bit exponent and a 7-bit mantissa and behaves like an IEEE
754 floating point IR type.
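A hedged sketch of the type in use; the functions are illustrative:
```
define bfloat @scale(bfloat %a, bfloat %b) {
  %r = fmul bfloat %a, %b
  ret bfloat %r
}

define <4 x bfloat> @vadd(<4 x bfloat> %a, <4 x bfloat> %b) {
  %r = fadd <4 x bfloat> %a, %b
  ret <4 x bfloat> %r
}
```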
This is part of a patch series upstreaming Armv8.6 features. Subsequent patches
will upstream intrinsics support and Clang support for BFloat.
Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, sdesmalen, deadalnix, ctetreau
Subscribers: hiraditya, llvm-commits, danielkiss, arphaman, kristof.beyls, dexonsmith
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78190
I have changed the ScalableVecArgument case in matchIntrinsicType
to create a new FixedVectorType. This means that the next case we
hit (Vector) will not assert when calling getNumElements(), since
we know that it's always a FixedVectorType. This is a temporary
measure for now, and it will be fixed properly in another patch
that refactors this code.
The changes are covered by this existing test:
CodeGen/AArch64/sve-intrinsics-fp-converts.ll
In addition, I have added a new test to ensure that we correctly
reject SVE intrinsics when called with fixed length vector types.
Differential Revision: https://reviews.llvm.org/D79416
This patch adds support for DWARF attribute DW_AT_data_location.
Summary:
Dynamic arrays in Fortran are described by an array descriptor and a
data allocation address. The former is mapped to DW_AT_location and the
latter is mapped to DW_AT_data_location.
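A hedged sketch of the resulting metadata; the node numbering and the expression are illustrative:
```
; the descriptor's location comes from the variable's DW_AT_location;
; dataLocation yields the allocation address out of the descriptor
!7 = !DICompositeType(tag: DW_TAG_array_type, baseType: !8, elements: !9,
                      dataLocation: !DIExpression(DW_OP_push_object_address, DW_OP_deref))
```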
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79592
llvm rejects the DWARF operator DW_OP_push_object_address. This DWARF
operator is needed for Flang to support allocatable arrays.
Summary:
Currently llvm rejects DWARF operator DW_OP_push_object_address.
The error below is produced when llvm encounters this operator:
```
...
invalid expression
!DIExpression(151)
warning: ignoring invalid debug info in pushobj.ll
...
```
Some parts of the support for this operator are still missing and need
to be completed.
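As a hedged example of what should eventually be accepted; the offsets are illustrative:
```
; compute an address relative to the object being described, e.g. read a
; data pointer out of an allocatable array's descriptor
!8 = !DIExpression(DW_OP_push_object_address, DW_OP_plus_uconst, 8, DW_OP_deref)
```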
Testing
-added a unit testcase
-check-debuginfo
-check-llvm
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79306
The fact that loads and stores can have the alignment missing is a
constant source of confusion: code that usually works can break down in
rare cases. So fix the LoadInst API so the alignment is never missing.
To reduce the number of changes required to make this work, IRBuilder
and certain LoadInst constructors will grab the module's datalayout and
compute the alignment automatically. This is the same alignment
instcombine would eventually apply anyway; we're just doing it earlier.
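A hedged before/after sketch, assuming a datalayout where i32 is 4-byte aligned:
```
; before: the alignment could be missing
%v = load i32, i32* %p
; after: the alignment is always present, computed at construction time
%v = load i32, i32* %p, align 4
```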
There's a minor risk that the way we're retrieving the datalayout
could break out-of-tree code, but I don't think that's likely.
This is the last in a series of patches, so most of the necessary
changes have already been merged.
Differential Revision: https://reviews.llvm.org/D77454
This patch extends DIModule Debug metadata in LLVM to support
Fortran modules. DIModule is extended to contain File and Line
fields, these fields will be used by Flang FE to create debug
information necessary for representing Fortran modules at IR level.
Furthermore DW_TAG_module is also extended to contain these fields.
If these fields are missing, debuggers like GDB won't be able to
show Fortran modules information correctly.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D79484
We want to add a way to avoid merging identical calls so as to keep
separate debug information for those calls. There is also an asan
use case where having this attribute would be beneficial to avoid
alternative workarounds.
Here is the link to the feature request:
https://bugs.llvm.org/show_bug.cgi?id=42783.
`nomerge` is different from `noinline`. `noinline` prevents a function
from being inlined at call sites, whereas `nomerge` prevents multiple
identical calls from being merged into one.
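A hedged sketch of what the attribute prevents; the names are illustrative:
```
declare void @log_error() nomerge

define void @f(i1 %c) {
entry:
  br i1 %c, label %a, label %b
a:
  ; without nomerge, the two identical calls may be merged into a common
  ; block, losing their distinct debug locations
  call void @log_error()
  ret void
b:
  call void @log_error()
  ret void
}
```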
This patch adds `nomerge` to disable the optimization at the IR level. A
followup patch will be needed to make the backend understand `nomerge` and
avoid tail merging there.
Reviewed By: asbirlea, rnk
Differential Revision: https://reviews.llvm.org/D78659
don't span their entire scope.
The previous commit (6d1c40c171) is an older version of the test.
Reviewed By: aprantl, vsk
Differential Revision: https://reviews.llvm.org/D79573
When calculating the natural alignment for scalable vectors it
is acceptable to calculate an allocation size based on the minimum
number of elements in the vector.
This code path is exercised by an existing test:
CodeGen/AArch64/sve-intrinsics-int-arith.ll
Differential Revision: https://reviews.llvm.org/D79475
Summary: Add -detailed-summary support for sample profile dump to match that of instrumentation profile.
Reviewers: wmi, davidxl, hoyFB
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79291
Summary:
Constrain which metadata nodes are allowed to be, or contain,
DILocations. This ensures that logic for updating DILocations in a
Module is complete.
Currently, !llvm.loop metadata is the only odd duck which contains
nested DILocations. This has caused problems in the past: some passes
forgot to visit the nested locations, leading to subtly broken debug
info and late verification failures.
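For reference, a hedged sketch of the one sanctioned nesting; the node numbering is illustrative:
```
; loop metadata is a distinct self-referential node that may carry the
; loop's start/end DILocations, which location-updating passes must visit
br label %loop.header, !llvm.loop !1
...
!1 = distinct !{!1, !2, !3}
!2 = !DILocation(line: 4, column: 3, scope: !5)
!3 = !DILocation(line: 9, column: 3, scope: !5)
```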
If there's a compelling reason for some future metadata to nest
DILocations, we'll need to introduce a generic API for updating the
locations attached to an Instruction before relaxing this check.
Reviewers: aprantl, dsanders
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79245
Summary:
Remove invalid usage of VectorType::getNumElements in
ShuffleVectorInst::isValidOperands identified by test case
llvm::Analysis/ConstantFolding/vscale-shufflevector.ll. The tested
conditions hold for both fixed width and scalable vectors; use
getElementCount().
Reviewers: efriedma, sdesmalen, c-rhodes, spatel
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79212
Summary:
Have stripNonLineTableDebugInfo() attach updated !llvm.loop metadata to
an instruction (instead of updating and then discarding the metadata).
This fixes "!dbg attachment points at wrong subprogram for function"
errors seen while archiving an iOS app.
It would be nice -- as a follow-up -- to catch this issue earlier,
perhaps by modifying the verifier to constrain where DILocations are
allowed. Any alternative suggestions appreciated.
rdar://61982466
Reviewers: aprantl, dsanders
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79200
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here, for example removing uses of
getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Add llvm.call.preallocated.{setup,arg} intrinsics.
Add "preallocated" operand bundle which takes a token produced by llvm.call.preallocated.setup.
Add "preallocated" parameter attribute, which is like byval but without the copy.
Verifier changes for these IR constructs.
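A hedged sketch of the pieces working together; the callee and types are illustrative:
```
declare token @llvm.call.preallocated.setup(i32)
declare i8* @llvm.call.preallocated.arg(token, i32)
declare void @consume(i32* preallocated(i32))

define void @caller() {
  %t = call token @llvm.call.preallocated.setup(i32 1)
  %a = call i8* @llvm.call.preallocated.arg(token %t, i32 0) preallocated(i32)
  %p = bitcast i8* %a to i32*
  store i32 42, i32* %p
  ; the call consumes the setup token via the "preallocated" operand bundle
  call void @consume(i32* preallocated(i32) %p) ["preallocated"(token %t)]
  ret void
}
```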
See https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74651
Profile and profile summary are usually read only once and then annotated
on IR. The profile summary metadata on IR should include the value of the
newly added partial profile flag, so that compilation phase like thinlto
postlink can get the full set of profile information.
Differential Revision: https://reviews.llvm.org/D78310
This means AttrBuilder will always create a sorted set of attributes and
we can skip the sorting step. Sorting attributes is surprisingly
expensive, and I recently made it worse by making it use array_pod_sort.
Attributes are currently stored as a simple list. Enum attributes
additionally use a bitset to allow quickly determining whether an
attribute is set. String attributes on the other hand require a
full scan of the list. As functions tend to have a lot of string
attributes (at least when clang is used), this is a noticeable
performance issue.
This patch adds an additional name => attribute map to the
AttributeSetNode, which allows querying string attributes quickly.
This results in a 3% reduction in instructions retired on CTMark.
Changes to memory usage seem to be in the noise (attribute sets are
uniqued, and we don't tend to have more than a few dozen or hundred
unique attribute sets, so adding an extra map does not have a
noticeable cost.)
Differential Revision: https://reviews.llvm.org/D78859
The CallSite and ImmutableCallSite were removed in a previous
commit. So rename the file to match the remaining class and
the name of the cpp that implements it.
Summary:
- Whether or not a vector is scalable is a function of its type. Since
all instances of ScalableVectorType will have true for this value and
all instances of FixedVectorType will have false for this value, there
is no need to store it as a class member.
Reviewers: efriedma, fpetrogalli, kmclaughlin
Reviewed By: fpetrogalli
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78601
Summary:
Piggy-back off of TypeSize's STRICT_FIXED_SIZE_VECTORS flag and:
- if it is defined, assert that the vector is not scalable
- if it is not defined, complain if the vector is scalable
Reviewers: efriedma, sdesmalen, c-rhodes
Reviewed By: sdesmalen
Subscribers: hiraditya, mgorny, tschuett, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78576
Summary:
* VectorType::getBitWidth() is just an unsafe version of
getPrimitiveSizeInBits() that assumes all vectors are fixed width.
Reviewers: efriedma, sdesmalen, huntergr, craig.topper
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77833
CallSite will likely be removed soon, but AbstractCallSite serves a different purpose and won't be going away.
This patch switches it to internally store a CallBase* instead of a
CallSite. The only interface changes are the removal of the getCallSite
method and getCallBackUses now takes a CallBase&. These methods had only
a few callers that were easy enough to update without needing a
compatibility shim.
In the future once the other CallSites are gone, the CallSite.h
header should be renamed to AbstractCallSite.h
Differential Revision: https://reviews.llvm.org/D78322
Remove the unused BasicBlock forward declaration from Pass.h and the Attributes/BasicBlock includes from Pass.cpp.
Add a BasicBlock forward declaration to UnifyFunctionExitNodes.h, which was relying on Pass.h.
The current strategy LICM uses when sinking for debuginfo is
to pick the debug location of one of the uses.
This causes stepping to be wrong sometimes; see, e.g., PR45523.
This patch introduces a generalization of getMergedLocation()
that operates on a vector of locations instead of two and tries
to merge them all together, and uses the new API in LICM.
<rdar://problem/61750950>