Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.
amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
some limit
2) It increases the threshold if there are pointers to private arrays(?)
These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.
This way we can remove the custom amdgpu-inline pass.
This caused inline-hint.ll to fail. After some investigation, it looks
like getInliningThresholdMultiplier() was previously getting applied
twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it not
being applied at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D94153
Add factory to create streams for logging the reproducer. Allows for more general logging (beyond file) and logging the configuration/module separately (logged in order, configuration before module).
Also enable querying the filename of a ToolOutputFile.
Differential Revision: https://reviews.llvm.org/D94868
The fault-only-first-load instructions can reduce VL if an element
other than element 0 triggers a memory fault. This can be used to
vectorize loops with data dependent exit conditions like strcmp or
strlen.
This patch adds a VL output to these intrinsics so that the new
VL value can be captured by software. This will be expanded to
'csrr gpr, vl' after the vleff instruction during SelectionDAG.
By doing this with one intrinsic we are able to guarantee that the
csrr reads the VL value produced by the vleff instruction. Having
it as a separate intrinsic would make it impossible to guarantee
ordering without making every other vector intrinsic have side
effects.
The intrinsics are expanded during lowering into two ISD nodes
that are glued together. These ISD nodes will go
through isel separately, but should maintain the glue so that they
get emitted adjacently by InstrEmitter.
I've only run the chain through the vleff instruction, allowing
the READ_VL to be deleted if it is unused.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D94286
Upgrade RISC-V V extension to v1.0-08a0b46.
Indexed loads/stores now have ordered and unordered forms.
New whole-register vector loads/stores are added.
Differential Revision: https://reviews.llvm.org/D93614
This adds cost modelling for the inloop vectorization added in
745bf6cf44. Up until now they have been modelled as the original
underlying instruction, usually an add. This happens to work OK for MVE
with instructions that are reducing into the same type as they are
working on. But MVE's instructions can perform the equivalent of an
extended MLA as a single instruction:
  %sa = sext <16 x i8> A to <16 x i32>
  %sb = sext <16 x i8> B to <16 x i32>
  %m = mul <16 x i32> %sa, %sb
  %r = vecreduce.add(%m)
->
  R = VMLADAV A, B
There are other instructions for performing add reductions of
v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64
(VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV).
The i64 are particularly interesting as there are no native i64 add/mul
instructions, leading to the i64 add and mul naturally getting very
high costs.
Also worth mentioning, under NEON there is the concept of an sdot/udot
instruction which performs a partial reduction from a v16i8 to a v4i32.
They extend and mul/sum the first four elements from the inputs into the
first element of the output, repeating for each of the four output
lanes. They could possibly be represented in the same way as above in
llvm, so long as a vecreduce.add could perform a partial reduction. The
vectorizer would then produce a combination of in-loop and outer-loop
reductions to efficiently use the sdot and udot instructions. Although
this patch does not do that yet, it does suggest that separating the
input reduction type from the produced result type is a useful concept
to model. It also shows that a MLA reduction as a single instruction is
fairly common.
This patch attempts to improve the cost modelling of in-loop reductions
by:
- Adding some pattern matching in the loop vectorizer cost model to
match extended reduction patterns that are optionally extended and/or
MLA patterns. This marks the cost of the reduction instruction correctly
and the sext/zext/mul leading up to it as free, which is otherwise
difficult to tell and may get a very high cost. (In the long run this
can hopefully be replaced by vplan producing a single node and costing
it correctly, but that is not yet something that vplan can do).
- getExtendedAddReductionCost is added to query the cost of these
extended reduction patterns.
- Expanded the ARM costs to account for these expanded sizes, which is a
fairly simple change in itself.
- Some minor alterations to allow in-loop reductions larger than the highest
vector width and i64 MVE reductions.
- An extra InLoopReductionImmediateChains map was added to the vectorizer
for it to efficiently detect which instructions are reductions in the
cost model.
- The tests have some updates to show what I believe is optimal
vectorization and where we are now.
Put together this can greatly improve performance for reduction loops
under MVE.
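As a rough illustration of the pattern matching described above, the
VMLADAV-shaped reduction can be recognized with LLVM's IR PatternMatch
machinery. This is a minimal sketch under an invented helper name (the
actual cost-model code in the patch is more general):
```
#include "llvm/IR/Instruction.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/PatternMatch.h"

using namespace llvm;
using namespace llvm::PatternMatch;

// Sketch: recognize vecreduce.add(mul(sext(A), sext(B))), the shape that
// can become a single VMLADAV. The patch also handles zext and plain adds.
static bool looksLikeExtendedMLAReduction(Instruction *I, Value *&A,
                                          Value *&B) {
  return match(I, m_Intrinsic<Intrinsic::vector_reduce_add>(m_Mul(
                      m_SExt(m_Value(A)), m_SExt(m_Value(B)))));
}
```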
Differential Revision: https://reviews.llvm.org/D93476
This fixes the final (I think?) reference invalidation in `SmallVector`
that we need to fix to align with `std::vector`. (There is still some
left in the range insert / append / assign, but the standard calls that
UB for `std::vector` so I think we don't care?)
For POD-like types, reimplement `emplace_back()` in terms of
`push_back()`, taking a copy even for large `T` rather than lose the
realloc optimization in `grow_pod()`.
For other types, split the grow operation into three and construct the new
element in the middle.
- `mallocForGrow()` calculates the new capacity and returns the result
of `safe_malloc()`. We only need a single definition per
`SmallVectorBase` so this is defined in SmallVector.cpp to avoid code
size bloat. Moving this part of non-POD grow to the source file also
allows the logic to be easily shared with `grow_pod`, and
`report_size_overflow()` and `report_at_maximum_capacity()` can move
there too.
- `moveElementsForGrow()` moves elements from the old to the new
allocation.
- `takeAllocationForGrow()` frees the old allocation and saves the
new allocation and capacity.
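Put together, the three pieces compose roughly like this; a standalone
sketch that mirrors the bullets above with simplified bodies and invented
type names (the real code is split across SmallVector.h and
SmallVector.cpp):
```
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Simplified stand-in for the non-POD grow path described above.
template <typename T> struct GrowableBufferSketch {
  T *Begin = nullptr;
  size_t Size = 0, Capacity = 0;

  T *mallocForGrow(size_t MinSize, size_t &NewCapacity) {
    NewCapacity = std::max(MinSize, 2 * Capacity + 1); // growth policy
    return static_cast<T *>(std::malloc(NewCapacity * sizeof(T)));
  }
  void moveElementsForGrow(T *NewElts) {
    for (size_t I = 0; I != Size; ++I) {
      new (NewElts + I) T(std::move(Begin[I]));
      Begin[I].~T();
    }
  }
  void takeAllocationForGrow(T *NewElts, size_t NewCapacity) {
    std::free(Begin);
    Begin = NewElts;
    Capacity = NewCapacity;
  }
  void grow(size_t MinSize) {
    size_t NewCapacity;
    T *NewElts = mallocForGrow(MinSize, NewCapacity);
    moveElementsForGrow(NewElts);
    takeAllocationForGrow(NewElts, NewCapacity);
  }
};
```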
`SmallVector::assign(size_type, const T&)` also uses the split-grow
operations for non-POD, but it also has a semantic change when not
growing. Previously, assign would start with `clear()`, and so the old
elements were destructed and all elements of the new vector were
copy-constructed (potentially invalidating references). The new
implementation skips destruction and uses copy-assignment for the prefix
of the new vector that fits. The new semantics match what libc++ does
for `std::vector::assign()`.
Note that the following is another possible implementation:
```
void assign(size_type NumElts, ValueParamT Elt) {
  std::fill_n(this->begin(), std::min(NumElts, this->size()), Elt);
  this->resize(NumElts, Elt);
}
```
The downside of this simpler implementation is that if the vector has to
grow there will be `size()` redundant copy operations.
(I had planned on splitting this patch up into three for committing
(after getting performance numbers / initial review), but I've realized
that if this does for some reason need to be reverted we'll probably
want to revert the whole package...)
Differential Revision: https://reviews.llvm.org/D94739
Make this look more like the DAG handling and move to common code.
I also noticed AArch64 seems to not be properly adding the
physreg:virtreg mapping to the function live ins.
Summary:
The custom mapper API did not previously support the mapping names that were added earlier. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library.
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D94806
Previous code built a model in which the tile config register was the user
of each AMX instruction. There is a problem with that model for tile config
register spills: when going across a function call, the ldtilecfg
instruction may be inserted at each AMX instruction which uses the tile
config register. This causes all tile data registers to be clobbered.
To fix this issue, we remove the model of the tile config register. We
analyze the regmask of each call instruction and insert ldtilecfg if there
is any tile data register live across the call. Inserting sttilecfg
before the call is unnecessary, because the tile config doesn't change
and we can just reload the config.
Besides, we also need to check tile config register interference. Since we
don't model the config register, we should check interference from the
ldtilecfg to each tile data register def.
          ldtilecfg
          /       \
        BB1       BB2
        /           \
      call          BB3
      /               \
  %1=tileload    %2=tilezero
We can start from the instruction of each tile def and walk backward to
ldtilecfg. If there is any call instruction, and the tile data register is
not preserved, we should insert ldtilecfg after the call instruction.
Differential Revision: https://reviews.llvm.org/D94155
This makes the following improvements.
For `SHT_GNU_versym`:
* yaml2obj: set `sh_link` to index of `.dynsym` section automatically.
For `SHT_GNU_verdef`:
* yaml2obj: set `sh_link` to index of `.dynstr` section automatically.
* yaml2obj: set `sh_info` field automatically.
* obj2yaml: don't dump the `Info` field when its value matches the number of version definitions.
For `SHT_GNU_verneed`:
* yaml2obj: set `sh_link` to index of `.dynstr` section automatically.
* yaml2obj: set `sh_info` field automatically.
* obj2yaml: don't dump the `Info` field when its value matches the number of version dependencies.
Also, this simplifies a few test cases.
Differential revision: https://reviews.llvm.org/D94956
This reverts commit d97f776be5.
The original problem was due to build failures in shared lib builds. D95079
moved ImportedFunctionsInliningStatistics under Analysis, unblocking
this.
This is related to D94982. We want to call these APIs from the Analysis
component, so we can't leave them under Transforms.
Differential Revision: https://reviews.llvm.org/D95079
This reverts commit 5b7aef6eb4 and relands
6529d7c5a4.
The ASan error was debugged and determined to be the fault of an invalid
object file input in our test suite, which was fixed by my last change.
LLD's project policy is that it assumes input objects are valid, so I
have added a comment about this assumption to the relocation bounds
check.
Run the ObjCARCContractPass during LTO. The legacy LTO backend (under
LTO/ThinLTOCodeGenerator.cpp) already does this; this diff just adds that
behavior to the new LTO backend. Without that pass, the objc.clang.arc.use
intrinsic will get passed to the instruction selector, which doesn't know how to
handle it.
In order to test both the new and old pass managers, I've also added support for
the `--[no-]lto-legacy-pass-manager` flags.
P.S. Not sure if the ordering of the pass within the pipeline matters...
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D94547
When using 2 InlinePass instances in the same CGSCC - one for mandatory
inlinings, the other for the heuristic-driven ones - the order in which
the ImportedFunctionStats would be output depended on the destruction
order of the inline passes, which is not deterministic.
This patch moves the ImportedFunctionStats responsibility to the
InlineAdvisor to address this problem.
Differential Revision: https://reviews.llvm.org/D94982
Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64.
The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are:
- Wiring up the new target to enable ILP32 codegen and MC.
- ILP32 va_list support.
- ILP32 TLSDESC relocation support.
There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain.
This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain.
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D94143
The pass analysis uses "sets" implemented using a SmallVector type
to keep track of Used, Preserved, Required and RequiredTransitive
passes. When having nested analyses we could end up with duplicates
in those sets, as there were no checks to see if a pass already
existed in the "set" before pushing to the vectors. The idea with
this patch is to avoid such duplicates by not pushing elements
that are already contained when adding elements to those sets.
To align with the above PMDataManager::collectRequiredAndUsedAnalyses
is changed to skip adding both the Required and RequiredTransitive
passes to its result vectors (since RequiredTransitive always is
a subset of Required we ended up with duplicates when traversing
both sets).
The main goal with this is to avoid spending time verifying the same
analysis multiple times in PMDataManager::verifyPreservedAnalysis
when iterating over the Preserved "set". It is assumed that removing
duplicates from a "set" shouldn't have any other negative impact
(I have not seen any problems so far). If this ends up causing
problems one could do some uniqueness filtering of the vector being
traversed in verifyPreservedAnalysis instead.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D94416
If constants are hidden behind G_ANYEXT we can treat them the same way as G_SEXT.
For that purpose we extend getConstantVRegValWithLookThrough with an option
to handle G_ANYEXT same way as G_SEXT.
Differential Revision: https://reviews.llvm.org/D92219
When constraining an operand register using constrainOperandRegClass(),
the function may emit a COPY in case the provided register class does
not match the current operand register class. However, the operand
itself is not updated to make use of the COPY, thereby resulting in
incorrect code. This patch fixes that bug by updating the machine
operand accordingly.
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D91244
For Zvlsseg, we need continuous vector registers for the values. We need
to define new register classes for the different combinations of (number
of fields and LMUL). For example,
when the number of fields (NF) = 3 and LMUL = 2, the values will be assigned
to (V0M2, V2M2, V4M2), (V2M2, V4M2, V6M2), (V4M2, V6M2, V8M2), ...
We define the vlseg intrinsics with multiple outputs. There is no way to
describe the codegen patterns with multiple outputs in the tablegen
files. We do the codegen in RISCVISelDAGToDAG and use EXTRACT_SUBREG to
extract the output values.
The multiple scalable vector values will be put into a struct. This
patch depends on the support for scalable vector structs.
Differential Revision: https://reviews.llvm.org/D94229
Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison.
To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this makes the semantics of `nonnull` accept poison, and return poison if the input pointer is null.
This makes many transformations like below legal:
```
%p = gep inbounds %x, 1 ; %p is a non-null pointer or poison
call void @f(%p) ; instcombine converts this to call void @f(nonnull %p)
```
Instead, this semantics makes propagation of `nonnull` to the caller illegal.
The reason is that passing poison to `nonnull` does not immediately raise UB anymore, so such a program is still well defined, if the callee does not use the argument.
Having `noundef` attribute there re-allows this.
```
define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore
call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p.
ret void
}
```
Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90529
separate sections.
For ThinLTO, all the function profiles without context have been annotated to
outline functions if possible in the prelink phase. In the postlink phase,
profile annotation is only meaningful for function profiles with
context. If the profile is large, it is better to split the profile into two
parts, one with context and one without, so the profile reading in postlink
phase only has to read the part with context. To have the profile splitting,
we extend the ExtBinary format to support different section arrangement. It
will be flexible to add other section layout in the future without the need
to create new class inheriting from ExtBinary class.
Differential Revision: https://reviews.llvm.org/D94435
If we are able to compare with 0 instead of 1, we might be able
to fold the setcc into a beqz/bnez.
Often these setccs start life as an xor that gets converted to
a setcc by DAG combiner's rebuildSetcc. I looked into a detecting
(xor X, 1) and converting to (seteq X, 0) based on boolean contents
being 0/1 in rebuildSetcc instead of using computeKnownBits. It was
very perturbing to AMDGPU tests which I didn't look closely at.
It had a few changes on a couple other targets, but didn't seem
to be much if any improvement.
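For illustration, such a compare-with-0 rewrite might be shaped roughly
like the sketch below (illustrative only; the helper name is invented and
the actual patch differs in where and how it applies):
```
#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// If every bit of LHS above bit 0 is known zero, (seteq LHS, 1) is
// (setne LHS, 0) and vice versa; a compare against zero can then be
// selected to beqz/bnez.
static SDValue foldSetCCOfBool(SDValue LHS, SDValue RHS, ISD::CondCode CC,
                               const SDLoc &DL, EVT VT, SelectionDAG &DAG) {
  unsigned BW = LHS.getValueSizeInBits();
  if (isOneConstant(RHS) && (CC == ISD::SETEQ || CC == ISD::SETNE) &&
      DAG.MaskedValueIsZero(LHS, APInt::getBitsSetFrom(BW, 1))) {
    ISD::CondCode InvCC = CC == ISD::SETEQ ? ISD::SETNE : ISD::SETEQ;
    return DAG.getSetCC(DL, VT, LHS,
                        DAG.getConstant(0, DL, LHS.getValueType()), InvCC);
  }
  return SDValue();
}
```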
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D94730
Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93042
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)
This tries to recognize patterns like below (assuming a little-endian target):
```
s8* a = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((s32*)a)

s8* a = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32*)a))
```
(This patch also handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second example above does not.)
To recognize the pattern, this searches from the last G_OR in the expression
tree.
E.g.
```
  Reg   Reg
   \    /
    OR_1   Reg
      \    /
       OR_2
         \    Reg
          ..  /
          Root
```
Each non-OR register in the tree is put in a list. Each register in the list is
then checked to see if it's an appropriate load + shift logic.
If every register is a load + potentially a shift, the combine checks if those
loads + shifts, when OR'd together, are equivalent to a wide load (possibly with
a BSWAP.)
To simplify things, this patch
(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
specific location
An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)
At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.
Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.
Differential Revision: https://reviews.llvm.org/D94350
The TableGen emitter for directives has two slots for flangClass information; this was mainly
to keep up with the legacy OpenMP parser at the time. Now that all clauses are encapsulated in
AccClause or OmpClause, these two strings are not necessary anymore and were the source of a couple
of problems while working with the generic structure checker for OpenMP.
This patch removes the flangClassValue string from DirectiveBase.td and uses the string flangClass as the
placeholder for the encapsulated class.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D94821
This CPU supports all v8.5a features except BTI, and so identifies as v8.5a to
Clang. A bit weird, but the best way for things like xnu to detect the new
features it cares about.
This patch updates the llvm module map to reflect changes made in
`24672ddea3c97fd1eca3e905b23c0116d7759ab8` and fixes the module builds
(`-DLLVM_ENABLE_MODULES=On`).
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
This patch computes the cost for vector.reduce<operand> for scalable vectors.
The cost is split into two parts: the legalization cost and the horizontal
reduction.
Differential Revision: https://reviews.llvm.org/D93639
D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.
The problem boils down to: we now rotate a loop just before the vectorizer
which requires duplicating a function call in the preheader when compiling
the individual files ('prepare for LTO'). But this then prevents further
inlining of the function during LTO.
This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.
I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens during full LTO and it is possible that a previously
duplicated call is actually a huge function which gets inlined
during LTO.
From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory and has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.
There could of course be scenarios where we inline a sufficiently large
function with code not profitable to vectorize, which would have been
vectorized earlier (by scalarizing the call). But even in that case,
there probably is no big performance impact, because it should be mostly
down to the cost-model to reject vectorization in that case. And then
the version with scalarized calls should also not be beneficial. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).
There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close look at actual performance and address any
follow-up issues.
I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.
Reviewed By: sanwou01
Differential Revision: https://reviews.llvm.org/D94232
Element sections will also need flags, so we shouldn't squat the
WASM_SEGMENT namespace.
Depends on D90948.
Differential Revision: https://reviews.llvm.org/D92315
The code here is checking to see if two sets are identical.
OtherBlocksSet should point to OtherL->getBlocksSet() instead.
Differential Revision: https://reviews.llvm.org/D94926
This patch adds the default value of 1 to drop_begin.
In the llvm codebase, 70% of calls to drop_begin have 1 as the second
argument. The interface, similar to that of std::next, should improve
readability.
This patch converts a couple of calls to drop_begin as examples.
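For example, the common case now reads (usage sketch):
```
#include "llvm/ADT/STLExtras.h"
#include <vector>

// With the new default, the "skip the first element" loop needs no count.
int sumOfTail(const std::vector<int> &Vals) {
  int Sum = 0;
  for (int V : llvm::drop_begin(Vals)) // same as llvm::drop_begin(Vals, 1)
    Sum += V;
  return Sum;
}
```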
Differential Revision: https://reviews.llvm.org/D94858
DefaultAttrsIntrinsic was introduced to add very common attributes to a
large set of intrinsics.
Currently the added attributes include:
nofree nosync nounwind willreturn
I think those should hold for most AArch64 target intrinsics, but
there are too many to check manually. This patch makes most AArch64 target
intrinsics DefaultAttrsIntrinsics.
Some notable exceptions I think are exclusive loads and stores as well
as the memory barrier intrinsics, for which nosync does not apply I
think.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D94687
Keys matching the tombstone/empty special values cannot be inserted in a
DenseMap. Under some circumstances, LV tries to add members to an
interleave group that match the special values. Skip adding such
members. This is unlikely to have any impact in practice, because
interleave groups with such indices are very likely to not be
vectorized, due to gaps.
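For context, the reserved keys come from `DenseMapInfo`; a minimal sketch
of what "matching the special values" means for an unsigned key type
(illustrative, not code from the patch):
```
#include "llvm/ADT/DenseMap.h"

// DenseMap reserves two key values per key type. For unsigned keys these
// are ~0U (empty) and ~0U - 1 (tombstone); inserting either is invalid
// and asserts, which is why such interleave-group members must be skipped.
void denseMapReservedKeys() {
  unsigned Empty = llvm::DenseMapInfo<unsigned>::getEmptyKey();
  unsigned Tombstone = llvm::DenseMapInfo<unsigned>::getTombstoneKey();
  llvm::DenseMap<unsigned, int> M;
  M[42] = 1;                  // fine
  (void)Empty;                // M[Empty] or M[Tombstone] would assert
  (void)Tombstone;
}
```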
This issue has been surfaced by fuzzing, see
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11638
`ELFDumper.cpp` implements the functionality that allows getting symbol versions.
It is used for dumping versioned symbols.
This helps to implement https://bugs.llvm.org/show_bug.cgi?id=48670 ("make llvm-nm -D print version names"):
we can move out and reuse the code from `ELFDumper.cpp`.
This is what this patch does: it moves the related functionality to `ELFFile<ELFT>`.
Differential revision: https://reviews.llvm.org/D94771
RISC-V would like to use a struct of scalable vectors to return multiple
values from intrinsics. This would also be needed for target independent
intrinsics like llvm.sadd.overflow.
This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.
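A rough sketch of the new behavior through the IR API (illustrative; the
function name is local to this example):
```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

// A struct wrapping scalable vectors now reports !isSized(), so the
// verifier rejects loads/stores/allocas of it while the type itself can
// still be used as an intrinsic return type.
bool structOfScalableVectorsIsUnsized() {
  llvm::LLVMContext Ctx;
  auto *I32 = llvm::Type::getInt32Ty(Ctx);
  auto *VecTy = llvm::ScalableVectorType::get(I32, 4); // <vscale x 4 x i32>
  auto *ST = llvm::StructType::get(Ctx, {VecTy, VecTy});
  return !ST->isSized();
}
```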
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D94142
Reassociating some patterns to generate more fma instructions to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
add one use check to lookThruCopyLike.
The root node is safe to be deleted if we are sure that every
definition in the copy chain only has one use.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92069
There are no changes relative to the original commit. However, an issue
this exposed in BasicAA assumption tracking has been fixed in the
previous commit.
-----
An alias query currently works out roughly like this:
* Look up location pair in cache.
* Perform BasicAA logic (including cache lookup and insertion...)
* Perform a recursive query using BestAAResults.
* Look up location pair in cache (and thus do not recurse into BasicAA)
* Query all the other AA providers.
* Query all the other AA providers.
This is a lot of unnecessary work, all ultimately caused by the
BestAAResults query at the end of aliasCheck(). The reason we perform
it, is that aliasCheck() is getting called recursively, and we of
course want those recursive queries to also make use of other AA
providers, not just BasicAA. We can solve this by making the recursive
queries directly use BestAAResults (which will check both BasicAA
and other providers), rather than recursing into aliasCheck().
There are some tradeoffs:
* We can no longer pass through the precomputed underlying object
to aliasCheck(). This is not a major concern, because nowadays
getUnderlyingObject() is quite cheap.
* Results from other AA providers are no longer cached inside
BasicAA. The way this worked was already a bit iffy, in that a
result could be cached, but if it was MayAlias, we'd still end
up re-querying other providers anyway. If we want to cache
non-BasicAA results, we should do that in a more principled manner.
In any case, despite those tradeoffs, this works out to be a decent
compile-time improvement. I think it also simplifies the mental model
of how BasicAA works. It took me quite a while to fully understand
how these things interact.
Differential Revision: https://reviews.llvm.org/D90094
D91936 placed the tracking for the assumptions into BasicAA.
However, when recursing over phis, we may use fresh AAQI instances.
In this case AssumptionBasedResults from an inner AAQI can result
in a removal of an element from the outer AAQI.
To avoid this, move the tracking into AAQI. This generally makes
more sense, as the NoAlias assumptions themselves are also stored
in AAQI.
The test case only produces an assertion failure with D90094
reapplied. I think the issue exists independently of that change
as well, but I wasn't able to come up with a reproducer.
This patch removes some ancient options as a clean-up before moving
code-gen to use LTOBackend in D94487.
I think it would be preferable to remove those ancient options, because
1. There are no corresponding options in LTOBackend based tools,
2. There are no unit tests for them,
3. They are not passed through by Clang,
4. At least for GVNLoadPRE, users could just use GVN's `enable-load-pre`.
Alternatively we could add support for those options to lto::Config &
co, but I think it would be better to remove them, unless they are
actually used in practice.
Reviewed By: steven_wu, tejohnson
Differential Revision: https://reviews.llvm.org/D94783
Current code breaks this version of MSVC due to a mismatch between `std::is_trivially_copyable` and `llvm::is_trivially_copyable` for `std::pair` instantiations. Hence I was attempting to use `std::is_trivially_copyable` to set `llvm::is_trivially_copyable<T>::value`.
I spent some time root causing an `llvm::Optional` build error on MSVC 16.8.3 related to the change described above:
```
62>C:\src\ocg_llvm\llvm-project\llvm\include\llvm/ADT/BreadthFirstIterator.h(96,12): error C2280: 'llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>> &llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>>::operator =(const llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>> &)': attempting to reference a deleted function (compiling source file C:\src\ocg_llvm\llvm-project\llvm\unittests\ADT\BreadthFirstIteratorTest.cpp)
...
```
The "trivial" specialization of `optional_detail::OptionalStorage` assumes that the value type is trivially copy constructible and trivially copy assignable. The specialization is invoked based on a check of `is_trivially_copyable` alone, which does not imply both `is_trivially_copy_assignable` and `is_trivially_copy_constructible` are true.
[[ https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable | According to the spec ]], a deleted assignment operator does not make `is_trivially_copyable` false. So I think all these properties need to be checked explicitly in order to specialize `OptionalStorage` to the "trivial" version:
```
/// Storage for any type.
template <typename T, bool = std::is_trivially_copy_constructible<T>::value
                          && std::is_trivially_copy_assignable<T>::value>
class OptionalStorage {
```
Above fixed my build break in MSVC, but I think we need to explicitly check `is_trivially_copy_constructible` too since it might be possible the copy constructor is deleted. Also would be ideal to move over to `std::is_trivially_copyable` instead of the `llvm` namespace version.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93510
The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a noalias
scope is declared. When the intrinsic is duplicated, a decision must
also be made about the scope: depending on the reason of the duplication,
the scope might need to be duplicated as well.
Reviewed By: nikic, jdoerfert
Differential Revision: https://reviews.llvm.org/D93039
Expanding from D94808 - we ensure the same InlineAdvisor is used by both
InlinerPass instances. The notion of mandatory inlining is moved into
the core InlineAdvisor: advisors anyway have to handle that case, so
this change also factors out that a bit better.
Differential Revision: https://reviews.llvm.org/D94825
Unary minus operator applied to unsigned type, result still unsigned.
Use `~0U` instead of `-1U` and `1 + ~VAL` instead of `-VAL`.
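A small worked example of the replacement spellings (both identities hold
for any unsigned value):
```
#include <cassert>

void unsignedNegationIdentities(unsigned Val) {
  unsigned AllOnes = ~0U;      // previously spelled -1U
  unsigned Negated = 1 + ~Val; // previously spelled -Val
  assert(AllOnes + 1 == 0U);   // all-ones wraps to zero
  assert(Negated + Val == 0U); // two's-complement negation
  (void)AllOnes;
  (void)Negated;
}
```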
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D94417
This reverts commit 33be50daa9,
effectively reapplying:
- 260a856c2a
- 3043e5a5c3
- 49142991a6
... with a fix to skip a call to `SmallVector::isReferenceToStorage()`
when we know the parameter had been taken by value for small, POD-like
`T`. See https://reviews.llvm.org/D93779 for the discussion on the
revert.
At a high-level, these commits fix reference invalidation in
SmallVector's push_back, append, insert (one or N), and resize
operations. For more details, please see the original commit messages.
This commit fixes a bug that crept into
`SmallVectorTemplateCommon::reserveForAndGetAddress()` during the review
process after performance analysis was done. That function is now called
`reserveForParamAndGetAddress()`, clarifying that it only works for
parameter values. It uses that knowledge to bypass
`SmallVector::isReferenceToStorage()` when `TakesParamByValue`. This is
`constexpr` and avoids adding overhead for "small enough", trivially
copyable `T`.
Performance could potentially be tuned further by increasing the
threshold for `TakesParamByValue`, which is currently defined as:
```
bool TakesParamByValue = sizeof(T) <= 2 * sizeof(void *);
```
in the POD-like version of SmallVectorTemplateBase (else, `false`).
Differential Revision: https://reviews.llvm.org/D94800
This is not nice, but it's the best transient solution possible,
and is better than just duplicating the whole function.
The problem is, this function is widely used,
and it is not at all obvious that all the users
could be painlessly switched to operate on DomTreeUpdater,
and somehow I don't feel like porting all those users first.
This function is one of the last three that do not operate on DomTreeUpdater.
This is not nice, but it's the best transient solution possible,
and is better than just duplicating the whole function.
The problem is, this function is widely used,
and it is not at all obvious that all the users
could be painlessly switched to operate on DomTreeUpdater,
and somehow I don't feel like porting all those users first.
This function is one of the last three that do not operate on DomTreeUpdater.
This is not nice, but it's the best transient solution possible,
and is better than just duplicating the whole function.
The problem is, this function is widely used,
and it is not at all obvious that all the users
could be painlessly switched to operate on DomTreeUpdater,
and somehow I don't feel like porting all those users first.
This function is one of the last three that do not operate on DomTreeUpdater.
Even though not all its users operate on DomTreeUpdater,
it itself internally operates on DomTreeUpdater,
so it must mean everything is fine with that,
so just do that globally.
This reverts commit a3904cc77f.
It causes the compiler to crash while building Harfbuzz for ARM in
Chromium, reduced reproducer forthcoming:
https://crbug.com/1167305
Add a matcher that checks if the given subpattern has only one non-debug use.
Also improve the existing m_OneUse testcase.
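A usage sketch, assuming the new matcher is GlobalISel's `m_OneNonDBGUse`
(the surrounding helper is invented for illustration):
```
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;
using namespace llvm::MIPatternMatch;

// Match add(mul(X, Y), Z) where the mul result has exactly one non-debug
// use, so folding it away cannot be blocked by DBG_VALUE users.
static bool matchMulAdd(Register Dst, MachineRegisterInfo &MRI, Register &X,
                        Register &Y, Register &Z) {
  return mi_match(Dst, MRI,
                  m_GAdd(m_OneNonDBGUse(m_GMul(m_Reg(X), m_Reg(Y))),
                         m_Reg(Z)));
}
```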
Differential Revision: https://reviews.llvm.org/D94705
It can be useful for an ObjectLinkingLayerCreator to allow callee errors to get propagated to the builder. Specifically, this is the case when the ObjectLayer uses the EHFrameRegistrationPlugin, because it requires a TPCEHFrameRegistrar and instantiation for it may fail (e.g. if the required registration symbols are missing in the target process).
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D94690
All other layers in LLJIT are stored as unique_ptr's already. At this point, it is not strictly necessary for ObjTransformLayer, but it makes a follow-up change more straightforward.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D94689
New dwarf operator DW_OP_LLVM_implicit_pointer is introduced (present only in LLVM IR).
This operator is required as it is different from the DWARF operator
DW_OP_implicit_pointer in representation and specification (number
and types of operands), and the latter cannot be used at multiple levels.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D84113
This reverts commit 260a856c2a.
This reverts commit 3043e5a5c3.
This reverts commit 49142991a6.
This change had a larger than anticipated compile-time impact,
possibly because the small value optimization is not working as
intended. See D93779.
It turns out we need to handle `LangOptions` separately from the rest of the options. `LangOptions` used to be conditionally parsed only when `!(DashX.getFormat() == InputKind::Precompiled || DashX.getLanguage() == Language::LLVM_IR)` and we need to restore this order (for more info, see D94682).
We could do this similarly to how `DiagnosticOptions` are handled: via a counterpart to the `IsDiag` mix-in (e.g. `IsLang`). These mix-ins would prefix the option key path with the appropriate `CompilerInvocation::XxxOpts` member. However, this solution would be problematic, as we'd now have two kinds of options (`Lang` and `Diag`) with seemingly incomplete key paths in the same file. To understand what `CompilerInvocation` member an option affects, one would need to read the whole option definition and notice the `IsDiag` or `IsLang` class.
Instead, this patch introduces a more robust way to handle different kinds of options separately: via the `KeyPathAndMacroPrefix` class. We have one specialization of that class per `CompilerInvocation` member (e.g. `LangOpts`, `DiagnosticOpts`, etc.). Now, instead of specifying a key path with `"LangOpts->UndefPrefixes"`, we use `LangOpts<"UndefPrefixes">`. This keeps the readability intact (you don't have to look for the `IsLang` mix-in, the key path is complete on its own) and allows us to specify a custom macro prefix within `LangOpts`.
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D94676
The number of hardware threads available to a ThreadPool can be limited if an affinity mask is set.
For example:
> start /B /AFFINITY 0xF lld-link.exe ...
Would let LLD only use 4 hyper-threads.
Previously, there was an outstanding issue on Windows Server 2019 on dual-CPU machines, which prevented the use of both CPU sockets. In normal conditions, when no affinity mask was set, ProcessorGroup::AllThreads was different from ProcessorGroup::UsableThreads. The previous code in llvm/lib/Support/Windows/Threading.inc L201 was improperly assuming those two values to be equal, and consequently was limiting the execution to only one CPU socket.
Differential Revision: https://reviews.llvm.org/D92419
An alias query currently works out roughly like this:
* Look up location pair in cache.
* Perform BasicAA logic (including cache lookup and insertion...)
* Perform a recursive query using BestAAResults.
* Look up location pair in cache (and thus do not recurse into BasicAA)
* Query all the other AA providers.
* Query all the other AA providers.
This is a lot of unnecessary work, all ultimately caused by the
BestAAResults query at the end of aliasCheck(). The reason we perform
it, is that aliasCheck() is getting called recursively, and we of
course want those recursive queries to also make use of other AA
providers, not just BasicAA. We can solve this by making the recursive
queries directly use BestAAResults (which will check both BasicAA
and other providers), rather than recursing into aliasCheck().
There are some tradeoffs:
* We can no longer pass through the precomputed underlying object
to aliasCheck(). This is not a major concern, because nowadays
getUnderlyingObject() is quite cheap.
* Results from other AA providers are no longer cached inside
BasicAA. The way this worked was already a bit iffy, in that a
result could be cached, but if it was MayAlias, we'd still end
up re-querying other providers anyway. If we want to cache
non-BasicAA results, we should do that in a more principled manner.
In any case, despite those tradeoffs, this works out to be a decent
compile-time improvement. I think it also simplifies the mental model
of how BasicAA works. It took me quite a while to fully understand
how these things interact.
Differential Revision: https://reviews.llvm.org/D90094
This patch renames the tablegen generated file ACC.cpp.inc to ACC.inc in order
to match what was done in D92955. This file is included in header files as well as .cpp
files, so it makes more sense.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D93485
This commit adds table symbol support in a partial way, while still
including some special cases for the __indirect_function_table symbol.
No change in tests.
Differential Revision: https://reviews.llvm.org/D94075
This introduces the ARMv8.7-A LS64 extension's intrinsics for 64 bytes
atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`,
and `__arm_st64bv0`. These are selected into the LS64 instructions
LD64B, ST64B, ST64BV and ST64BV0, respectively.
Based on patches written by Simon Tatham.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D93232
For small enough, trivially copyable `T`, take the parameter by-value in
`SmallVector::resize`. Otherwise, when growing, update the argument
appropriately.
Differential Revision: https://reviews.llvm.org/D93781
For small enough, trivially copyable `T`, take the parameter by-value in
`SmallVector::append` and `SmallVector::insert`. Otherwise, when
growing, update the argument appropriately.
Differential Revision: https://reviews.llvm.org/D93780
This reverts commit 56d1ffb927, reapplying
9abac60309, removing insert_one_maybe_copy
and using a helper called forward_value_param instead. This avoids use
of `std::is_same` (or any SFINAE), so I'm hoping it's more portable and
MSVC will be happier.
Original commit message follows:
For small enough, trivially copyable `T`, take the argument by value in
`SmallVector::push_back` and copy it when forwarding to
`SmallVector::insert_one_impl`. Otherwise, when growing, update the
argument appropriately.
Differential Revision: https://reviews.llvm.org/D93779
For small enough, trivially copyable `T`, take the argument by value in
`SmallVector::push_back` and copy it when forwarding to
`SmallVector::insert_one_impl`. Otherwise, when growing, update the
argument appropriately.
Differential Revision: https://reviews.llvm.org/D93779
The number of hardware threads available to a ThreadPool can be limited if an affinity mask is set.
For example:
> start /B /AFFINITY 0xF lld-link.exe ...
Would let LLD only use 4 hyper-threads.
Previously, there was an outstanding issue on Windows Server 2019 on dual-CPU machines, which prevented the use of both CPU sockets. In normal conditions, when no affinity mask was set, ProcessorGroup::AllThreads was different from ProcessorGroup::UsableThreads. The previous code in llvm/lib/Support/Windows/Threading.inc L201 was improperly assuming those two values to be equal, and consequently was limiting the execution to only one CPU socket.
Differential Revision: https://reviews.llvm.org/D92419
to Pass.h.
In some compiler passes like SampleProfileLoaderPass, we want to know which
LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum
class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and
it also cannot represent phase in full LTO. The patch extends it to include
full LTO phases and move it from PassBuilder.h to Pass.h, then it is much
easier for PassBuilder to communicate with each pass about the current LTO phase.
Differential Revision: https://reviews.llvm.org/D94613
Current code breaks this version of MSVC due to a mismatch between `std::is_trivially_copyable` and `llvm::is_trivially_copyable` for `std::pair` instantiations. Hence I was attempting to use `std::is_trivially_copyable` to set `llvm::is_trivially_copyable<T>::value`.
I spent some time root causing an `llvm::Optional` build error on MSVC 16.8.3 related to the change described above:
```
62>C:\src\ocg_llvm\llvm-project\llvm\include\llvm/ADT/BreadthFirstIterator.h(96,12): error C2280: 'llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>> &llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>>::operator =(const llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>> &)': attempting to reference a deleted function (compiling source file C:\src\ocg_llvm\llvm-project\llvm\unittests\ADT\BreadthFirstIteratorTest.cpp)
...
```
The "trivial" specialization of `optional_detail::OptionalStorage` assumes that the value type is trivially copy constructible and trivially copy assignable. The specialization is invoked based on a check of `is_trivially_copyable` alone, which does not imply both `is_trivially_copy_assignable` and `is_trivially_copy_constructible` are true.
[[ https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable | According to the spec ]], a deleted assignment operator does not make `is_trivially_copyable` false. So I think all these properties need to be checked explicitly in order to specialize `OptionalStorage` to the "trivial" version:
```
/// Storage for any type.
template <typename T, bool = std::is_trivially_copy_constructible<T>::value
                          && std::is_trivially_copy_assignable<T>::value>
class OptionalStorage {
```
Above fixed my build break in MSVC, but I think we need to explicitly check `is_trivially_copy_constructible` too since it might be possible the copy constructor is deleted. Also would be ideal to move over to `std::is_trivially_copyable` instead of the `llvm` namespace version.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93510
This fixes double printing of insertion debug messages in the
legalizer.
Try to clean up the usage of observers. Currently the use of observers is
pretty hard to follow and it's not clear what is responsible for
them. Observers are referenced in 3 places:
1. In the MachineFunction
2. In the MachineIRBuilder
3. In the LegalizerHelper
The observers in the MachineFunction and MachineIRBuilder are both
called only on insertions, and are redundant with each other. The
source of the double printing was the same observer was added to both
the MachineFunction, and the MachineIRBuilder. One of these references
needs to be removed. Arguably observers in general should be fully
removed from one or the other, but it may be useful to have a local
observer in the MachineIRBuilder that is not added to the function's
observers. Alternatively, the wrapper observer could manage a local
observer in one place.
The LegalizerHelper only ever calls the observer on changing/changed
instructions, and never insertions. Logically these are two different
types of observers, for changes and for insertions.
Additionally, some places used the GISelObserverWrapper when they only
needed a single observer they could use directly.
Setting the observer in the LegalizerHelper constructor is not
flexible enough if the LegalizerHelper is constructed anywhere outside
the one used by the legalizer. AMDGPU calls the LegalizerHelper in
RegBankSelect, and needs to use a local observer to apply the regbank
to newly created instructions. Currently it accomplishes this by
constructing a local MachineIRBuilder. I'm trying to move the
MachineIRBuilder to be owned/maintained by the RegBankSelect pass
itself, but the locally constructed LegalizerHelper would reset the
observer.
Mips also has a special case use of the LegalizationArtifactCombiner
in applyMappingImpl; I think we do need to run the artifact combiner
during RegBankSelect, but in a more consistent way outside of
applyMappingImpl.
For some reason some builds don't like the arrow operator access; using deref-then-access should fix the issue.
/home/buildbots/ppc64le-flang-mlir-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/llvm/include/llvm/ADT/iterator.h:171:34: error: taking the address of a temporary object of type 'llvm::StringRef' [-Waddress-of-temporary]
PointerT operator->() { return &static_cast<DerivedT *>(this)->operator*(); }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/buildbots/ppc64le-flang-mlir-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/llvm/include/llvm/ADT/StringExtras.h:387:13: note: in instantiation of member function 'llvm::iterator_facade_base<llvm::mapped_iterator<mlir::tblgen::TypeParameter *, (lambda at /home/buildbots/ppc64le-flang-mlir-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/mlir/tools/mlir-tblgen/TypeDefGen.cpp:414:19), llvm::StringRef>, std::random_access_iterator_tag, llvm::StringRef, long, llvm::StringRef *, llvm::StringRef &>::operator->' requested here
Len += I->size();
This reuses the code from yaml2obj (moves it to ELFYAML.h).
With it we can set the `sh_entsize` in a single place in `obj2yaml`.
Note that it also fixes a bug in `yaml2obj`: we did not
set the `sh_entsize` field for the `SHT_ARM_EXIDX` section properly.
Differential revision: https://reviews.llvm.org/D93858
Currently we don't support multiple SHT_SYMTAB_SHNDX sections
or the DT_SYMTAB_SHNDX tag.
This patch implements support for them and fixes
https://bugs.llvm.org/show_bug.cgi?id=43991.
I had to introduce the `struct DataRegion` to ELF.h,
it is used to represent a region that might have no known size.
It is needed, because we don't know the size of the extended
section indices table when it is located via DT_SYMTAB_SHNDX.
In this case we still want to validate that we don't read
past the end of the file.
Differential revision: https://reviews.llvm.org/D92923
Currently dsymutil will silently fail when processing binaries with
Dwarf 5 debug info. This patch adds rudimentary support for Dwarf 5 in
dsymutil.
- Recognize relocations in the debug_addr section.
- Recognize (a subset of) Dwarf 5 form values.
- Emits valid Dwarf 5 compile unit header chains.
To simplify things (and avoid having to emit indexed sections) I decided
to emit the relocated addresses directly in the debug info section.
- DW_FORM_strx gets relocated and rewritten to DW_FORM_strp
- DW_FORM_addrx gets relocated and rewritten to DW_FORM_addr
Obviously there's a lot of work left, but this should be a step in the
right direction.
rdar://62345491
Differential revision: https://reviews.llvm.org/D94323
Also old mir tests are updated to match the latest changes in the STATEPOINT format.
Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D94482
This is a pretty classic optimization. Instead of processing symbol
records and copying them to temporary storage, do a first pass to
measure how large the module symbol stream will be, and then copy the
data into place in the PDB file. This requires deferring relocation until
much later, which accounts for most of the complexity in this patch.
This patch avoids copying the contents of all live .debug$S sections
into heap memory, which is worth about 20% of private memory usage when
making PDBs. However, this is not an unmitigated performance win,
because it can be faster to read dense, temporary, heap data than it is
to iterate symbol records in object file backed memory a second time.
Results on release chrome.dll:
peak mem: 5164.89MB -> 4072.19MB (-1,092.7MB, -21.2%)
wall-j1: 0m30.844s -> 0m32.094s (slightly slower)
wall-j3: 0m20.968s -> 0m20.312s (slightly faster)
wall-j8: 0m19.062s -> 0m17.672s (meaningfully faster)
I gathered similar numbers for a debug, component build of content.dll
in Chrome, and the performance impact of this change was in the noise.
The memory usage reduction was visible and similar.
Because of the new parallelism in the PDB commit phase, more cores makes
the new approach faster. I'm assuming that most C++ developer machines
these days are at least quad core, so I think this is a win.
Differential Revision: https://reviews.llvm.org/D94267
This patch resolves the suboptimal codegen described in http://llvm.org/pr47873.
When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted.
It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag.
The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however.
This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`.
To make this optimization sound, `BRCOND(UNDEF)` should simply nondeterministically jump to the branch target or not, rather than raising UB.
However, it wasn't clear from the comments in ISDOpcodes.h what happens when the condition is undef.
I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction).
Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB.
Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay.
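The fold itself is small; its shape might be sketched as follows
(illustrative, not the verbatim DAGCombiner code):
```
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// brcond (freeze Cond), Dest --> brcond Cond, Dest. Sound because BRCOND
// on undef now nondeterministically takes either edge instead of being UB.
static SDValue foldBrcondFreeze(SDNode *N, SelectionDAG &DAG) {
  SDValue Chain = N->getOperand(0);
  SDValue Cond = N->getOperand(1);
  SDValue Dest = N->getOperand(2);
  if (Cond.getOpcode() == ISD::FREEZE && Cond.hasOneUse())
    return DAG.getNode(ISD::BRCOND, SDLoc(N), MVT::Other, Chain,
                       Cond.getOperand(0), Dest);
  return SDValue();
}
```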
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D92015
Most uses of this class just use the default MallocAllocator.
As this contains no fields, we can use the empty base optimisation for BumpPtrAllocatorImpl and save 8 bytes of padding for most use cases.
This prevents using a class that is marked as `final` as the `AllocatorT` template argument.
If one must use an allocator that has been marked as `final`, the simplest way around this is a proxy class.
The class should have all the methods that `AllocatorBase` expects and should forward the calls to your own allocator instance.
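A minimal standalone illustration of the empty-base trick (invented names;
not the upstream code):
```
struct StatelessAllocator {}; // empty, like the default MallocAllocator

template <typename AllocatorT = StatelessAllocator>
class BumpAllocSketch : private AllocatorT { // was: AllocatorT Allocator;
  void *CurPtr = nullptr;
  void *End = nullptr;
};

// The empty base contributes no storage on mainstream ABIs; note that a
// `final` AllocatorT could not be used as a base class, hence the caveat.
static_assert(sizeof(BumpAllocSketch<>) == 2 * sizeof(void *),
              "empty base adds no size");
```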
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D94439
This change modifies the source location formatting from:
LineNumber.Discriminator
to:
LineNumber:ColumnNumber.Discriminator
The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff.
The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy.
Testing:
ninja check-llvm
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D94333
Similar to D94125, derive `willreturn` for functions that are `readonly` and
`mustprogress` in FunctionAttrs.
To quote the reasoning from D94125:
Since D86233 we have `mustprogress` which, in combination with
`readonly`, implies `willreturn`. The idea is that every side-effect
has to be modeled as a "write". Consequently, `readonly` means there
is no side-effect, and `mustprogress` guarantees that we cannot "loop"
forever without side-effect.
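The inference itself can be pictured as follows (a sketch only; the actual
change lives in FunctionAttrs and works on SCCs):
```
#include "llvm/IR/Function.h"

using namespace llvm;

// readonly (no side effects) + mustprogress (cannot loop forever without
// side effects) together imply willreturn.
static void inferWillReturn(Function &F) {
  if (F.onlyReadsMemory() && F.hasFnAttribute(Attribute::MustProgress) &&
      !F.hasFnAttribute(Attribute::WillReturn))
    F.addFnAttr(Attribute::WillReturn);
}
```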
Reviewed By: jdoerfert, nikic
Differential Revision: https://reviews.llvm.org/D94502
Use TableGen and information in ACC.td for the Default enum in the OpenACC dialect.
This patch generalizes what was done for OpenMP directives.
Follow up patch after D93576
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D93710
The patch adds the required methods to FixedPointBuilder
for converting between fixed-point and floating point,
and uses them from Clang.
This depends on D54749.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D86632
C++14 attributes are superior because they can be applied to functions with inline definition and the syntax is cleaner.
I intend to convert all uses and then remove the macro.
One issue that might hold back switching uses to C++14 attributes is that
clang-format does not put long attributes on separate lines, so formatted code will look like:
```
template <typename T>
[[deprecated("blah blah")]] void
foooooooooooooooooooooooooooo() {
  ...
}
```
Putting long attributes on a separate line would be prettier.
See https://stackoverflow.com/questions/45740466/clang-format-setting-to-control-c-attributes
AttributeMacros probably won't help because it can't match the custom message.
https://clang.llvm.org/docs/ClangFormatStyleOptions.html
Reviewed By: rriddle, MaskRay
Differential Revision: https://reviews.llvm.org/D94219
Remove the InsertionPoint argument from SlotIndexes::insertMBBInMaps
because it was confusing: what does it mean to insert a new block
between two instructions, in the middle of an existing block?
Instead, support the case that MachineBasicBlock::splitAt really needs,
where the new block contains some instructions that are already in the
maps because they have been moved there from the tail of the previous
block.
In all other use cases the new block is empty.
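For reference, the splitAt use case looks roughly like this (signatures approximate):
```
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineBasicBlock.h"

using namespace llvm;

// Instructions after SplitInst move into the returned block; they are
// already present in the slot-index maps, which is exactly the case
// insertMBBInMaps now supports.
static MachineBasicBlock *splitAfter(MachineBasicBlock &MBB,
                                     MachineInstr &SplitInst,
                                     LiveIntervals &LIS) {
  return MBB.splitAt(SplitInst, /*UpdateLiveIns=*/true, &LIS);
}
```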
Based on work by Carl Ritson!
Differential Revision: https://reviews.llvm.org/D94311
In ST mode, flat scratch instructions have neither an sgpr nor a vgpr
for the address. This led to an assertion when inserting hard clauses.
Differential Revision: https://reviews.llvm.org/D94406
Added a utility function to the Value class that prints the block name, using
block labels for unnamed blocks.
Changed LICM to call this function in its debug output.
Patch by Xiaoqing Wu <xiaoqing_wu@apple.com>
Differential Revision: https://reviews.llvm.org/D93577
Passes in the new PostAllocationPasses list will run immediately after memory
allocation and address assignment for defined symbols, and before
JITLinkContext::notifyResolved is called. These passes can set up state
associated with the addresses of defined symbols before any query for these
addresses completes.
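A hypothetical sketch of installing such a pass (the logging body is illustrative, not part of the patch):
```
#include "llvm/ExecutionEngine/JITLink/JITLink.h"
#include "llvm/Support/Debug.h"

using namespace llvm;

// Runs after allocation and address assignment, before
// JITLinkContext::notifyResolved fires.
static void addAddressLoggingPass(jitlink::PassConfiguration &Config) {
  Config.PostAllocationPasses.push_back([](jitlink::LinkGraph &G) -> Error {
    for (auto *Sym : G.defined_symbols())
      if (Sym->hasName())
        dbgs() << Sym->getName() << " @ " << Sym->getAddress() << "\n";
    return Error::success();
  });
}
```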
Define the `vfclass` IR intrinsics for the respective V instructions.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>
Differential Revision: https://reviews.llvm.org/D94356
For pointers to unrelated storage, the standard only guarantees an ordered
comparison through `std::less`. Split out some helpers that do that and update all the
code that was comparing using `<` and friends (mostly assertions).
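A minimal sketch of such a helper (the name is illustrative, not necessarily the one the patch adds):
```
#include <functional>

// std::less induces a total order even for pointers into unrelated
// allocations, where the built-in '<' is not guaranteed to.
template <typename T> bool objectOrderLess(const T *LHS, const T *RHS) {
  return std::less<const T *>()(LHS, RHS);
}
```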
Differential Revision: https://reviews.llvm.org/D93777
Functions that are renamed under -funique-internal-linkage-names have their debug linkage name updated as well.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93747
Before this patch there was generic mapping from vector_extract
to G_EXTRACT_VECTOR_ELT added in SelectionDAGCompat.td. That
mapping is now replaced by a mapping from extractelt instead.
The reasoning is that vector_extract is marked as deprecated,
so it is assumed that a majority of targets will use extractelt
and not vector_extract (and that the long term solution for all
targets would be to use extractelt).
Targets like AArch64 that still use vector_extract can add an
additional mapping from the deprecated vector_extract as target
specific tablegen definitions. Such a mapping is added for AArch64
in this patch to avoid breaking tests.
When adding the extractelt => G_EXTRACT_VECTOR_ELT mapping we
triggered some new code paths in GlobalISelEmitter, ending up in
an assert when trying to import a pattern containing EXTRACT_SUBREG
for ARM. Therefore this patch also adds a "failedImport" warning
for that situation (instead of hitting the assert).
Differential Revision: https://reviews.llvm.org/D93416
Introduce a new mode of operation for -print-changed that only reports
when a pass changes the IR, with all of the other messages suppressed (i.e.,
no initial IR and no messages about ignored, filtered, or non-modifying
passes).
The option processing for -print-changed is changed to take an optional
string indicating options for print-changed. Initially, the only option
supported is quiet (as described above). This new quiet mode is specified
with -print-changed=quiet while -print-changed will continue to function
in the same way. It is intended that there will be more options in the
future.
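A typical invocation of the quiet mode might look like this (file and pass names are illustrative):
```
opt -S -passes=instcombine -print-changed=quiet test.ll
```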
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D92589
Now that we flush the local value map for every instruction, we don't
need any extra flushes for specific cases. Also, LastFlushPoint is
not used for anything. Follow-up to commit c161665 (D91734).
This reapplies commit 3fd39d3.
Differential Revision: https://reviews.llvm.org/D92338
Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).
https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.
There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:
Local values may also be used by no-op casts, which adds the
register to the RegFixups table. Without reversing the RegFixups
map direction, we don't have enough information to sink these
instructions.
This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids the sometimes-incorrect locations used
previously, and emits instructions in a more natural order.
In addition, constants materialized due to PHI instructions are
not assigned a debug location immediately; instead, when the
local value map is flushed, if the first local value instruction
has no debug location, it is given the same location as the
first non-local-value-map instruction. This prevents PHIs
from introducing unattributed instructions, which would either
be implicitly attributed to the location for the preceding IR
instruction, or given line 0 if they are at the beginning of
a machine basic block. Neither of those consequences is good
for debugging.
This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.
(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.
This reapplies commits cf1c774d and dc35368c, and adds the
modification to PHI handling, which should avoid problems
with debugging under gdb.
Differential Revision: https://reviews.llvm.org/D91734
The existing implementation of parallel region merging applies only to
consecutive parallel regions that have speculatable sequential
instructions in-between. This patch lifts this limitation to expand
merging with any sequential instructions in-between, except calls to
unmergable OpenMP runtime functions. In-between sequential instructions
in the merged region are sequentialized in a "master" region and any
output values are broadcast to the following parallel regions and the
sequential region continuation of the merged region.
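A source-level picture of the merge, with illustrative function names (the real transformation is performed on IR by OpenMPOpt):
```
#include <omp.h>

extern void work1();
extern int compute();
extern void work2(int);

// Before: two parallel regions separated by arbitrary sequential code.
void before() {
  #pragma omp parallel
  work1();
  int V = compute(); // no longer required to be speculatable
  #pragma omp parallel
  work2(V);
}

// After: one merged region; the in-between code is sequentialized in a
// "master" region and V is broadcast to all threads via a shared variable.
void after() {
  int V;
  #pragma omp parallel shared(V)
  {
    work1();
    #pragma omp barrier
    #pragma omp master
    V = compute();
    #pragma omp barrier // V is now visible to every thread
    work2(V);
  }
}
```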
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90909
This reverts commit 8e3e148c.
This commit fixes two issues with the original patch:
* The sanitizer build bot reported an uninitialized value. This was caused by normalizeStringIntegral not returning None on failure.
* Some build bots complained about inaccessible keypaths. To mitigate that, "this->" was added back to the keypath to restore the previous behavior.
PreFixupPasses better reflects when these passes will run.
A future patch will (re)introduce a PostAllocationPasses list that will run
after allocation, but before JITLinkContext::notifyResolved is called to notify
the rest of the JIT about the resolved symbol addresses.