The current implementation for call sites is pretty convoluted
when you take the underlying implementation of the used APIs
into account. We will query the call site attributes, and then
fall back to the function attributes while taking into account
operand bundles. However, getModRefBehavior() already has it's
own (more accurate) logic for combining call-site FMRB with
function FMRB.
Clean this up by extracting a function that only fetches FMRB
from attributes, which can be directly used in getModRefBehavior()
for functions, and needs to be combined with an operand-bundle
respecting fallback in the call site case.
One caveat (that makes this non-NFC) is that CallBase function
attribute lookups allow using attributes from functions with
mismatching signature. To ensure we don't regress quality, do
the same for the function FMRB fallback.
This patch removes the aarch64 instrinsic svget/svset/svcreate from llvm.
It also implements the InstCombine for vector.extract that used to be in svget.
Depends on: D131547
Differential Revision: https://reviews.llvm.org/D131548
The patch simplifies some of the patterns as below
1. (ZExt(L1) << shift1) | (ZExt(L2) << shift2) -> ZExt(L3) << shift1
2. (ZExt(L1) << shift1) | ZExt(L2) -> ZExt(L3)
The pattern is indicative of the fact that the loads are being merged to a wider load and the only use of this pattern is with a wider load. In this case for a non-atomic/non-volatile loads reduce the pattern to a combined load which would improve the cost of inlining, unrolling, vectorization etc.
Differential Revision: https://reviews.llvm.org/D127392
Now that the legacy PM is no longer tested, the huge matrix of
test prefixes used by attributor tests is no longer needed and very
confusing for the casual reader. Reduce the prefixes down to just
CHECK, TUNIT and CGSCC.
Operand bundles on assumes do not read or write -- we correctly
modelled the read side of this, but not the write side. In practice
this did not matter because of how the method is used, but this
will become relevant for a future patch.
Prefixing the the SubArch with plus sign makes the ArchFeature name.
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D134349
Add TODO for whole array allocatable assignment inside FORALL
Whole allocatable array assignment inside FORALL are otherwise currently
hitting more cryptic asserts.
Add TODO in FORALL assignment when a designator appear with a part ref
that is an allocatable or pointer component (a(i)%pointer%k). The lowering
code does not handle this case well because of the pointer dereference.
Differential Revision: https://reviews.llvm.org/D134440
This patch adds `CLANG_BOLT_INSTRUMENT` option that applies BOLT instrumentation
to Clang, performs a bootstrap build with the resulting Clang, merges resulting
fdata files into a single profile file, and uses it to perform BOLT optimization
on the original Clang binary.
The projects and targets used for bootstrap/profile collection are configurable via
`CLANG_BOLT_INSTRUMENT_PROJECTS` and `CLANG_BOLT_INSTRUMENT_TARGETS`.
The defaults are "llvm" and "count" respectively, which results in a profile with
~5.3B dynamically executed instructions.
The intended use of the functionality is through BOLT CMake cache file, similar
to PGO 2-stage build:
```
cmake <llvm-project>/llvm -C <llvm-project>/clang/cmake/caches/BOLT.cmake
ninja clang++-bolt # pulls clang-bolt
```
Stats with a recent checkout (clang-16), pre-built BOLT and Clang, 72vCPU/224G
| CMake configure with host Clang + BOLT.cmake | 1m6.592s
| Instrumenting Clang with BOLT | 2m50.508s
| CMake configure `llvm` with instrumented Clang | 5m46.364s (~5x slowdown)
| CMake build `not` with instrumented Clang |0m6.456s
| Merging fdata files | 0m9.439s
| Optimizing Clang with BOLT | 0m39.201s
Building Clang:
```cmake ../llvm-project/llvm -DCMAKE_C_COMPILER=... -DCMAKE_CXX_COMPILER=...
-DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=clang
-DLLVM_TARGETS_TO_BUILD=Native -GNinja```
| | Release | BOLT-optimized
| cmake | 0m24.016s | 0m22.333s
| ninja clang | 5m55.692s | 4m35.122s
I know it's not rigorous, but shows a ballpark figure.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D132975
I found this with the patch: https://reviews.llvm.org/D131858. It breaks
my local implementation but not the in-tree test cases. So I reduce the
failure and submit the test case. The more testing should be always
good.
llvm::dwarfutil::ObjFileAddressMap::relocateIndexedAddr() does not
read address value. The relocateIndexedAddr() should not relocate
the address as the linked binary has already resolved relocations.
But it should read the value. This patch adds the reading value
of the address.
Differential Revision: https://reviews.llvm.org/D133324
If a namelist item is an allocatable or pointer and is also part of a common
block, the box should be loaded from the common block ref.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D134470
This does *not* link with libLLVM, but with static archives instead. Not
super-great, but at least the build works, which is probably better than
failing.
Related to #57551
Differential Revision: https://reviews.llvm.org/D134434
Summary:
This flag was deprecated awhile back but still shows up when using
`clang --help`. This patch just gets rid of it but keeps its interface
for backward compatibility.
- 47afaf2eb0 changed llvm-exegesis cmake rules
- 5b2f838db4 ported them to bazel, but did so by adding all the `lib/{target}/*.cpp` sources in exegesis to the build rule
- c7bf9d084d removed it, because it breaks users who don't build Mips and fail when building `lib/Mips/*.cpp`. But that in turn breaks those who *do* build the Mips target.
This should hopefully fix it for the final time by using selectively build subdirectories of exegesis target libs using llvm_target_exegesis, which is derived from llvm_targets, and is the list that can vary based on the downstream user.
I verified this builds with and without `Mips` in the `DEFAULT_TARGETS` configure list, and also double checked with `bazel query --output=build @llvm-project//llvm:Exegesis` that `lib/Mips/Target.cpp` is being included if and only if `Mips` is in the target list.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134512
Add OpenMP device runtime build support for the gfx1100, gfx1101,
gfx1102, and gfx1103 targets.
Differential Revision: https://reviews.llvm.org/D134465
This extends the uniform base transform used with scatter/gather to support one-use vector adds-of-splats with a non-zero base. This has the effect of essentially reassociating an add from vector to scalar domain.
The motivation is to improve the lowering of scatter/gather operations fed by complex geps.
Differential Revision: https://reviews.llvm.org/D134472
This patch adds support for if clause to task construct in OpenMP
IRBuilder.
Reviewed By: raghavendhra
Differential Revision: https://reviews.llvm.org/D130615
Fortran is not clear about the semantics of
```
subroutine subr
integer n = 1
block
data n/2/
end block
end subroutine
```
which could be interpreted as having two variables, each
named 'n', or as having one variable 'n' with invalid double
initialization. Precedents from existing compilers are also
in disagreement.
The most common interpretation, however, agrees with a subtle
reading of the standard: BLOCK constructs scope names that have
local specifications, and a DATA statement is a declaration
construct, not a specification construct. So this example is
*not* acceptable.
Differential Revision: https://reviews.llvm.org/D134391
By draft of C23 (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf),
the description for isless macro under 7.12.17.3 says,
The isless macro determines whether its first argument is less than its second
argument. The value of isless(x,y) is always equal to (x)< (y); however, unlike
(x) < (y), isless(x,y) does not raise the invalid floating-point exception when
x and y are unordered and neither is a signaling NaN.
isless should trap when encountering signaling NaN.
Reviewed By: jcranmer-intel, efriedma
Differential Revision: https://reviews.llvm.org/D134407
Recent update added 'tools/llvm-exegesis/lib/Mips/*.cpp' to srcs for Exegesis cc_library. This was not needed, and in fact breaks things. This CL removes that one change.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134505
Due to CMake mis-configurations, some gtest binaries may be added to the test
list more than once. This patch makes lit avoid such cases and issues a
warning when it happens.
The /Zl flag omits default C runtime library name from obj files.
This patch just adds an equivalent clang driver flag.
Differential Revision: https://reviews.llvm.org/D133959
This reverts commit a212d8da94, and follow
on fixes 0cd6763fa9,
e9ff53d42f, and
37c6a25e9a.
After re-reading the documentation for hash_combine, I don't think this
is the appropriate hash function to use for computing the hash to use as
a stack id in the metadata, since it is not guaranteed to produce stable
values across executions. I have not hit this problem, but plan to
switch to using an MD5 hash. I am hitting an issue with one of the bots
(https://lab.llvm.org/buildbot/#/builders/171/builds/20732)
where the values produced are only the lower 32 bits of the expected
hash values, however, which I assume is related to the implementation of
hash_combine and hash_code.
I believe I fixed all of the other bot failures with the follow on fixes,
which I'll merge into the new version before reapplying.
I suspect the reason for why D134234 was failing sometimes is because
"operator<" for a ValID could compare ValIDs of different kinds but have
the same non-active values and return an incorrect result. This is an
issue if I attempt to store ValIDs of different kinds in an std::map but
we compare different "active" values. For example, if I create an
std::map and store some ValIDs of kind t_GlobalName, then I insert a
ValID of kind t_GlobalID, the current "operator<" will see that one of
the operands is a t_GlobalID and compare it against the UIntVal of other
items in the map, but the other items in the map don't set UIntVal
because they're not t_GlobalIDs, so I compare against a
dummy/uninitialized value.
It seems pretty easy to add mixed ValID kinds into an std::map in
LLParser, so this just asserts that when doing the comparison that both
ValIDs are the same kind.
Differential Revision: https://reviews.llvm.org/D134488
The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting
down generated `alloca` instructions as well as meaningless `store`s and
this behavior can leave unused (dead) arguments. To eliminate the dead
arguments and therefore let the DeadCodeElimination remove becoming dead
inserted `GEP`s as well as `load`s and `cast`s in the callers, the
DeadArgumentElimination pass should be run after the ArgumentPromotion
one.
Differential Revision: https://reviews.llvm.org/D128830
RISCV doesn't actually support a scaled form of indexed load and store. We previously handled this by forming the scaled SDNode, and then doing custom legalization during lowering. This patch instead adds a callback via TLI to prevent formation entirely.
This has two effects:
* First, the GEP gets expanded (and used). Instead of the shift being created with an SDLoc of the memory operation, it has the SDLoc of the GEP instruction. This avoids the scheduler perturbing IR order when there's no reason to.
* Second, we fix what appears to be a bug in index calculation with RV32. The rules for GEPs require index calculation be done in particular bitwidth, and it appears the custom legalization code got this wrong for the case where index type exceeds pointer width. (Or at least, I trust the generic GEP lowering to be correct a lot more.)
The DAGCombiner change to handle VPScatter/VPGather is technically separate, but is required to prevent a regression on those intrinsics.
Differential Revision: https://reviews.llvm.org/D134382
Translate HLSLNumThreadsAttr into function attribute with name "dx.numthreads" and value format as "x,y,z".
Reviewed By: beanz
Differential Revision: https://reviews.llvm.org/D131799
This allows downstream uses to use the implementation of the tiling
itself, while performing other transformations that are necessary to
go with it.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D134335
Spurious ref edges are ref edges that still exist in the call graph even
though the corresponding IR reference no longer exists. This can cause
issues when deleting a dead function which has a spurious ref edge
pointed at it because currently we expect the dead function's RefSCC to
be trivial.
In the case that the dead function's RefSCC is not trivial, remove all
ref edges from other nodes in the RefSCC to it.
Removing a ref edge can result in splitting RefSCCs. There's actually no
reason to revisit those RefSCCs because currently we only run passes on
SCCs, and we've already added all SCCs in the RefSCC to the worklist.
(as opposed to removing the ref edge in
updateCGAndAnalysisManagerForPass() which can modify the call graph of
SCCs we have not visited yet). We also don't expect that RefSCC
refinement will allow us to glean any more information for optimization
use. Also, doing so would drastically increase the complexity of
LazyCallGraph::removeDeadFunction(), requiring us to return a list of
invalidated RefSCCs and new RefSCCs to add to the worklist.
Fixes#56503
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D133907
This patch adds auto source map deduce count as a target level statistics.
This will help telemetry to track how many debug sessions benefit from this feature.
Differential Revision: https://reviews.llvm.org/D134483