This test is failing on some builders (see [1]) with the following error:
error: Added modules have incompatible data layouts:
e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs
E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
The JIT layout is correct, but some IR module added to the JIT is using a
little-endian layout instead.
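For reference, the `e`/`E` component of an LLVM data layout string selects byte order: `e` means little-endian and `E` means big-endian. That component, along with the mangling component (`m:e` vs `m:a`), is where the two layouts above differ. The sketch below is a standalone helper with a hypothetical name, not LLVM code. Inside LLVM, `llvm::DataLayout::isBigEndian()` answers the same question.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: true if the layout string requests big-endian byte
// order. Components are separated by '-'; "E" means big-endian, "e" means
// little-endian, and little-endian is the default when neither appears.
bool isBigEndianLayout(const std::string &Layout) {
  std::istringstream Stream(Layout);
  std::string Component;
  bool BigEndian = false;
  while (std::getline(Stream, Component, '-')) {
    if (Component == "E")
      BigEndian = true;
    else if (Component == "e")
      BigEndian = false;
  }
  return BigEndian;
}
```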
This commit disables the test on ppc64 until we can investigate further and
fix the bug.
[1] https://lab.llvm.org/staging/#/builders/126/builds/371
While both the SOG and Agner insist that it is zero-cycle,
I cannot confirm that claim. While it clearly breaks the dependency,
I cannot come up with a snippet or measurement approach
that achieves an IPC greater than 4, which, to me, means that it actually
consumes the execution resources of an FP unit for a cycle.
They require clang-11 or above to build and hence had to be disabled,
as the bots did not have clang-11 or higher. The bots have now been
upgraded, so we can enable these functions.
llvm-dev message: https://lists.llvm.org/pipermail/llvm-dev/2021-May/150465.html
In an ELF shared object, a default visibility defined symbol is preemptible by
default. This creates some missed optimization opportunities.
-Bsymbolic-functions is more aggressive than our current -fvisibility-inlines-hidden
(present since 2012) as it applies to all function definitions. It can
* avoid PLT for cross-TU function calls and reduce dynamic symbol lookup
* reduce dynamic symbol lookup for taking function addresses and optimize out GOT/TOC on x86-64/ppc64
In a -DLLVM_TARGETS_TO_BUILD=X86 build, the number of JUMP_SLOT relocations decreases from 12716 to 1628, and the number of GLOB_DAT relocations decreases from 1918 to 1313.
A clang built with `-DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on` is significantly faster.
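Expressed as relative changes, the JUMP_SLOT and GLOB_DAT counts from the X86-only build work out to roughly an 87% reduction and a 32% reduction, respectively. The trivial helper below is hypothetical and exists only to show the arithmetic:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: percentage by which a count shrank.
double percentReduction(double Before, double After) {
  return (Before - After) / Before * 100.0;
}

// percentReduction(12716, 1628) is about 87.2 (JUMP_SLOT)
// percentReduction(1918, 1313)  is about 31.5 (GLOB_DAT)
```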
See the Linux kernel build result https://bugs.archlinux.org/task/70697
Note: the performance of -fno-semantic-interposition -Bsymbolic-functions
libLLVM.so and libclang-cpp.so is close to a PIE binary linking against
`libLLVM*.a` and `libclang*.a`. When the host compiler is Clang,
-Bsymbolic-functions is the major contributor. On x86-64 (with GOTPCRELX) and
ppc64 ELFv2, the GOT/TOC relocations can be optimized.
Some implications:
Interposing a subset of functions is no longer supported.
(This is fragile on ELF and not supported on Mach-O at all. For Mach-O we don't
use `ld -interpose` or `-flat_namespace`.)
Compiling a program which takes the address of any LLVM function with
`{gcc,clang} -fno-pic` and expects the address to equal the address taken
from libLLVM.so or libclang-cpp.so is unsupported. I am fairly confident that
llvm-project shouldn't have different behaviors depending on such pointer
equality (as we've been using -fvisibility-inlines-hidden, which applies to
inline functions, for a long time), but if we accidentally do, users should be
aware that they should not make assumptions about pointer equality in `-fno-pic`
mode.
See more on https://maskray.me/blog/2021-05-09-fno-semantic-interposition
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D102090
Summary:
This patch prevents the Attributor instances created in the CGSCC pass from
deleting functions, so that the Attributor cannot change the call graph
while OpenMPOpt is working with it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102363
Lower the elementwise div op to the linalg dialect. Since TOSA only supports integer division, that is the only variant currently implemented.
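As a reference for what the lowered code computes, here is a sketch of elementwise integer division over two equally shaped 1-D tensors, i.e. the scalar work the generated linalg op performs per element. It assumes int32 elements and truncation toward zero (C++ `/` semantics); check the TOSA specification for the exact rounding rule. Broadcasting and division by zero are deliberately left out, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical reference implementation: one signed division per element,
// truncating toward zero. Shapes must match; no broadcasting.
std::vector<int32_t> elementwiseDiv(const std::vector<int32_t> &Lhs,
                                    const std::vector<int32_t> &Rhs) {
  if (Lhs.size() != Rhs.size())
    throw std::invalid_argument("shape mismatch");
  std::vector<int32_t> Out(Lhs.size());
  for (std::size_t I = 0; I < Lhs.size(); ++I)
    Out[I] = Lhs[I] / Rhs[I];
  return Out;
}
```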
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D102430
This covers the extremely common case of replacing all uses of a Value
with a new op that is itself a user of the original Value.
This should also be a little bit more efficient than the
`SmallPtrSet<Operation *, 1>{op}` idiom that was being used before.
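To illustrate why the excepted user matters: if a hypothetical new op `%w = wrap(%x)` is meant to replace every use of `%x`, a plain replace-all-uses would also rewrite the new op's own operand, leaving it using its own result. The toy model below is plain C++, not the MLIR API; it shows the single-op exception that the new overload expresses directly instead of through a `SmallPtrSet`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy IR (not MLIR): each operation names the values it uses as operands.
struct ToyOp {
  std::string Name;
  std::vector<std::string> Operands;
};

// Redirect every use of OldVal to NewVal, except uses inside ExceptedUser.
// Skipping the new op itself keeps it from using its own result.
void replaceAllUsesExcept(std::vector<ToyOp> &Ops, const std::string &OldVal,
                          const std::string &NewVal,
                          const ToyOp *ExceptedUser) {
  for (ToyOp &Op : Ops) {
    if (&Op == ExceptedUser)
      continue;
    for (std::string &Operand : Op.Operands)
      if (Operand == OldVal)
        Operand = NewVal;
  }
}
```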
Differential Revision: https://reviews.llvm.org/D102373
The implementations use the x86_64 FPU instructions. These instructions
are extremely slow compared to a polynomial-based software
implementation. Also, their accuracy falls drastically once the input
goes beyond 2*pi. To improve both speed and accuracy, we will be
taking the following approach going forward:
1. As a follow up to this CL, we will implement a range reduction algorithm
which will expand the accuracy to the entire double precision range.
2. After that, we will replace the HW instructions with a polynomial
implementation to improve the run time.
After step 2, the implementations will be accurate, performant and target
architecture independent.
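A minimal sketch of the two-step plan, assuming a quadrant-based reduction and plain Taylor polynomials. Both are our choices for illustration; the actual follow-up patches may use different reduction algorithms and minimax coefficients, and `toySin` is a hypothetical name. Note that this naive reduction computes `k * (pi/2)` in double precision, so its error grows with |x|, which is exactly the limitation step 1 is meant to remove.

```cpp
#include <cassert>
#include <cmath>

// Taylor polynomial for sin(r), terms up to r^15; truncation error is
// below 1e-16 for |r| <= pi/4.
double sinPoly(double R) {
  double Term = R, Sum = R;
  for (int N = 1; N <= 7; ++N) {
    Term *= -R * R / ((2 * N) * (2 * N + 1)); // next odd-power term
    Sum += Term;
  }
  return Sum;
}

// Taylor polynomial for cos(r), terms up to r^16.
double cosPoly(double R) {
  double Term = 1.0, Sum = 1.0;
  for (int N = 1; N <= 8; ++N) {
    Term *= -R * R / ((2 * N - 1) * (2 * N)); // next even-power term
    Sum += Term;
  }
  return Sum;
}

// Hypothetical toySin: reduce x = k*(pi/2) + r with |r| <= pi/4, then pick
// sin or cos of r based on the quadrant k mod 4. The reduction uses a
// double-precision pi/2, so it is only suitable for moderate inputs.
double toySin(double X) {
  const double PiOver2 = 1.57079632679489661923;
  double K = std::nearbyint(X / PiOver2);
  double R = X - K * PiOver2;
  long Quadrant = static_cast<long>(K) % 4;
  if (Quadrant < 0)
    Quadrant += 4; // map negative k into [0, 3]
  switch (Quadrant) {
  case 0:
    return sinPoly(R);  // sin(r)
  case 1:
    return cosPoly(R);  // sin(r + pi/2) = cos(r)
  case 2:
    return -sinPoly(R); // sin(r + pi) = -sin(r)
  default:
    return -cosPoly(R); // sin(r + 3pi/2) = -cos(r)
  }
}
```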
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D102384
This patch implements the following semantic check:
```
A master region may not be closely nested inside a work-sharing, loop, atomic, task, or taskloop region.
```
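The rule can be modeled as a walk outward from the master construct: by the OpenMP definition, a region is closely nested inside another when no parallel region lies between them, so the walk stops at the first enclosing parallel construct. This toy model is not flang's implementation, and its directive enum is a hypothetical subset (Do, Sections, and Single stand in for work-sharing constructs).

```cpp
#include <cassert>
#include <vector>

// Hypothetical subset of directive kinds relevant to this rule.
enum class Dir { Parallel, Do, Sections, Single, Loop, Atomic, Task, Taskloop, Critical };

// Directives whose regions may not closely enclose a master region.
bool forbidsCloselyNestedMaster(Dir D) {
  switch (D) {
  case Dir::Do: case Dir::Sections: case Dir::Single: case Dir::Loop:
  case Dir::Atomic: case Dir::Task: case Dir::Taskloop:
    return true;
  default:
    return false;
  }
}

// Enclosing is ordered outermost-first. Scan outward from the innermost
// region and stop at the first parallel region, since regions beyond it
// are not "closely" enclosing.
bool isMasterNestingValid(const std::vector<Dir> &Enclosing) {
  for (auto It = Enclosing.rbegin(); It != Enclosing.rend(); ++It) {
    if (*It == Dir::Parallel)
      return true;
    if (forbidsCloselyNestedMaster(*It))
      return false;
  }
  return true;
}
```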
Adds a test case and also modifies a couple of existing test cases to include the check.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D100228
If the loop condition is the value of an instance variable, a property,
or a message result, that is a good indication that the loop is not infinite,
and we have a really hard time proving the opposite, so suppress the warning.
Differential Revision: https://reviews.llvm.org/D102294
Take advantage of the new ASTMatcher added in D102213 to fix massive false negatives of the infinite loop checker on Objective-C.
Differential Revision: https://reviews.llvm.org/D102214
The new matcher additionally covers blocks and Objective-C methods.
This matcher actually makes sure that the statement truly belongs
to that declaration's body. forFunction() incorrectly reported that
a statement in a nested block belonged to the surrounding function.
forFunction() is now deprecated due to the above footgun, in favor of
forCallable(functionDecl()) when only functions need to be considered.
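The matcher expression above, `forCallable(functionDecl())`, is what callers should now write; the code below is only a toy model of the traversal difference, not the clang ASTMatchers implementation. The parent chain is a hypothetical list of node kinds ordered innermost-first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the kinds of ancestors a statement can have.
enum class Kind { Stmt, Block, ObjCMethod, Function };

// forFunction-like: only function ancestors count, so a statement inside a
// nested block is attributed to the surrounding function. Returns the
// ancestor's index, or -1 if none is found.
int enclosingFunctionOld(const std::vector<Kind> &Parents) {
  for (std::size_t I = 0; I < Parents.size(); ++I)
    if (Parents[I] == Kind::Function)
      return static_cast<int>(I);
  return -1;
}

// forCallable-like: the nearest function, block, or Objective-C method wins.
int enclosingCallable(const std::vector<Kind> &Parents) {
  for (std::size_t I = 0; I < Parents.size(); ++I)
    if (Parents[I] == Kind::Function || Parents[I] == Kind::Block ||
        Parents[I] == Kind::ObjCMethod)
      return static_cast<int>(I);
  return -1;
}
```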
Differential Revision: https://reviews.llvm.org/D102213
Add overloads to AsGenericExpr() in Evaluate/tools.h to take care
of wrapping an untyped DataRef or bare Symbol in a typed Designator
wrapped up in a generic Expr<SomeType>. Use the new overloads to
replace a few instances of code that was calling TypedWrapper<>()
with a dynamic type.
This new tool will be useful in lowering to drive some code that
works with typed expressions (viz., list-directed I/O list items)
when starting with only a bare Symbol (viz., NAMELIST).
Differential Revision: https://reviews.llvm.org/D102352
I've taken the following steps to add unwinding support from inline assembly:
1. Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:
```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
    to label %exit unwind label %uexit
```
2. Add bitcode writing/reading support and LLVM IR parsing.
3. Emit EHLabels around inline assembly lowering (SelectionDAGBuilder and GlobalISel) when `InlineAsm::canThrow` is enabled.
4. Tweak the InstCombineCalls and InlineFunction passes to not mark inline assembly "calls" as nounwind.
5. Add clang support by introducing a new clobber, "unwind", which lowers to `canThrow` being enabled.
6. Don't allow unwinding callbr.
Reviewed By: Amanieu
Differential Revision: https://reviews.llvm.org/D95745
Added hashst to the prologue and hashchk to the epilogue.
The hash for the prologue and epilogue must always be stored as the first
element in the local variable space on the stack.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D99377
API, implementation, and basic tests for the transformational
reduction intrinsic function DOT_PRODUCT in the runtime support
library.
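For reference, the semantics the runtime has to implement, per the Fortran standard: `SUM(A*B)` for numeric types, `SUM(CONJG(A)*B)` for COMPLEX, and `ANY(A .AND. B)` for LOGICAL. The overloads below are a plain C++ sketch of those rules for contiguous 1-D arguments; they are not the flang runtime API.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// INTEGER and REAL: SUM(A * B). Fortran requires equal sizes.
template <typename T>
T dotProduct(const std::vector<T> &A, const std::vector<T> &B) {
  assert(A.size() == B.size());
  T Sum{};
  for (std::size_t I = 0; I < A.size(); ++I)
    Sum += A[I] * B[I];
  return Sum;
}

// COMPLEX: SUM(CONJG(A) * B); the first argument is conjugated.
template <typename T>
std::complex<T> dotProduct(const std::vector<std::complex<T>> &A,
                           const std::vector<std::complex<T>> &B) {
  assert(A.size() == B.size());
  std::complex<T> Sum{};
  for (std::size_t I = 0; I < A.size(); ++I)
    Sum += std::conj(A[I]) * B[I];
  return Sum;
}

// LOGICAL: ANY(A .AND. B).
bool dotProduct(const std::vector<bool> &A, const std::vector<bool> &B) {
  assert(A.size() == B.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    if (A[I] && B[I])
      return true;
  return false;
}
```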
Differential Revision: https://reviews.llvm.org/D102351
DisableGeneratingGlobalModuleIndex was being set by
CompilerInstance::findOrCompileModuleAndReadAST most of (but not all of)
the time it returned `nullptr` as a "normal" failure. Pull that up to
the caller, CompilerInstance::loadModule, to simplify the code. This
resolves a number of FIXMEs added during the refactoring in
5cca622310.
The extra cases where this is set are all some version of a fatal error,
and the only client of the field, shouldBuildGlobalModuleIndex, seems
to be unreachable in that case. Even if there is some corner case where
this has an effect, it seems like the right/consistent behaviour.
Differential Revision: https://reviews.llvm.org/D101672
The original change was reverted because it was discovered
that clang mishandles thunks, and they receive wrong
attributes for their this/return types - the ones for the function
they will call, not the ones they have.
While I have tried to fix this in https://reviews.llvm.org/D100388,
that patch has been up and stuck for a month now,
with little sign of progress.
So while it would be good to solve this for real,
for now we can simply avoid introducing the bug
by not annotating this/return for thunks.
This reverts commit 6270b3a1ea,
relanding 0aa0458f14.
As it was discovered in post-commit feedback
for 0aa0458f14,
we handle thunks incorrectly, and end up annotating
their this/return with attributes that are valid
for their callees, not for thunks themselves.
While it would be good to fix this properly
and keep annotating them on thunks,
I've tried doing that in https://reviews.llvm.org/D100388
with little success, and the patch has been stuck for a month now.
So for now, as a stopgap measure, stop annotating this/return for thunks.
We currently prefer t2CMPrs over t2CMPri when the node contains a shift.
This can introduce more nodes if the shift has multiple uses, though, as
the value from the shift will be needed anyway, and a t2CMPri comparing
with zero is more readily removed entirely.
Differential Revision: https://reviews.llvm.org/D101688
Rename CompilerInstance's ModuleBuildFailed field to
DisableGeneratingGlobalModuleIndex, which more precisely describes its
role. Otherwise, it's hard to suss out how it's different from
ModuleLoader::HadFatalFailure, and what sort of code simplifications are
safe.
Differential Revision: https://reviews.llvm.org/D101670
Support OpImageQuerySize in spirv dialect
co-authored-by: Alan Liu <alanliu.yf@gmail.com>
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D102029
Fix a probable typo in two PSTL tests that causes warnings with GCC.
Patch by Jonathan Wakely (jwakely).
Reviewed By: zoecarver
Differential Revision: https://reviews.llvm.org/D102327
5cca622310 refactored
CompilerInstance::loadModule, splitting out
findOrCompileModuleAndReadAST, but was careful to avoid making any
functional changes. It added ModuleLoader::OtherUncachedFailure to
facilitate this and left behind FIXMEs asking why certain failures
weren't cached.
After a closer look, I think we can just remove this and simplify the
code. This changes the behaviour of the following (simplified) code from
CompilerInstance::loadModule, causing a failure to be cached more often:
```
if (auto MaybeModule = MM.getCachedModuleLoad(*Path[0].first))
  return *MaybeModule;
if (ModuleName == getLangOpts().CurrentModule)
  return MM.cacheModuleLoad(PP.lookupModule(...));
ModuleLoadResult Result = findOrCompileModuleAndReadAST(...);
if (Result.isNormal()) // This will be 'true' more often.
  return MM.cacheModuleLoad(..., Module);
return Result;
```
`MM` here is a ModuleMap owned by the Preprocessor. Here are the cases
where `findOrCompileModuleAndReadAST` starts returning a "normal" failed
result:
- Emitted `diag::err_module_not_found`, where there's no module map
found.
- Emitted `diag::err_module_build_disabled`, where implicitly building
modules is disabled.
- Emitted `diag::err_module_cycle`, which detects module cycles in the
implicit modules build system.
- Emitted `diag::err_module_not_built`, which avoids building a module
in this CompilerInstance if another one tried and failed already.
- `compileModuleAndReadAST()` was called and failed to build.
The four errors are all fatal, and the last item also reports a fatal error,
so this extra caching has no functional change... but even if it
did, it seems fine to cache these failed results within a ModuleMap
instance (note that each CompilerInstance has its own Preprocessor and
ModuleMap).
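A toy model of the behaviour change, with hypothetical names rather than clang's API: once a "normal" failure is cached, later lookups of the same module return the cached failure without retrying the build. The real code distinguishes normal results from other failure kinds; the toy has only success and a single failure state.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy loader (not clang's API). A cached std::nullopt records a failure.
struct ToyModuleLoader {
  std::map<std::string, std::optional<std::string>> Cache;
  int BuildAttempts = 0;

  // Stand-in for findOrCompileModuleAndReadAST: fails for "Broken".
  std::optional<std::string> findOrCompile(const std::string &Name) {
    ++BuildAttempts;
    if (Name == "Broken")
      return std::nullopt;
    return "module " + Name;
  }

  std::optional<std::string> loadModule(const std::string &Name) {
    auto It = Cache.find(Name);
    if (It != Cache.end())
      return It->second; // cached success or cached failure
    std::optional<std::string> Result = findOrCompile(Name);
    Cache[Name] = Result; // failures are cached as well
    return Result;
  }
};
```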
Differential Revision: https://reviews.llvm.org/D101667
Before this commit, we'd get a compilation error because the operator() overload was ambiguous.
Differential Revision: https://reviews.llvm.org/D102263
Add a user-facing front-end option to turn off Power10 prefixed instructions.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D102191
[libomptarget][amdgpu] Fix truncation error for partial wavefront
The partial barrier implementation involves one wavefront resetting and N-1
waiting. This change future-proofs against launching with a number of threads
that is not a multiple of the wavefront size.
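Our reading of the fix, offered as an assumption rather than a description of the patch: if the number of wavefronts is derived by integer division of the thread count by the wavefront size, a trailing partial wavefront is truncated away. Rounding up avoids that. `numWavefronts` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: wavefronts needed to cover NumThreads, rounding up
// so that a partial wavefront is still counted. WavefrontSize must be > 0.
uint32_t numWavefronts(uint32_t NumThreads, uint32_t WavefrontSize) {
  return (NumThreads + WavefrontSize - 1) / WavefrontSize;
}
```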
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102407
[libomptarget][amdgpu] Convert an assert to print and offload_fail
The kernel being launched is supposed to be present in the binary, but a
not-yet-diagnosed bug means it is missing for some of the qmcpack test cases.
Changing from assert to print and offload_fail should help diagnose that and similar bugs.
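A sketch of the pattern, with hypothetical names and status codes modeled on libomptarget's OFFLOAD_SUCCESS/OFFLOAD_FAIL convention: a missing kernel is reported and turned into an error return instead of tripping an assert, which aborts in builds with assertions enabled and is compiled out entirely under NDEBUG.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical status codes; not the libomptarget definitions.
constexpr int ToyOffloadSuccess = 0;
constexpr int ToyOffloadFail = -1;

// Hypothetical launcher: looks up a kernel by name in a toy symbol table.
int launchKernel(const std::map<std::string, int> &KernelTable,
                 const std::string &Name) {
  auto It = KernelTable.find(Name);
  // Formerly something like assert(It != KernelTable.end()).
  if (It == KernelTable.end()) {
    std::fprintf(stderr, "Kernel %s not found in image\n", Name.c_str());
    return ToyOffloadFail;
  }
  // A real plugin would launch the kernel found at It->second here.
  return ToyOffloadSuccess;
}
```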
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102378