Add a new `raw_pwrite_ostream` variant, `buffer_unique_ostream`, which
is like `buffer_ostream` but with unique ownership of the stream it's
wrapping. Use this in CompilerInstance to simplify the ownership of
non-seeking output streams, avoiding logic sprawled around to deal with
them specially.
This also simplifies future work to encapsulate output files in a
different class.
Differential Revision: https://reviews.llvm.org/D93260
Just use the existing `Known.sextInReg` implementation.
- Update KnownBitsTest.cpp.
- Update combine-redundant-and.mir for a more concrete example.
Differential Revision: https://reviews.llvm.org/D95484
This patch improves the availability for variables stored in the
coroutine frame by emitting an alloca to hold the pointer to the frame
object and rewriting dbg.declare intrinsics to point inside the frame
object using salvaged DIExpressions. Finally, a new alloca is created
in the funclet to hold the FramePtr pointer to ensure that it is
available throughout the entire function at -O0.
This path also effectively reverts D90772. The testcase updates
highlight nicely how every removed CHECK for a dbg.value is preceded
by a new CHECK for a dbg.declare.
Thanks to JunMa, Yifeng, and Bruno for their thoughtful reviews!
Differential Revision: https://reviews.llvm.org/D93497
rdar://71866936
Found this memory leak in `CompilerInstance::setVerboseOutputStream` by
inspection; it looks like this wasn't previously exercised, since it was
never called twice.
Differential Revision: https://reviews.llvm.org/D93249
STRT, STRHT, and STRBT are store instructions and their source register
$Rt should be treated as an input operand instead of an output operand.
This should fix things (e.g., liveness tracking in LivePhysRegs) if
these instructions were used in CodeGen.
Differential Revision: https://reviews.llvm.org/D95074
As it looks like NewPM generally is using SimpleLoopUnswitch
instead of LoopUnswitch, this patch also use SimpleLoopUnswitch
in the ExtraVectorizerPasses sequence (compared with LegacyPM
which use the LoopUnswitch pass).
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D95457
Revert the change to use APInt::isSignedIntN from
5ff5cf8e05.
Its clear that the games we were playing to avoid the topological
sort aren't working. So just fix it once and for all.
Fixes PR48888.
Requiring 3.15 causes a build breakage, I'm sure none of the contents actually require
3.15 or above.
Differential Revision: https://reviews.llvm.org/D95474
[libomptarget][cuda] Gracefully handle missing cuda library
If using dynamic cuda, and it failed to load, it is not safe to call
cuGetErrorString.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95412
[libomptarget][cuda] Only run tests when sure there is cuda available
Prior to D95155, building the cuda plugin implied cuda was installed locally.
With that change, every machine can build a cuda plugin, but they won't all have
cuda and/or an nvptx card installed locally.
This change enables the nvptx tests when either:
- libcuda is present
- the user has forced use of the dlopen stub
The default case when there is no cuda detected will no longer attempt to
run the tests on nvptx hardware, as was the case before D95155.
Reviewed By: jdoerfert, ronlieb
Differential Revision: https://reviews.llvm.org/D95467
This introduces a remote offloading plugin for libomptarget. This
implementation relies on gRPC and protobuf, so this library will only
build if both libraries are available on the system. The corresponding
server is compiled to `openmp-offloading-server`.
This is a large change, but the only way to split this up is into RTL/server
but I fear that could introduce an inconsistency amongst them.
Ideally, tests for this should be added to the current ones that but that is
problematic for at least one reason. Given that libomptarget registers plugin
on a first-come-first-serve basis, if we wanted to offload onto a local x86
through a different process, then we'd have to either re-order the plugin list
in `rtl.cpp` (which is what I did locally for testing) or find a better
solution for runtime plugin registration in libomptarget.
Differential Revision: https://reviews.llvm.org/D95314
Before this change, when reading ELF file, elfabi determines number of
entries in .dynsym by reading the .gnu.hash section. This change makes
elfabi read section headers directly first. This change allows elfabi
works on ELF files which do not have .gnu.hash sections.
Differential Revision: https://reviews.llvm.org/D93362
There are two use cases.
Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).
Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727https://sourceware.org/bugzilla/show_bug.cgi?id=22969)
Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table).
This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.
`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.
Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).
Differential Revision: https://reviews.llvm.org/D85474
Second try, handling both a bogus arch string and the "null file & arch" used
to create an empty but valid target.
Also check in that case before logging (previously the logging would have
crashed.)
Split the tests from acc-clause-validity.f90 in dedicated files by directives.
The file acc-clause-validity.f90 was getting too big to be correctly maintained.
Tests are identical.
Reviewed By: SouraVX
Differential Revision: https://reviews.llvm.org/D95328
For Clang synthesized `__va_list_tag` (`CreateX86_64ABIBuiltinVaListDecl`),
its DW_AT_decl_file/DW_AT_decl_line are arbitrarily set from `CurLoc`.
In a stage 2 `-DCMAKE_BUILD_TYPE=Debug` clang build, I observe that
in driver.cpp, DW_AT_decl_file/DW_AT_decl_line may be set to an `#include` line
(the transitively included file uses va_arg (`__builtin_va_arg`)).
This seems arbitrary. Drop that.
Reviewed By: #debug-info, dblaikie
Differential Revision: https://reviews.llvm.org/D94735
`getLineNumber()` picks CurLoc if the parameter is invalid. This appears to
mainly work around missing SourceLocation information for some constructs, but
sometimes adds unintended locations.
* For `CodeGenObjC/debug-info-blocks.m`, `CurLoc` has been advanced to the closing brace. The debug line of `ImplicitVarParameter` is set to the line of `}` because this implicit parameter has an invalid `SourceLocation`. The debug line is a bit arbitrary - perhaps the location of `^{` is better.
* The file/line of Clang synthesized `__va_list_tag` is arbitrarily attached a `#include` line. D94735
Drop the special case to make getLineNumber less magic and add CurLoc fallback in its callers instead.
Tested with stage 2 -DCMAKE_BUILD_TYPE=Debug clang, byte identical.
Reviewed By: #debug-info, aprantl
Differential Revision: https://reviews.llvm.org/D94391
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
In order to support remote execution, we need to be able to send the
target binary description to the remote host for registration (and
consequent deregistration). To support this, I added these two
optional new functions to the plugin API:
- `__tgt_rtl_register_lib`
- `__tgt_rtl_unregister_lib`
These functions will be called to properly manage the instance of
libomptarget running on the remote host.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D93293
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
FindTarget on the virtual keyword or access specifier of a base specifier will now resolve to type of the base specifier.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D95338
Selection now includes the virtual and access modifier as part of their range for cxx base specifiers.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D95231
Recent shouldAssumeDSOLocal changes (introduced by 961f31d8ad)
do not take in consideration the relocation model anymore. The ARM
fast-isel pass uses the function return to set whether a global symbol
is loaded indirectly or not, and without the expected information
llvm now generates an extra load for following code:
```
$ cat test.ll
@__asan_option_detect_stack_use_after_return = external global i32
define dso_local i32 @main(i32 %argc, i8** %argv) #0 {
entry:
%0 = load i32, i32* @__asan_option_detect_stack_use_after_return,
align 4
%1 = icmp ne i32 %0, 0
br i1 %1, label %2, label %3
2:
ret i32 0
3:
ret i32 1
}
attributes #0 = { noinline optnone }
$ lcc test.ll -o -
[...]
main:
.fnstart
[...]
movw r0, :lower16:__asan_option_detect_stack_use_after_return
movt r0, :upper16:__asan_option_detect_stack_use_after_return
ldr r0, [r0]
ldr r0, [r0]
cmp r0, #0
[...]
```
And without 'optnone' it produces:
```
[...]
main:
.fnstart
[...]
movw r0, :lower16:__asan_option_detect_stack_use_after_return
movt r0, :upper16:__asan_option_detect_stack_use_after_return
ldr r0, [r0]
clz r0, r0
lsr r0, r0, #5
bx lr
[...]
```
This triggered a lot of invalid memory access in sanitizers for
arm-linux-gnueabihf. I checked this patch both a stage1 built with
gcc and a stage2 bootstrap and it fixes all the Linux sanitizers
issues.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D95379
The initial problem with the remaining bot config was resolved.
We can now use Python3. Let's use `os.cpu_count()` to cleanup this
helper.
Differential Revision: https://reviews.llvm.org/D94734
37510f69b4 tried to fix GCC 5.x compilation
by making the enum which is used as a unordered_map key unscoped. However it
seems that in GCC 5.x, enum keys are not supported *at all* in unordered_maps
(at least that's what some trial&error on godbolt tells me). This updates the
workaround to just use an int until GCC 5.x support is dropped.
The current context is thread-local state, and in preparation of GPU async execution (on multiple threads) we need to set the context before calling API that create resources.
Reviewed By: herhut
Differential Revision: https://reviews.llvm.org/D94495
If a function has stack objects, and a call, we require an FP. If we
did not initially have any stack objects, and only introduced them
during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills
would end up spilling the FP register as if it were a normal
register. This would result in an assert in a debug build, or
redundant handling of the FP register in a release build.
Try to predict that we will have an FP later, although this is ugly.
From this patch (plus some landed patches), `deviceRTLs` is taken as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it work. For now AMDGCN still uses original way to compile) However, some target specific functions are still required, but they're no longer written in target specific language. For example, CUDA parts have all refined by replacing CUDA intrinsic and builtins with LLVM/Clang/NVVM intrinsics.
Here're a list of changes in this patch.
1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros.
2. Shared variable is implemented with OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly.
3. CUDA header `cuda.h` is dropped in the source code. In order to deal with code difference in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation.
4. Correspondingly, compiler driver is also updated to support CUDA version encoded in the name of bitcode library. Now the bitcode library for NVPTX is named as `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`.
With this change, there are also multiple features to be expected in the near future:
1. CUDA will be completely dropped when compiling OpenMP. By the time, we also build bitcode libraries for all supported SM, multiplied by all supported CUDA version.
2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong.
3. Target specific parts will be wrapped into `declare variant` with `isa` selector if it can work properly. No target specific macro is needed anymore.
4. (Maybe more...)
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D94745
This is a common symbol that's missing from our mapping because
cppreference yields multiple headers.
Add it manually by picking cstddef to prevent insertion of some stdlib-internal
headers instead.
Fixes https://github.com/clangd/clangd/issues/666.
Differential Revision: https://reviews.llvm.org/D95423
This follows up on the introduction of C API for the same object and is similar
to AffineExpr and AffineMap.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D95437
The existing test has less FMF than we might expect if
our FMF was fixed (on all FP values), so this additional
test is intended to check propagation in a more "normal"
example.
The switch must set the predicate correctly; anything else
should lead to unreachable/assert.
I'm trying to fix FMF propagation here and the callers,
so this is a preliminary cleanup.