Prior to this change, for a few compiler-rt libraries such as ubsan and
the profile library, Clang would embed "-defaultlib:path/to/rt-arch.lib"
into the .drective section of every object compiled with
-finstr-profile-generate or -fsanitize=ubsan as appropriate.
These paths assume that the link step will run from the same working
directory as the compile step. There is also evidence that sometimes the
paths become absolute, such as when clang is run from a different drive
letter from the current working directory. This is fragile, and I'd like
to get away from having paths embedded in the object if possible. Long
ago it was suggested that we use this for ASan, and apparently I felt
the same way back then:
https://reviews.llvm.org/D4428#56536
This is also consistent with how all other autolinking usage works for
PS4, Mac, and Windows: they all use basenames, not paths.
To keep things working for people using the standard GCC driver
workflow, the driver now adds the resource directory to the linker
library search path when it calls the linker. This is enough to make
check-ubsan pass, and seems like a generally good thing.
Users that invoke the linker directly (most clang-cl users) will have to
add clang's resource library directory to their linker search path in
their build system. I'm not sure where I can document this. Ideally I'd
also do it in the MSBuild files, but I can't figure out where they go.
I'd like to start with this for now.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D65543
Add support for reserving LR in:
* the driver through `-ffixed-x30`
* cc1 through `-target-feature +reserve-x30`
* the backend through `-mattr=+reserve-x30`
* a subtarget feature `reserve-x30`
the same way we're doing for the other registers.
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
This patch includes:
- Command line options to enable these features with +i8mm, +f32mm, or f64mm
Note: +f32mm and +f64mm are optional and so are not enabled by default
This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)
Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman
Reviewers: t.p.northover, DavidSpickett
Reviewed By: DavidSpickett
Subscribers: DavidSpickett, ostannard, kristof.beyls, danielkiss,
cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77875
Time profiler emits relative timestamps for events (the number of
microseconds passed since the start of the current process).
This patch allows combining events from different processes while
preserving their relative timing by emitting a new attribute
"beginningOfTime". This attribute contains the system time that
corresponds to the zero timestamp of the time profiler.
This has at least two use cases:
- Build systems can use this to merge time traces from multiple compiler
invocations and generate statistics for the whole build. Tools like
ClangBuildAnalyzer could also leverage this feature.
- Compilers that use LLVM as their backend by invoking llc/opt in
a child process. If such a compiler supports generating time traces
of its own events, it could merge those events with LLVM-specific
events received from llc/opt, and produce a more complete time trace.
A proof-of-concept script that merges multiple logs that
contain a synchronization point into one log:
https://github.com/broadwaylamb/merge_trace_events
Differential Revision: https://reviews.llvm.org/D78030
Summary:
Change the default ABI to be compatible with GCC. For 32-bit ELF
targets other than Linux, Clang now returns small structs in registers
r3/r4. This affects FreeBSD, NetBSD, OpenBSD. There is no change for
32-bit Linux, where Clang continues to return all structs in memory.
Add clang options -maix-struct-return (to return structs in memory) and
-msvr4-struct-return (to return structs in registers) to be compatible
with gcc. These options are only for PPC32; reject them on PPC64 and
other targets. The options are like -fpcc-struct-return and
-freg-struct-return for X86_32, and use similar code.
To actually return a struct in registers, coerce it to an integer of the
same size. LLVM may optimize the code to remove unnecessary accesses to
memory, and will return i32 in r3 or i64 in r3:r4.
Fixes PR#40736
Patch by George Koehler!
Reviewed By: jhibbits, nemanjai
Differential Revision: https://reviews.llvm.org/D73290
Summary:
This flag has been deprecated, with an on-by-default warning encouraging
users to explicitly specify whether they mean "all" or ubsan for 5 years
(released in Clang 3.7). Change it to mean what we wanted and
undeprecate it.
Also make the argument to -fsanitize-trap optional, and likewise default
it to 'all', and express the aliases for these flags in the .td file
rather than in code. (Plus documentation updates for the above.)
Reviewers: kcc
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77753
Since the default logic was based on having fast denormal/fma
features, and the default target has no features, we assumed flushing
by default. This fixes incorrectly assuming flushing in builds for
"generic" IR libraries.
The handling for no specified --cuda-gpu-arch in HIP is kind of
broken. Somewhere else forces a default target of gfx803, which does
not enable denormal handling by default. We don't see this default
switching here, so you'll end up with a different denormal mode
depending on whether you explicitly requested gfx803, or used it by
default.
I didn't realize HIP was a distinct offloading kind, so the subtarget
was looking for -march, which isn't correct for HIP. We also have the
possibility of different denormal defaults in the case of multiple
offload targets, so we need to thread the JobAction through the target
hook.
Currently the library is separately linked, but this isn't correct to
implement fast math flags correctly. Each module should get the
version of the library appropriate for its combination of fast math
and related flags, with the attributes propagated into its functions
and internalized.
HIP already maintains the list of libraries, but this is not used for
OpenCL. Unfortunately, HIP uses a separate --hip-device-lib argument,
despite both languages using the same bitcode library. Eventually
these two searches need to be merged.
An additional problem is there are 3 different locations the libraries
are installed, depending on which build is used. This also needs to be
consolidated (or at least the search logic needs to deal with this
unnecessary complexity).
This adds support for enabling experimental/unratified RISC-V ISA
extensions in the -march string in the case where an explicit version
number has been declared, and the -menable-experimental-extensions flag
has been provided.
This follows the design as discussed on the mailing lists in the
following RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-January/138364.html
Since the RISC-V toolchain definition currently rejects any extension
with an explicit version number, the parsing logic has been tweaked to
support this, and to allow standard extensions to have their versions
checked in future patches.
The bitmanip 'b' extension has been added as a first use of this support,
it should easily extend to other as yet unratified extensions (such as
the vector 'v' extension).
Differential Revision: https://reviews.llvm.org/D73891
These values are not always known since there is a configuration
option to set the default linker, CLANG_DEFAULT_LINKER.
Differential Revision: https://reviews.llvm.org/D77684
Summary:
The option `-mpad-max-prefix-size` performs some checking and delegate to MC option `-x86-pad-max-prefix-size`. This option is designed for eliminate NOPs when we need to align something by adding redundant prefixes to instructions, e.g. it can be used along with `-malign-branch`, `-malign-branch-boundary` to prefix padding branch.
It has similar (but slightly different) effect as GAS's option `-malign-branch-prefix-size`, e.g. `-mpad-max-prefix-size` can also elminate NOPs emitted by align directive, so we use a different name here. I remove the option `-malign-branch-prefix-size` since is unimplemented and not needed. If we need to be compatible with GAS, we can make `-malign-branch-prefix-size` an alias for this option later.
Reviewers: jyknight, reames, MaskRay, craig.topper, LuoYuanke
Reviewed By: MaskRay, LuoYuanke
Subscribers: annita.zhang, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77628
Update the sysroot expectation to match other targets and breakout
linux/musl toolchain tests into a new file.
Differential Revision: https://reviews.llvm.org/D77440
Summary:
- Use `device_builtin_surface` and `device_builtin_texture` for
surface/texture reference support. So far, both the host and device
use the same reference type, which could be revised later when
interface/implementation is stablized.
Reviewers: yaxunl
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77583
Instead of checking if each symlink exists before removing it,
remove the whole temp dir housing the symlinks before recreating it.
This is a bit shorter, conceptually simpler (in that the first
and consecutive test runs have more similar behavior), it's what we're
already doing in almost all places where we do it, and it works if the
symlink exists but is a dead link (e.g. when it points into the build
dir but the build dir is renamed).
No intended behavior change.
clang with -flto does not handle -foptimization-record-path=<path>
This dulicates the code from ToolChains/Clang.cpp with modifications to
support everything in the same fashion.
Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph.
More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK).
Also adds a new target feature to X86: +lvi-load-hardening
The feature can be added via the clang CLI using -mlvi-hardening.
Differential Revision: https://reviews.llvm.org/D75936
This pass replaces each indirect call/jump with a direct call to a thunk that looks like:
lfence
jmpq *%r11
This ensures that if the value in register %r11 was loaded from memory, then
the value in %r11 is (architecturally) correct prior to the jump.
Also adds a new target feature to X86: +lvi-cfi
("cfi" meaning control-flow integrity)
The feature can be added via clang CLI using -mlvi-cfi.
This is an alternate implementation to https://reviews.llvm.org/D75934 That merges the thunk insertion functionality with the existing X86 retpoline code.
Differential Revision: https://reviews.llvm.org/D76812
This wasn't respecting the flush mode based on the default, and also
wasn't correctly handling the explicit
-fno-cuda-flush-denormals-to-zero overriding the mode.
The driver enables -fdiagnostics-show-option by default, so flip the CC1
default to reduce the lengths of common CC1 command lines.
This change also makes ParseDiagnosticArgs() consistently enable
-fdiagnostics-show-option by default.
Apparently HIPToolChain does not subclass from AMDGPUToolChain, so
this was not applying the new denormal attributes. I'm not sure why
this doesn't subclass. Just copy the implementation for now.
Since GlobalISel is maturing and is already on at -O0 for AArch64, it's not
completely "experimental". Create a more appropriate driver flag and make
the older option an alias for it.
Differential Revision: https://reviews.llvm.org/D77103
Currently -fno-unroll-loops is ignored when doing LTO on Darwin. This
patch adds a new -lto-no-unroll-loops option to the LTO code generator
and forwards it to the linker if -fno-unroll-loops is passed.
Reviewers: thegameg, steven_wu
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D76916
There was a misisng space between the -march and --cuda-gpu-arch
arguments, so --cuda-gpu-arch wasn't actually being parsed. I'm not
sure what the intent of the sm_10 run lines were, but they error as an
unsupported architecture. Switch these to something else.
Summary:
This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
in addition to the GCC patch for the 8..6-a CLI:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html
In detail this patch
- march options for armv8.6-a
- BFloat16 assembly
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
Based on work by:
- labrinea
- MarkMurrayARM
- Luke Cheeseman
- Javed Asbar
- Mikhail Maltsev
- Luke Geeson
Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson
Reviewed By: SjoerdMeijer
Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D76062