Add address sanitizer instrumentation support for accesses to global
and constant address spaces in AMDGPU. It strictly avoids instrumenting
the stack and assumes x86 as the host.
Reviewed by: vitalybuka
Differential Revision: https://reviews.llvm.org/D99071
The problem is the following. With fast8, we broke an important
invariant when loading shadows. A wide shadow of 64 bits used to
correspond to 4 application bytes with fast16; so, generating a single
load was okay since those 4 application bytes would share a single
origin. Now, using fast8, a wide shadow of 64 bits corresponds to 8
application bytes that should be backed by 2 origins (but we kept
generating just one).
Let’s say our wide shadow is 64-bit and consists of the following:
0xABCDEFGH. To check if we need the second origin value, we could do
the following (in the 64-bit wide shadow case):
- bitwise shift the wide shadow left by 32 bits (yielding 0xEFGH0000)
- push the result along with the first origin load to the shadow/origin vectors
- load the second 32-bit origin of the 64-bit wide shadow
- push the wide shadow along with the second origin to the shadow/origin vectors.
The combineOrigins would then select the second origin if the wide
shadow is of the form 0xABCD0000. The tests illustrate how this
change affects the generated bitcode.
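The arithmetic behind the broken invariant can be sketched as follows. This is a minimal model, not the pass's code: the constants and helper names are illustrative assumptions, and only the 64-bit wide shadow, the fast16/fast8 label widths, and the 4-bytes-per-origin granularity come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants (assumed names): a 64-bit wide shadow load and
// one 32-bit origin shared by every 4 application bytes.
constexpr unsigned kWideShadowBits = 64;
constexpr unsigned kAppBytesPerOrigin = 4;

// Number of application bytes covered by one wide shadow load.
constexpr unsigned appBytesPerWideShadow(unsigned shadowBitsPerAppByte) {
  return kWideShadowBits / shadowBitsPerAppByte;
}

// Number of origins backing those application bytes (rounded up).
constexpr unsigned originsPerWideShadow(unsigned shadowBitsPerAppByte) {
  return (appBytesPerWideShadow(shadowBitsPerAppByte) + kAppBytesPerOrigin - 1) /
         kAppBytesPerOrigin;
}

// Shifting left by 32 bits discards the upper half of the wide shadow, so
// the result is non-zero only if the lower 32 bits carry a label.
constexpr uint64_t shiftOutUpperHalf(uint64_t wideShadow) {
  return wideShadow << 32;
}
```

With fast16 one wide shadow maps to a single origin, while with fast8 it maps to two, which is why a second origin load is needed.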
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D101584
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arises in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
The first version of origin tracking tracks only memory stores. Although
this is sufficient for understanding correct flows, it is hard to figure
out where an undefined value is read from. To find the read of an undefined value,
we still have to do a reverse binary search from the last store in the chain,
adding printing and logging at possible code paths. This is
quite inefficient.
Tracking memory load instructions can help this case. The main issues of
tracking loads are performance and code size overheads.
With tracking only stores, the code size overhead is 38%,
memory overhead is 1x, and cpu overhead is 3x. In practice #load is much
larger than #store, so both code size and cpu overhead increase. The
first blocker is code size overhead: link fails if we inline tracking
loads. The workaround is using external function calls to propagate
metadata. This is also the workaround ASan uses. With load tracking, the cpu overhead
is ~10x. This is a trade-off between debuggability and performance,
and will be used only when debugging cases where tracking only stores
is not enough.
Reviewed By: gbalats
Differential Revision: https://reviews.llvm.org/D100967
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.
Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)
Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.
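The decision described above can be modeled with a small sketch. It is a simplified stand-in for the real AsmPrinter logic: `FnAttrs`, `moduleNeedsEhFrame`, `getsEhFrame` and the three example constants are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <vector>

// Simplified view of the attributes that matter here (hypothetical type).
struct FnAttrs {
  bool uwtable;
  bool personality;
  bool nounwind;
};

// Mirrors the condition stated in the text: uwtable, a personality, or a
// missing nounwind attribute.
bool needsUnwindTableEntry(const FnAttrs &f) {
  return f.uwtable || f.personality || !f.nounwind;
}

// The module is marked as needing .eh_frame if any function qualifies.
bool moduleNeedsEhFrame(const std::vector<FnAttrs> &fns) {
  return std::any_of(fns.begin(), fns.end(), needsUnwindTableEntry);
}

// Within a module marked for .eh_frame, a function gets an entry if it
// qualifies itself or if -g[123] is specified.
bool getsEhFrame(const FnAttrs &f, const std::vector<FnAttrs> &module,
                 bool debugInfo) {
  return moduleNeedsEhFrame(module) && (needsUnwindTableEntry(f) || debugInfo);
}

// A user function built with -fno-exceptions -fno-asynchronous-unwind-tables.
const FnAttrs kUserFn{false, false, true};
// asan.module_ctor with no attributes at all.
const FnAttrs kCtorNoAttrs{false, false, false};
// asan.module_ctor once it is marked nounwind.
const FnAttrs kCtorNounwind{false, false, true};
```

In this model, an attribute-less ctor alone is enough to move every function of a `-g` build into `.eh_frame`; marking it nounwind removes that effect.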
This patch
* sets the nounwind attribute on sanitizer module ctor/dtor.
* lets Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.
The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::Create` if they want to get some default attributes which
have more of module semantics.
Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955 https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.
Differential Revision: https://reviews.llvm.org/D100251
Instruction::getDebugLoc can return an invalid DebugLoc. For such cases
where metadata was accidentally removed from the libcall insertion
point, simply insert a DILocation with line 0 scoped to the caller. When
we can inline the libcall, such as during LTO, then we won't fail a
Verifier check that all calls to functions with debug metadata
themselves must have debug metadata.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D100158
Using $ breaks demangling of the symbols. For example,
$ c++filt _Z3foov\$123
_Z3foov$123
This causes problems for developers who would like to see nice stack traces
etc., but also for automatic crash tracking systems which try to organize
crashes based on the stack traces.
Instead, use the period as suffix separator, since Itanium demanglers normally
ignore such suffixes:
$ c++filt _Z3foov.123
foo() [clone .123]
This is already done in some places; try to do it everywhere.
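A minimal sketch of the naming scheme, assuming hypothetical helpers `addCloneSuffix` and `stripCloneSuffix` (they are not LLVM APIs):

```cpp
#include <string>

// Append a numeric suffix using '.' as the separator, so that Itanium
// demanglers can still recognize the mangled prefix.
std::string addCloneSuffix(const std::string &mangled, unsigned id) {
  return mangled + "." + std::to_string(id);
}

// Recover the mangled prefix by dropping everything from the first '.'.
// Names without a suffix are returned unchanged.
std::string stripCloneSuffix(const std::string &symbol) {
  return symbol.substr(0, symbol.find('.'));
}
```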
Differential revision: https://reviews.llvm.org/D97484
I think byval/sret and the others are close to being able to rip out
the code to support the missing type case. A lot of this code is
shared with inalloca, so catch this up to the others so that can
happen.
Userspace page aliasing allows us to use middle pointer bits for tags
without untagging them before syscalls or accesses. This should enable
easier experimentation with HWASan on x86_64 platforms.
Currently stack, global, and secondary heap tagging are unsupported.
Only primary heap allocations get tagged.
Note that aliasing mode will not work properly in the presence of
fork(), since heap memory will be shared between the parent and child
processes. This mode is non-ideal; we expect Intel LAM to enable full
HWASan support on x86_64 in the future.
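Keeping a tag in middle pointer bits can be sketched as below. The shift and width are assumptions chosen for illustration (this message does not state the actual layout), and the helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed layout for illustration only: a 3-bit tag starting at bit 39.
constexpr unsigned kTagShift = 39;
constexpr unsigned kTagBits = 3;
constexpr uint64_t kTagMask = ((uint64_t{1} << kTagBits) - 1) << kTagShift;

// Place a tag into the middle bits of an address, replacing any old tag.
constexpr uint64_t addTag(uint64_t addr, uint64_t tag) {
  return (addr & ~kTagMask) | ((tag << kTagShift) & kTagMask);
}

// Read the tag back out of a tagged address.
constexpr uint64_t getTag(uint64_t addr) {
  return (addr & kTagMask) >> kTagShift;
}

// Clear the tag bits to recover the untagged address.
constexpr uint64_t untag(uint64_t addr) { return addr & ~kTagMask; }
```

With page aliasing, the tagged and untagged values map to the same memory, so the tagged pointer can be dereferenced or passed to a syscall as is.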
Reviewed By: vitalybuka, eugenis
Differential Revision: https://reviews.llvm.org/D98875
Subsequent patches will implement page-aliasing mode for x86_64, which
will initially only work for the primary heap allocator. We force
callback instrumentation to simplify the initial aliasing
implementation.
Reviewed By: vitalybuka, eugenis
Differential Revision: https://reviews.llvm.org/D98069
On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`,
`__sancov_bools`, `__sancov_pcs`) in section groups (either `comdat any` or
`comdat noduplicates`).
With `--gc-sections`, LLD since D96753 and GNU ld `-z start-stop-gc` may garbage
collect such sections. If all `__sancov_bools` are discarded, LLD will error
`error: undefined hidden symbol: __start___sancov_bools` (other sections are similar).
```
% cat a.c
void discarded() {}
% clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections
...
ld.lld: error: undefined hidden symbol: __start___sancov_guards
>>> referenced by a.c
>>> /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard)
```
Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the
undefined error.
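At the source level, the effect of an undefined weak reference can be sketched as follows. This assumes an ELF target with GCC or Clang; the section name `my_meta` and the helper `myMetaSize` are hypothetical and stand in for the `__sancov_*` sections.

```cpp
#include <cstddef>

// Weak declarations of the linker-defined bounds of a hypothetical section
// named "my_meta". If the section is absent or was garbage collected, the
// symbols stay undefined and their addresses resolve to null instead of
// producing a link error.
extern "C" {
extern char __start_my_meta __attribute__((weak));
extern char __stop_my_meta __attribute__((weak));
}

// Size of the section in bytes, or 0 when the bounds are undefined.
std::size_t myMetaSize() {
  const char *begin = &__start_my_meta;
  const char *end = &__stop_my_meta;
  if (begin == nullptr || end == nullptr)
    return 0;
  return static_cast<std::size_t>(end - begin);
}
```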
Differential Revision: https://reviews.llvm.org/D98903
This is only adding support to the dfsan instrumentation pass but not
to the runtime.
Added more RUN lines for testing: for each instrumentation test that
had a -dfsan-fast-16-labels invocation, a new invocation was added
using fast8.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D98734
This broke the check-profile tests on Mac, see comment on the code
review.
> This is no longer needed, we can add __llvm_profile_runtime directly
> to llvm.compiler.used or llvm.used to achieve the same effect.
>
> Differential Revision: https://reviews.llvm.org/D98325
This reverts commit c7712087cb.
Also reverting the dependent follow-up commit:
Revert "[InstrProfiling] Generate runtime hook for ELF platforms"
> When using -fprofile-list to selectively apply instrumentation only
> to certain files or functions, we may end up with a binary that doesn't
> have any counters in the case where no files were selected. However,
> because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
> runtime would still be pulled in and incur some non-trivial overhead,
> especially in the case when the continuous or runtime counter relocation
> mode is being used. A better way would be to pull in the profile runtime
> only when needed by declaring the __llvm_profile_runtime symbol in the
> translation unit only when needed.
>
> This approach was already used prior to 9a041a7522, but we changed it
> to always generate the __llvm_profile_runtime due to a TAPI limitation.
> Since TAPI is only used on Mach-O platforms, we could use the early
> emission of __llvm_profile_runtime there, and on other platforms we
> could change back to the earlier approach where the symbol is generated
> later only when needed. We can stop passing -u__llvm_profile_runtime to
> the linker on Linux and Fuchsia since the generated undefined symbol in
> each translation unit that needed it serves the same purpose.
>
> Differential Revision: https://reviews.llvm.org/D98061
This reverts commit 87fd09b25f.
When using -fprofile-list to selectively apply instrumentation only
to certain files or functions, we may end up with a binary that doesn't
have any counters in the case where no files were selected. However,
because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
runtime would still be pulled in and incur some non-trivial overhead,
especially in the case when the continuous or runtime counter relocation
mode is being used. A better way would be to pull in the profile runtime
only when needed by declaring the __llvm_profile_runtime symbol in the
translation unit only when needed.
This approach was already used prior to 9a041a7522, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation.
Since TAPI is only used on Mach-O platforms, we could use the early
emission of __llvm_profile_runtime there, and on other platforms we
could change back to the earlier approach where the symbol is generated
later only when needed. We can stop passing -u__llvm_profile_runtime to
the linker on Linux and Fuchsia since the generated undefined symbol in
each translation unit that needed it serves the same purpose.
Differential Revision: https://reviews.llvm.org/D98061
This is no longer needed, we can add __llvm_profile_runtime directly
to llvm.compiler.used or llvm.used to achieve the same effect.
Differential Revision: https://reviews.llvm.org/D98325
Remove hard-coded shadow width references. Separate CHECK lines that only apply to fast16 mode.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D98308
This removes hard-coded shadow width references and adds more RUN
lines to increase test coverage under different options (fast16 labels
mode).
Also, shortens the test by unifying common lines under both combine- and no-combine-ptr-label options.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D98227
As a preparation step for fast8 support, we need to update the tests
to pass in both modes. That requires generalizing the shadow width
and removing any hard-coded references that assume it's always 2 bytes.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D98090
As a preparation step for fast8 support, we need to update the tests
to pass in both modes. That requires generalizing the shadow width
and removing any hard-coded references that assume it's always 2 bytes.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D97988
Add more expectations in vector.ll and select.ll based on command-line option combinations.
Also, remove hard-coded shadow width references to enable fast8 transition.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D97903
As a preparation step for fast8 support, we need to update the tests
to pass in both modes. That requires generalizing the shadow width
and removing any hard-coded references that assume it's always 2 bytes.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D97884
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.
With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker does not let `__start_/__stop_` references retain their sections.
Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.
This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.
`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.
Behaviors on COFF/Mach-O are not affected.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D97649
As a preparation step for fast8 support, we need to update the tests
to pass in both modes. That requires generalizing the shadow width
and removing any hard-coded references that assume it's always 2 bytes.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D97723
This is a part of https://reviews.llvm.org/D95835.
One issue is about origin load optimization: see the
comments of useCallbackLoadLabelAndOrigin
@gbalats This change may have some conflicts with your 8bit change. PTAL the change at visitLoad.
Reviewed By: morehouse, gbalats
Differential Revision: https://reviews.llvm.org/D97570
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.
With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker no longer lets `__start_/__stop_` references retain them.
Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.
This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.
`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.
Behaviors on COFF/Mach-O are not affected.
Differential Revision: https://reviews.llvm.org/D97649
Many optimizers (e.g. GlobalOpt/ConstantMerge) do not respect linker semantics
for comdat and may not discard the sections as a unit.
The interconnected `__llvm_prf_{cnts,data}` sections (in comdat for ELF)
are similar to D97432: `__profd_` is not directly referenced, so
`__profd_` may be discarded while `__profc_` is retained, breaking the
interconnection. We currently conservatively add all such sections to
`llvm.used` and let the linker do GC for ELF.
In D97448, we will change GlobalObject's in the llvm.used list to use SHF_GNU_RETAIN,
causing the metadata sections to be unnecessarily retained (some `check-profile` tests check for GC).
Use `llvm.compiler.used` to retain the current GC behavior.
Differential Revision: https://reviews.llvm.org/D97585
This will allow identifying exactly how many shadow bytes were used
during compilation, for when fast8 mode is introduced.
Also, it will provide a consistent matching point for instrumentation
tests so that the exact llvm type used (i8 or i16) for the shadow can
be replaced by a pattern substitution. This is handy for tests with
multiple prefixes.
Reviewed by: stephan.yichao.zhao, morehouse
Differential Revision: https://reviews.llvm.org/D97409
This is a part of https://reviews.llvm.org/D95835.
Each customized function has two wrappers. The
first one dfsw is for the normal shadow propagation. The second one dfso is used
when origin tracking is on. It calls the first one, and does additional
origin propagation. Which one to use can be decided at instrumentation
time. This is to ensure minimal additional overhead when origin tracking
is off.
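The two-wrapper scheme can be sketched as below. All names and signatures are hypothetical stand-ins (the real wrappers use the `dfsw`/`dfso` naming and the runtime's label and origin types), and the origin rule shown is only one plausible choice.

```cpp
#include <cstdint>

using Label = uint8_t;   // stand-in for the DFSan label type
using Origin = uint32_t; // stand-in for the DFSan origin type

// The original, uninstrumented function.
int addOne(int x) { return x + 1; }

// First wrapper: shadow (label) propagation only.
int dfswAddOne(int x, Label xLabel, Label *retLabel) {
  *retLabel = xLabel;
  return addOne(x);
}

// Second wrapper: calls the first one, then propagates the origin as well.
int dfsoAddOne(int x, Label xLabel, Label *retLabel, Origin xOrigin,
               Origin *retOrigin) {
  int result = dfswAddOne(x, xLabel, retLabel);
  *retOrigin = (*retLabel != 0) ? xOrigin : 0;
  return result;
}

struct Result {
  int value;
  Label label;
  Origin origin;
};

// The choice between wrappers is made once, when the call is instrumented,
// so the label-only path pays nothing for origin tracking.
Result callAddOne(bool trackOrigins, int x, Label xLabel, Origin xOrigin) {
  Result r{0, 0, 0};
  r.value = trackOrigins
                ? dfsoAddOne(x, xLabel, &r.label, xOrigin, &r.origin)
                : dfswAddOne(x, xLabel, &r.label);
  return r;
}
```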
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D97483
`__sancov_pcs` parallels the other metadata section(s). While some optimizers
(e.g. GlobalDCE) respect linker semantics for comdat and retain or discard the
sections as a unit, some (e.g. GlobalOpt/ConstantMerge) do not. So we have to
conservatively retain all unconditionally in the compiler.
When a comdat is used, the COFF/ELF linkers' GC semantics ensure the
associated parallel array elements are retained or discarded together,
so `llvm.compiler.used` is sufficient.
Otherwise (MachO (see rL311955/rL311959), COFF special case where comdat is not
used), we have to use `llvm.used` to conservatively make all sections retained by
the linker. This will fix the Windows problem once internal linkage
GlobalObject's in `llvm.used` are retained via `/INCLUDE:`.
Reviewed By: morehouse, vitalybuka
Differential Revision: https://reviews.llvm.org/D97432
At a store, DFSan stores shadow data and then stores app data; at a load, it
loads shadow data and then loads app data.
When application data is atomic, one overtainting case is
thread A: load shadow
thread B: store shadow
thread B: store app
thread A: load app
If the application address had been used by other flows, thread A reads
previous shadow, causing overtainting.
The change is similar to MSan's solution.
1) enforce ordering of app load/store
2) load shadow after load app; store shadow before store app
3) do not track atomic store by resetting its shadow to be 0.
The last one is to address a case like this.
Thread A: load app
Thread B: store shadow
Thread A: load shadow
Thread B: store app
This approach eliminates overtainting at the cost of undertainting
flows via shadow data race.
Note that this change addresses only native atomic instructions, but
does not support builtin libcalls yet.
https://llvm.org/docs/Atomics.html#libcalls-atomic
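The resulting order of operations can be sketched with `std::atomic`. This is a model of the scheme rather than the instrumentation itself: `TrackedAtomic` and the helpers are hypothetical, and release/acquire is an assumed way to express the enforced ordering.

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

// An application atomic together with its shadow label (hypothetical type).
struct TrackedAtomic {
  std::atomic<int> app{0};
  std::atomic<uint8_t> shadow{0};
};

// Atomic store: reset the shadow to 0 first, then store the app data with
// at least release ordering.
void instrumentedStore(TrackedAtomic &t, int value) {
  t.shadow.store(0, std::memory_order_relaxed);
  t.app.store(value, std::memory_order_release);
}

// Atomic load: load the app data with at least acquire ordering first,
// then load the shadow.
std::pair<int, uint8_t> instrumentedLoad(TrackedAtomic &t) {
  int value = t.app.load(std::memory_order_acquire);
  uint8_t label = t.shadow.load(std::memory_order_relaxed);
  return {value, label};
}

// Single-threaded usage: a stale label left by an earlier flow does not
// survive an atomic store to the same location.
std::pair<int, uint8_t> storeThenLoad(int value, uint8_t staleLabel) {
  TrackedAtomic t;
  t.shadow.store(staleLabel, std::memory_order_relaxed);
  instrumentedStore(t, value);
  return instrumentedLoad(t);
}
```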
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D97310
In SanitizerCoverage, the metadata sections (`__sancov_guards`,
`__sancov_cntrs`, `__sancov_bools`) are referenced by functions. After
inlining, such a `__sancov_*` section can be referenced by more than one
function, but its sh_link still refers to the original function's section.
(Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to
section violates the invariant.)
If the original function's section is discarded (e.g. LTO internalization +
`ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error.
The above reasoning means that `!associated` is not appropriate for metadata referenced by
an inlinable function. Non-interposable functions are inline candidates, so we
have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections
but is expected to parallel a metadata section, so we have to make sure the two
sections are retained or discarded at the same time. A section group does the
trick. (Note: we have a module ctor, so `getUniqueModuleId` guarantees to
return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return
non-null.)
For interposable functions, we could keep using `!associated`, but
LTO can change the linkage to `internal` and allow such functions to be inlinable,
so we have to drop `!associated`, too. To not interfere with section
group resolution, we need to use the `noduplicates` variant (section group flag 0).
(This allows us to get rid of the ModuleID parameter.)
In -fno-pie and -fpie code (mostly dso_local), instrumented interposable
functions have WeakAny/LinkOnceAny linkages, which are rare. So the
section group header overhead should be low.
This patch does not change the object file output for COFF (where `!associated` is ignored).
Reviewed By: morehouse, rnk, vitalybuka
Differential Revision: https://reviews.llvm.org/D97430