This change extends the current logic for inferring readonly and readnone argument attributes to also infer writeonly.
This change is deliberately minimal; there's a couple of areas for follow up.
* I left out all call handling and thus any benefit from the SCC walk. When examining the test changes, I realized the existing code is imprecise, and am going to fix that in it's own revision before adding in the writeonly handling. (Mostly because updating the tests is hard when I, the human, can't figure out whether the result is correct.)
* I left out handling for storing a value (as opposed to storing to a pointer). This should benefit readonly/readnone as well, and applies to a bunch of other instructions. Seemed worth having as a separate review.
Differential Revision: https://reviews.llvm.org/D114963
Need to postpone analysis for addressable lvalue in a depend clause with
iterators, otherwise the incorrect error message is emitted.
Differential Revision: https://reviews.llvm.org/D114653
This is a continuation of D109860 and D109903.
An important challenge for profile inference is caused by the fact that the
sample profile is collected on a fully optimized binary, while the block and
edge frequencies are consumed on an early stage of the compilation that operates
with a non-optimized IR. As a result, some of the basic blocks may not have
associated sample counts, and it is up to the algorithm to deduce missing
frequencies. The problem is illustrated in the figure where three basic
blocks are not present in the optimized binary and hence, receive no samples
during profiling.
We found that it is beneficial to treat all such blocks equally. Otherwise the
compiler may decide that some blocks are “cold” and apply undesirable
optimizations (e.g., hot-cold splitting) regressing the performance. Therefore,
we want to distribute the counts evenly along the blocks with missing samples.
This is achieved by a post-processing step that identifies "dangling" subgraphs
consisting of basic blocks with no sampled counts; once the subgraphs are
found, we rebalance the flow so as every branch probability is 50:50 within the
subgraphs.
Our experiments indicate up to 1% performance win using the optimization on
some binaries and a significant improvement in the quality of profile counts
(when compared to ground-truth instrumentation-based counts)
{F19093045}
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D109980
The goal is to identify the bot and try to fix it.
SetSoftRssLimitExceededCallback is AsanInitInternal as I assume
that only MaybeStartBackgroudThread needs to be delayed to constructors.
Later I want to move MaybeStartBackgroudThread call into sanitizer_common.
If it needs to be reverted please provide to more info, like bot, or details about setup.
Reviewed By: kstoimenov
Differential Revision: https://reviews.llvm.org/D114934
Seems better to rely on the existing formatting, makes the output
smaller/simpler - this is consistent with libstdc++'s std::string_view
pretty printing too.
Differential Revision: https://reviews.llvm.org/D113244
During the llvm round table it was generally agreed that the newer macho
lld implementation is feature complete enough to replace the old
implementation entirely. This will reduce confusion for new users who
aren't aware of the history.
Differential Revision: https://reviews.llvm.org/D114842
This is a continuation of D109860.
Traditional flow-based algorithms cannot guarantee that the resulting edge
frequencies correspond to a *connected* flow in the control-flow graph. For
example, for an instance in the attached figure, a flow-based (or any other)
inference algorithm may produce an output in which the hot loop is disconnected
from the entry block (refer to the rightmost graph in the figure). Furthermore,
creating a connected minimum-cost maximum flow is a computationally NP-hard
problem. Hence, we apply a post-processing adjustments to the computed flow
by connecting all isolated flow components ("islands").
This feature helps to keep all blocks with sample counts connected and results
in significant performance wins for some binaries.
{F19077343}
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D109903
If the extractelement instruction is used multiple times in the
different tree entries (either vectorized, or gathered), need to
compensate the scalar cost of such instructions. They are completely
removed if all users are part of the tree but we need to compensate the
cost only once for each instruction.
Differential Revision: https://reviews.llvm.org/D114958
Compress by factor 4x, takes about 10ms per 8 MiB block.
Depends on D114498.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D114503
It should be NFC, as they already intercept pthread_create.
This will let us to fix BackgroundThread for these sanitizerts.
In in followup patches I will fix MaybeStartBackgroudThread for them
and corresponding tests.
Reviewed By: kstoimenov
Differential Revision: https://reviews.llvm.org/D114935
These tests work fine with VS2017, but become more flaky with VS2019 and the buildbot is about to get upgraded.
Differential Revision: https://reviews.llvm.org/D114907
They aren't needed anymore, we handle conditional compilation in those
files.
Reviewed By: GMNGeoffrey
Differential Revision: https://reviews.llvm.org/D114970
The new runtime is currently broken for AMD offloading. This patch makes
the default the old runtime only for the AMD target.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D114965
Need to outline the code for finding common vectors in insertelement
instructions into a separate function for future patches. It also
improves the process by adding some extra checks for early exit and
fixes a bug where it always finds the match because of erroneous compare
of the same values.
Differential Revision: https://reviews.llvm.org/D114909
Some instructions with i8 immediate ranges can only hold negative values
(like t2LDRHi8), only hold positive values (like t2STRT) or hold +/-
depending on the U bit (like the pre/post inc instructions. e.g
t2LDRH_POST). This patch splits the AddrModeT2_i8 into AddrModeT2_i8,
AddrModeT2_i8pos and AddrModeT2_i8neg to make this clear.
This allows us to get the offset ranges of t2LDRHi8 correct in the
load/store optimizer, fixing issues where we could end up creating
instructions with positive offsets (which may then be encoded as ldrht).
Differential Revision: https://reviews.llvm.org/D114638
This reverts commit 8cd61aac00.
I had missed one place in a test that needed updating; it passed on my
dirty build tree but not on a clean one.
Original commit message:
D114478 identified testing gaps; this patch fills them.
Differential Revision: https://reviews.llvm.org/D114913
We call UnmapShadow before the actual munmap, at that point we don't yet
know if the provided address/size are sane. We can't call UnmapShadow
after the actual munmap becuase at that point the memory range can
already be reused for something else, so we can't rely on the munmap
return value to understand is the values are sane.
While calling munmap with insane values (non-canonical address, negative
size, etc) is an error, the kernel won't crash. We must also try to not
crash as the failure mode is very confusing (paging fault inside of the
runtime on some derived shadow address).
Such invalid arguments are observed on Chromium tests:
https://bugs.chromium.org/p/chromium/issues/detail?id=1275581
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D114944
If several shuffle instructions are emitted, some of them might
same/compatible (less defined) with the previously emitted ones. Such
shuffles can be removed safely, improving the total cost of the
vectorized code.
Differential Revision: https://reviews.llvm.org/D114087
The added test demonstrates loading a dynamic library with static TLS.
Such static TLS is a hack that allows a dynamic library to have faster TLS,
but it can be loaded only iff all threads happened to allocate some excess
of static TLS space for whatever reason. If it's not the case loading fails with:
dlopen: cannot load any more object with static TLS
We used to produce a false positive because dlopen will write into TLS
of all existing threads to initialize/zero TLS region for the loaded library.
And this appears to be racing with initialization of TLS in the thread
since we model a write into the whole static TLS region (we don't what part
of it is currently unused):
WARNING: ThreadSanitizer: data race (pid=2317365)
Write of size 1 at 0x7f1fa9bfcdd7 by main thread:
0 memset
1 init_one_static_tls
2 __pthread_init_static_tls
[[ this is where main calls dlopen ]]
3 main
Previous write of size 8 at 0x7f1fa9bfcdd0 by thread T1:
0 __tsan_tls_initialization
Fix this by ignoring accesses during dlopen.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D114953
This is very similar to https://reviews.llvm.org/D103557 but applies to
symbols which are undefined at link time rather than compile time.
We already have code that handles symbols which were defined at link
time but dead stripped by `--gc-sections` (See
`test/wasm/debug-removed-fn.ll`). In that case the symbols are not live
(!isLive()). However, we can also have live symbols (which are
references by the program) but which are undefined at link time and are
imported by the linker.
In the test case here the symbol `undef` is used but is not defined
in the program but is imported by the linker due to the
`--import-undefined` flag.
Fixes: https://github.com/emscripten-core/emscripten/issues/15528
Differential Revision: https://reviews.llvm.org/D114921
If clang's output is set to bitcode and LTO is enabled, clang would
unconditionally add the flag to the module. Unfortunately, if the input were a
bitcode or IR file and had the flag set, this would result in two copies of the
flag, which is illegal IR. Guard the setting of the flag by checking whether it
already exists. This follows existing practice for the related "ThinLTO" module
flag.
Differential Revision: https://reviews.llvm.org/D112177
This patch changes the `-fopenmp-target-new-runtime` option which controls if
the new or old device runtime is used to be true by default. Disabling this to
use the old runtime now requires using `-fno-openmp-target-new-runtime`.
Reviewed By: JonChesterfield, tianshilei1992, gregrodgers, ronlieb
Differential Revision: https://reviews.llvm.org/D114890