When linking a Debug build of clang (265MiB SHF_ALLOC sections, 920MiB uncompressed
debug info), "Compress debug sections" takes 2/3 of the time in a --threads=1 link
and ~70% of the time in a --threads=8 link.
This patch splits a section into 1MiB shards and calls zlib `deflate` on them in parallel.
DEFLATE blocks are a bit sequence. We need to ensure every shard starts
at a byte boundary for concatenation. We use Z_SYNC_FLUSH for all shards
but the last to flush the output to a byte boundary. (Z_FULL_FLUSH can
be used as well, but Z_FULL_FLUSH clears the hash table which just
wastes time.)
The last block requires the BFINAL flag. We call deflate with Z_FINISH
to set the flag as well as flush the output to a byte boundary. Under
the hood, all of Z_SYNC_FLUSH, Z_FULL_FLUSH, and Z_FINISH emit a
non-compressed block (called stored block in zlib). RFC1951 says "Any
bits of input up to the next byte boundary are ignored."
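The sharding scheme described above can be sketched with Python's zlib bindings. This is a minimal illustration and not the linker's code: the names `deflate_shard` and `compress_sharded`, the thread pool, the compression level, and the hand-written zlib header and Adler-32 trailer are all assumptions made to get a stream that a stock decompressor accepts.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

SHARD_SIZE = 1 << 20  # 1MiB, the shard size named in the description


def deflate_shard(shard: bytes, is_last: bool, level: int = 6) -> bytes:
    # wbits=-15 selects a raw DEFLATE stream (no zlib header or trailer),
    # so the per-shard outputs can be concatenated directly.
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    out = comp.compress(shard)
    # Z_SYNC_FLUSH ends the shard on a byte boundary; Z_FINISH is used for
    # the last shard so that its final block carries the BFINAL flag.
    out += comp.flush(zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH)
    return out


def compress_sharded(data: bytes, shard_size: int = SHARD_SIZE) -> bytes:
    # Split the input; an empty input still needs one (final) shard.
    shards = [data[i:i + shard_size]
              for i in range(0, len(data), shard_size)] or [b""]
    last = len(shards) - 1
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(
            lambda item: deflate_shard(item[1], item[0] == last),
            enumerate(shards)))
    # Assumption: wrap the raw stream in a zlib container by hand.
    header = b"\x78\x9c"  # CM=8, 32KiB window, valid FCHECK
    trailer = struct.pack(">I", zlib.adler32(data) & 0xFFFFFFFF)
    return header + b"".join(parts) + trailer
```

Passing the result to `zlib.decompress` should return the original bytes, because the decompressor sees one continuous DEFLATE stream whose blocks happen to start at shard boundaries.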
In a --threads=8 link, "Compress debug sections" is 5.7x as fast and the whole
link is 2.54x as fast. Because the hash table for one shard is not shared with the
next shard, the output is slightly larger. A better compression ratio can be achieved
by preloading the last window-size bytes of the previous shard as a dictionary
(`deflateSetDictionary`), but that is overkill.
```
# 1MiB shards
% bloaty clang.new -- clang.old
    FILE SIZE        VM SIZE
 --------------  --------------
  +0.3%  +129Ki  [ = ]       0    .debug_str
  +0.1%  +105Ki  [ = ]       0    .debug_info
  +0.3%  +101Ki  [ = ]       0    .debug_line
  +0.2% +2.66Ki  [ = ]       0    .debug_abbrev
  +0.0% +1.19Ki  [ = ]       0    .debug_ranges
  +0.1%  +341Ki  [ = ]       0    TOTAL

# 2MiB shards
% bloaty clang.new -- clang.old
    FILE SIZE        VM SIZE
 --------------  --------------
  +0.2% +74.2Ki  [ = ]       0    .debug_line
  +0.1% +72.3Ki  [ = ]       0    .debug_str
  +0.0% +69.9Ki  [ = ]       0    .debug_info
  +0.1%    +976  [ = ]       0    .debug_abbrev
  +0.0%    +882  [ = ]       0    .debug_ranges
  +0.0%  +218Ki  [ = ]       0    TOTAL
```
Bonuses of not using zlib::compress:
* we can compress a debug section larger than 4GiB
* peak memory usage is lower because for most shards the output size is less
than 50% of the input size (all less than 55% for a large binary I tested, but
decreasing the initial output size does not decrease memory usage)
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D117853
<new> and <cstddef> were introduced in aa60b3fd87 but the dependency
is now dead.
As a consequence you may need to include <new> explicitly where you use it;
previously it was pulled in as an implicit dependency.
The impact on the codebase is small, as <new> is a very small header
(<100 SLOC) but it gets included everywhere, so that somehow counts (?)
According to v-spec 1.0, `vmulh`, `vmulhu`, `vmulhsu` and `vsmul` are
NOT supported for EEW=64 in Zve64*.
This patch tries to guard it correctly.
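The guard can be pictured as a small predicate. This is only a sketch of the rule stated above, not the patch itself: the function name `is_supported` and the `has_full_v` flag (meaning the target implements the full V extension rather than only a Zve64* subset) are hypothetical.

```python
# Instructions the description lists as unavailable for EEW=64 in Zve64*.
RESTRICTED_AT_EEW64 = {"vmulh", "vmulhu", "vmulhsu", "vsmul"}


def is_supported(mnemonic: str, eew: int, has_full_v: bool) -> bool:
    # Assumption: the restriction applies only when the target lacks the
    # full V extension; every other combination is treated as supported.
    if mnemonic in RESTRICTED_AT_EEW64 and eew == 64:
        return has_full_v
    return True
```

For example, `is_supported("vmulh", 64, has_full_v=False)` is false, while the same instruction at EEW=32 passes the check.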
Authored by: Craig Topper <craig.topper@sifive.com> @craig.topper
Co-Authored by: Eop Chen <eop.chen@sifive.com> @eopXD
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D117913
A semicolon-separated list of the names of functions or methods to be considered as not having side-effects was added for bugprone-assert-side-effect. It can be used to exclude methods like iterator::begin/end from being considered as having side-effects.
Differential Revision: https://reviews.llvm.org/D116478
Disable the Go binding test on AIX because building the binding on AIX with Clang is currently unsupported.
Reviewed By: ZarkoCA
Differential Revision: https://reviews.llvm.org/D117505
In addition to having multiple exit locations, there can be multiple blocks leading to the same exit location, which results in a potential phi node. If we find that multiple blocks within the region branch to the same block outside the region, resulting in a phi node, the code extractor pulls this phi node into the function and uses it as an output.
We make sure that this phi node is given an output slot, and that the two values are removed from the outputs if they are not used anywhere else outside of the region. Across the extracted regions, the phi nodes are combined into a single block for each potential output block, similar to the previous patch.
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D106995
I currently have code that is crashing in the second std::advance call,
and it was not straightforward to identify the problem, as the first line
of the stacktrace is in RopePieceBTreeIterator::operator++:
```
*** SIGILL; stack trace: ***
PC: clang/include/clang/Rewrite/Core/RewriteRope.h:119 clang::RopePieceBTreeIterator::operator++()
../include/c++/v1/__iterator/advance.h:35 std::__u::__advance<>()
../include/c++/v1/__iterator/advance.h:65 std::__u::advance<>()
clang/lib/Rewrite/Rewriter.cpp:228 clang::Rewriter::getRewrittenText()
clang/include/clang/Rewrite/Core/Rewriter.h:106 clang::Rewriter::getRewrittenText()
```
Adding an assertion produces a friendlier error message for the caller.
Reviewed By: gribozavr2
Differential Revision: https://reviews.llvm.org/D117579
This is in preparation for moving the code that parses and processes
order files into this file.
See https://reviews.llvm.org/D117354 for context and discussion.
This is part of the implementation of the dataflow analysis framework.
See "[RFC] A dataflow analysis framework for Clang AST" on cfe-dev.
Reviewed-by: xazax.hun
Differential Revision: https://reviews.llvm.org/D118119
Add missing dependency on llvm-jitlink when building compiler-rt with
LLVM_BUILD_EXTERNAL_COMPILER_RT. Previously we would
non-deterministically fail the tests due to the missing binary.
rdar://87247681
Differential Revision: https://reviews.llvm.org/D118087
Based on RLIBM implementation similar to logf and log2f. Most of the exceptional inputs are the exact powers of 10.
Reviewed By: sivachandra, zimmermann6, santoshn, jpl169
Differential Revision: https://reviews.llvm.org/D118093
During fast-isel calling 'markFunctionEnd' in the base class will call
tidyLandingPads. This can cause an issue where we have determined that
we need ehinfo and emitted a traceback table with the bits set to
indicate that we will be emitting the ehinfo, but the tidying deletes
all landing pads. In this case we end up emitting a reference to the
__ehinfo.N symbol without emitting a definition of that symbol, and the
resulting file fails to assemble.
Differential Revision: https://reviews.llvm.org/D117040
Using clang::CallGraph to get the called functions.
This makes a better foundation to improve support for
C++ and print the call chain.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D118016
enum FileChangeReason has four possible values: EnterFile, ExitFile,
SystemHeaderPragma and RenameFile.
The back element of Files should be popped only if FileChangeReason is ExitFile.
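A minimal sketch of the intended behaviour, written in Python for brevity. The callback name `on_file_changed` and the push on EnterFile are assumptions for illustration; only the enum values and the "pop only on ExitFile" rule come from the description.

```python
from enum import Enum, auto


class FileChangeReason(Enum):
    EnterFile = auto()
    ExitFile = auto()
    SystemHeaderPragma = auto()
    RenameFile = auto()


def on_file_changed(files: list, reason: FileChangeReason, name: str) -> None:
    if reason is FileChangeReason.EnterFile:
        # Assumption: entering a file pushes it onto the stack.
        files.append(name)
    elif reason is FileChangeReason.ExitFile:
        # Only leaving a file removes the back element.
        if files:
            files.pop()
    # SystemHeaderPragma and RenameFile leave the stack untouched.
```

With this shape, a SystemHeaderPragma or RenameFile event that arrives between an EnterFile and its ExitFile no longer unbalances the stack.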
We should always be calculating a byte-wise difference here.
Previously this calculated the pointer difference while taking
the pointer element type into account, which is incorrect.
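The distinction can be shown with plain address arithmetic. This is a generic illustration using ctypes, not the code touched by the change; both helper names are hypothetical.

```python
import ctypes


def byte_difference(lhs_addr: int, rhs_addr: int) -> int:
    # Difference measured in bytes, independent of any element type.
    return lhs_addr - rhs_addr


def element_difference(lhs_addr: int, rhs_addr: int, elem_size: int) -> int:
    # Difference scaled by the element size, as C pointer subtraction does.
    return (lhs_addr - rhs_addr) // elem_size


array = (ctypes.c_int32 * 8)()
base = ctypes.addressof(array)
fourth = base + 3 * ctypes.sizeof(ctypes.c_int32)  # address of array[3]
```

Here `byte_difference(fourth, base)` is 12 while `element_difference(fourth, base, 4)` is 3, so a computation that needs bytes goes wrong whenever the element size is not 1.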
Addresses are floats when a sampler is present and unsigned integers
when no sampler is present.
Therefore, only zext instructions, not sext instructions, should match.
Also match integer constants that can be truncated.
Differential Revision: https://reviews.llvm.org/D118043
Determine the masked load/store access type from the value type
of the intrinsics, rather than the pointer element type. For
cleanliness, include the access type in InterestingMemoryAccess.
since it qualifies as a toolchain tool rather than "internal llvm tool".
This will make it part of builds which set the
LLVM_INSTALL_TOOLCHAIN_ONLY cmake option, such as the Windows installer.
Differential revision: https://reviews.llvm.org/D118042
This is a cleanup of all llvm-qualified-auto findings.
This patch was created by automatically applying the fixes from
clang-tidy.
Differential Revision: https://reviews.llvm.org/D113898
This is mostly a copy of the existing tensor.from_elements bufferization. Once TensorInterfaceImpl.cpp is moved to the tensor dialect, the existing rewrite pattern can be deleted.
Differential Revision: https://reviews.llvm.org/D117775
This is mostly a copy of the existing tensor.generate bufferization. Once TensorInterfaceImpl.cpp is moved to the tensor dialect, the existing rewrite pattern can be deleted.
Differential Revision: https://reviews.llvm.org/D117770
NOTE: This patch also includes tests that highlight those cases
where the existing DAG combine doesn't yet work well for SVE.
Differential Revision: https://reviews.llvm.org/D117873
This is what CreateNonTerminatorUnreachable() in InstCombine uses.
Specific choice here doesn't really matter, but we should pick
one that is pointer element type independent.