2015-07-25 05:03:07 +08:00
|
|
|
set(LLVM_TARGET_DEFINITIONS Options.td)
|
|
|
|
tablegen(LLVM Options.inc -gen-opt-parser-defs)
|
|
|
|
add_public_tablegen_target(ELFOptionsTableGen)
|
|
|
|
|
[ELF] Parallelize --compress-debug-sections=zlib
When linking a Debug build of clang (265MiB SHF_ALLOC sections, 920MiB uncompressed
debug info), "Compress debug sections" takes 2/3 of the time in a --threads=1 link
and ~70% of the time in a --threads=8 link.
This patch splits a section into 1MiB shards and calls zlib `deflate` on them in parallel.
DEFLATE blocks are a bit sequence. We need to ensure every shard starts
at a byte boundary for concatenation. We use Z_SYNC_FLUSH for all shards
but the last to flush the output to a byte boundary. (Z_FULL_FLUSH can
be used as well, but Z_FULL_FLUSH clears the hash table which just
wastes time.)
The last block requires the BFINAL flag. We call deflate with Z_FINISH
to set the flag as well as flush the output to a byte boundary. Under
the hood, all of Z_SYNC_FLUSH, Z_FULL_FLUSH, and Z_FINISH emit a
non-compressed block (called stored block in zlib). RFC1951 says "Any
bits of input up to the next byte boundary are ignored."
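The sharding scheme described above can be sketched with Python's standard `zlib` module. This is only an illustration, not the lld implementation: the helper names `deflate_shard` and `deflate_sharded`, the compression level, and the use of a thread pool are assumptions, and the sketch emits raw DEFLATE only (no zlib header or checksum).

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

SHARD_SIZE = 1 << 20  # 1MiB, the shard size quoted in the message above


def deflate_shard(shard: bytes, is_last: bool) -> bytes:
    # wbits=-15 selects raw DEFLATE: no zlib header, no Adler-32 trailer.
    comp = zlib.compressobj(level=6, wbits=-15)
    body = comp.compress(shard)
    # Z_SYNC_FLUSH ends the shard on a byte boundary.
    # Z_FINISH does the same for the last shard and also sets BFINAL.
    mode = zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH
    return body + comp.flush(mode)


def deflate_sharded(data: bytes, shard_size: int = SHARD_SIZE,
                    threads: int = 8) -> bytes:
    # Always produce at least one shard so empty input still gets a final block.
    offsets = range(0, max(len(data), 1), shard_size)
    shards = [data[o:o + shard_size] for o in offsets]
    last = len(shards) - 1
    flags = [i == last for i in range(len(shards))]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each shard has its own compressor, so shards compress independently.
        return b"".join(pool.map(deflate_shard, shards, flags))
```

Because every shard ends on a byte boundary, the concatenation is a single valid DEFLATE stream, so `zlib.decompress(deflate_sharded(data), -15)` should return the original bytes.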
In a --threads=8 link, "Compress debug sections" is 5.7x as fast and the whole
link is 2.54x as fast. Because the hash table for one shard is not shared with the next
shard, the output is slightly larger. A better compression ratio can be achieved
by preloading the last window-size bytes of the previous shard as a dictionary
(`deflateSetDictionary`), but that is overkill.
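For reference, the dictionary variant that the patch deliberately does not implement might look like the following sketch. Python's `zdict` argument calls `deflateSetDictionary` under the hood. The function name `deflate_sharded_with_dict` and the compression level are assumptions made for illustration. Shards are processed sequentially here for brevity, although each one still depends only on the input data.

```python
import zlib

WINDOW = 32 * 1024  # maximum DEFLATE back-reference distance


def deflate_sharded_with_dict(data: bytes, shard_size: int = 1 << 20) -> bytes:
    offsets = list(range(0, max(len(data), 1), shard_size))
    parts = []
    for i, off in enumerate(offsets):
        # Preload up to 32KiB of uncompressed input preceding this shard.
        history = data[max(0, off - WINDOW):off]
        if history:
            comp = zlib.compressobj(level=6, wbits=-15, zdict=history)
        else:
            comp = zlib.compressobj(level=6, wbits=-15)
        is_last = i == len(offsets) - 1
        mode = zlib.Z_FINISH if is_last else zlib.Z_SYNC_FLUSH
        parts.append(comp.compress(data[off:off + shard_size]) + comp.flush(mode))
    return b"".join(parts)
```

A decompressor reading the concatenated stream already holds the previous shard's output in its window, so back-references into the dictionary resolve without any extra setup on the reading side.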
```
# 1MiB shards
% bloaty clang.new -- clang.old
FILE SIZE VM SIZE
-------------- --------------
+0.3% +129Ki [ = ] 0 .debug_str
+0.1% +105Ki [ = ] 0 .debug_info
+0.3% +101Ki [ = ] 0 .debug_line
+0.2% +2.66Ki [ = ] 0 .debug_abbrev
+0.0% +1.19Ki [ = ] 0 .debug_ranges
+0.1% +341Ki [ = ] 0 TOTAL
# 2MiB shards
% bloaty clang.new -- clang.old
FILE SIZE VM SIZE
-------------- --------------
+0.2% +74.2Ki [ = ] 0 .debug_line
+0.1% +72.3Ki [ = ] 0 .debug_str
+0.0% +69.9Ki [ = ] 0 .debug_info
+0.1% +976 [ = ] 0 .debug_abbrev
+0.0% +882 [ = ] 0 .debug_ranges
+0.0% +218Ki [ = ] 0 TOTAL
```
Bonuses of not using zlib::compress:
* we can compress a debug section larger than 4GiB
* peak memory usage is lower because for most shards the output size is less
than 50% of the input size (all less than 55% for a large binary I tested, but
decreasing the initial output size does not decrease memory usage)
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D117853
2022-01-26 02:29:04 +08:00
|
|
|
if(LLVM_ENABLE_ZLIB)
|
|
|
|
set(imported_libs ZLIB::ZLIB)
|
|
|
|
endif()
|
|
|
|
|
2016-02-28 08:25:54 +08:00
|
|
|
add_lld_library(lldELF
|
2017-12-05 23:59:05 +08:00
|
|
|
AArch64ErrataFix.cpp
|
2017-06-17 01:32:43 +08:00
|
|
|
Arch/AArch64.cpp
|
|
|
|
Arch/AMDGPU.cpp
|
|
|
|
Arch/ARM.cpp
|
|
|
|
Arch/AVR.cpp
|
2018-06-14 02:45:25 +08:00
|
|
|
Arch/Hexagon.cpp
|
2017-06-17 01:32:43 +08:00
|
|
|
Arch/Mips.cpp
|
2017-06-20 05:03:57 +08:00
|
|
|
Arch/MipsArchTree.cpp
|
2019-01-10 21:43:06 +08:00
|
|
|
Arch/MSP430.cpp
|
2017-06-17 01:32:43 +08:00
|
|
|
Arch/PPC.cpp
|
|
|
|
Arch/PPC64.cpp
|
2018-08-10 01:59:56 +08:00
|
|
|
Arch/RISCV.cpp
|
2017-06-29 01:05:39 +08:00
|
|
|
Arch/SPARCV9.cpp
|
2017-06-17 01:32:43 +08:00
|
|
|
Arch/X86.cpp
|
|
|
|
Arch/X86_64.cpp
|
2019-09-16 17:38:38 +08:00
|
|
|
ARMErrataFix.cpp
|
2018-04-18 07:30:05 +08:00
|
|
|
CallGraphSort.cpp
|
2018-09-15 07:51:05 +08:00
|
|
|
DWARF.cpp
|
2015-07-25 05:03:07 +08:00
|
|
|
Driver.cpp
|
|
|
|
DriverUtils.cpp
|
2016-05-24 10:55:45 +08:00
|
|
|
EhFrame.cpp
|
2016-02-26 02:43:51 +08:00
|
|
|
ICF.cpp
|
2015-07-25 05:03:07 +08:00
|
|
|
InputFiles.cpp
|
2015-09-22 08:01:39 +08:00
|
|
|
InputSection.cpp
|
2016-03-23 04:52:10 +08:00
|
|
|
LTO.cpp
|
2015-10-01 01:23:26 +08:00
|
|
|
LinkerScript.cpp
|
2017-01-14 05:05:46 +08:00
|
|
|
MapFile.cpp
|
ELF2: Implement --gc-sections.
Section garbage collection is a feature to remove unused sections
from outputs. Unused sections are sections that are not reachable
from known GC-root symbols or sections. Naturally the feature is
implemented as a mark-sweep garbage collector.
In this patch, I added a Live bit to InputSectionBase. If and only
if the Live bit is on, the section will be written to the output.
Starting from GC-root symbols or sections, a new function, markLive(),
visits all reachable sections and sets their Live bits. Writer then
ignores sections whose Live bit is off, so that such sections are
excluded from the output.
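The mark and sweep steps described above can be illustrated with a minimal Python sketch. The `Section` class, its `refs` list, and `write_output` are hypothetical stand-ins for InputSectionBase, its relocation targets, and the Writer. Only the Live bit and the `markLive()` idea come from the patch description.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str
    refs: list = field(default_factory=list)  # sections this one references
    live: bool = False                        # the Live bit


def mark_live(roots):
    # Mark phase: visit everything reachable from the GC roots.
    worklist = list(roots)
    while worklist:
        sec = worklist.pop()
        if sec.live:
            continue  # already visited, which also handles reference cycles
        sec.live = True
        worklist.extend(sec.refs)


def write_output(sections):
    # Sweep phase: the writer skips sections whose Live bit is off.
    return [s.name for s in sections if s.live]
```

With sections `a` and `b` referencing each other, `c` referencing `d`, and `a` as the only root, `write_output` would keep `a` and `b` and drop `c` and `d`.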
This change has a small negative impact on performance if you use
the feature because marking sections means more work. The time to
link Clang changes from 0.356s to 0.386s, or +8%.
It reduces Clang size from 57,764,984 bytes to 55,296,600 bytes.
That is a 4.3% reduction.
http://reviews.llvm.org/D13950
llvm-svn: 251043
2015-10-23 02:49:53 +08:00
|
|
|
MarkLive.cpp
|
2015-09-22 05:38:08 +08:00
|
|
|
OutputSections.cpp
|
2016-05-25 04:24:43 +08:00
|
|
|
Relocations.cpp
|
2017-02-14 12:47:05 +08:00
|
|
|
ScriptLexer.cpp
|
2017-04-05 13:07:39 +08:00
|
|
|
ScriptParser.cpp
|
2015-07-25 05:03:07 +08:00
|
|
|
SymbolTable.cpp
|
|
|
|
Symbols.cpp
|
2016-11-02 04:28:21 +08:00
|
|
|
SyntheticSections.cpp
|
2015-09-23 02:19:46 +08:00
|
|
|
Target.cpp
|
2016-07-09 00:10:27 +08:00
|
|
|
Thunks.cpp
|
2015-07-25 05:03:07 +08:00
|
|
|
Writer.cpp
|
|
|
|
|
|
|
|
LINK_COMPONENTS
|
2016-03-01 23:56:53 +08:00
|
|
|
${LLVM_TARGETS_TO_BUILD}
|
2017-06-08 02:06:11 +08:00
|
|
|
BinaryFormat
|
2018-05-03 18:03:45 +08:00
|
|
|
BitWriter
|
2016-03-01 23:56:53 +08:00
|
|
|
Core
|
2016-10-21 15:46:24 +08:00
|
|
|
DebugInfoDWARF
|
2019-10-30 06:28:19 +08:00
|
|
|
Demangle
|
2016-06-23 02:09:23 +08:00
|
|
|
LTO
|
2017-10-12 07:18:43 +08:00
|
|
|
MC
|
2015-07-25 05:03:07 +08:00
|
|
|
Object
|
|
|
|
Option
|
2020-01-10 12:58:31 +08:00
|
|
|
Passes
|
2015-07-25 05:03:07 +08:00
|
|
|
Support
|
2016-02-29 03:50:14 +08:00
|
|
|
|
|
|
|
LINK_LIBS
|
2017-10-03 05:00:41 +08:00
|
|
|
lldCommon
|
2022-01-26 02:29:04 +08:00
|
|
|
${imported_libs}
|
2017-02-10 09:59:20 +08:00
|
|
|
${LLVM_PTHREAD_LIB}
|
2015-07-25 05:03:07 +08:00
|
|
|
|
2016-11-17 12:36:35 +08:00
|
|
|
DEPENDS
|
|
|
|
ELFOptionsTableGen
|
2020-07-18 07:43:05 +08:00
|
|
|
intrinsics_gen
|
2016-11-17 12:36:35 +08:00
|
|
|
)
|