forked from OSchip/llvm-project
a1532ed275
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time constant. On ELF, when linking a shared object, this requires that `__profc_` is either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that `.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.

```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo

# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo

# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```

(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer has `.lprfd` before `.lprfc`, so we cannot work around by reordering `.lprfc` and `.lprfd`.)

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`) `ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.

```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059 # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```

The absolute function address (used by llvm-profdata to collect indirect call targets) can be converted to relative as well, but is not done in this patch.

Differential Revision: https://reviews.llvm.org/D104556
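To make the idea of a relative `CountersPtr` concrete, here is a minimal C sketch of how a reader could recover the counters address from a record that stores only the difference between the two symbols. The struct `ProfData` and the helpers `init_record` and `resolve_counters` are hypothetical names for illustration, not the actual compiler-rt definitions. In the real patch the difference is computed by the linker from a label difference; this sketch computes it at run time only because portable C cannot express that as a static initializer.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a __profd_ record. */
struct ProfData {
    uint64_t name_hash;      /* illustrative field only */
    int64_t  counters_delta; /* address of counters minus address of this record */
};

/* Store the counters location as an offset relative to the record itself.
   No absolute address is stored, so no load-time relocation would be needed
   if the offset were emitted as a link-time constant. */
static void init_record(struct ProfData *d, uint64_t hash, uint64_t *counters) {
    d->name_hash = hash;
    d->counters_delta = (int64_t)((intptr_t)counters - (intptr_t)d);
}

/* Recover the absolute counters address: record address plus stored delta. */
static uint64_t *resolve_counters(const struct ProfData *d) {
    return (uint64_t *)((intptr_t)d + (intptr_t)d->counters_delta);
}
```

Calling `resolve_counters` on a record initialized with `init_record` returns the original counters pointer wherever the image is loaded, which is why the relative form no longer needs an `R_X86_64_RELATIVE` relocation for this field.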
||
---|---|---|
Inputs | ||
X86 | ||
PR28219.ll | ||
PR41279.ll | ||
PR41279_2.ll | ||
bfi_verification.ll | ||
branch1.ll | ||
branch2.ll | ||
callbr.ll | ||
chr.ll | ||
comdat_internal.ll | ||
comdat_rename.ll | ||
consecutive-zeros.ll | ||
counter_promo.ll | ||
counter_promo_exit_catchswitch.ll | ||
counter_promo_exit_merge.ll | ||
counter_promo_mexits.ll | ||
counter_promo_nest-inseltpoison.ll | ||
counter_promo_nest.ll | ||
criticaledge.ll | ||
cspgo_profile_summary.ll | ||
diag_FE_profile.ll | ||
diag_mismatch.ll | ||
diag_no_funcprofdata.ll | ||
diag_no_profile.ll | ||
diag_no_value_sites.ll | ||
do-not-instrument.ll | ||
fix_bfi.ll | ||
fix_entry_count.ll | ||
func_entry.ll | ||
hash_mismatch_metadata.ll | ||
icp_covariant_call_return.ll | ||
icp_covariant_invoke_return.ll | ||
icp_invoke.ll | ||
icp_invoke_nouse.ll | ||
icp_mismatch_msg.ll | ||
icp_sample.ll | ||
icp_vararg.ll | ||
icp_vararg_sret.ll | ||
indirect_call_annotation.ll | ||
indirect_call_profile.ll | ||
indirect_call_profile_funclet.ll | ||
indirect_call_promotion.ll | ||
indirect_call_promotion_byval.ll | ||
indirect_call_promotion_musttail.ll | ||
indirect_call_promotion_unique.ll | ||
indirect_call_promotion_vla.ll | ||
indirectbr.ll | ||
infinite_loop.ll | ||
infinite_loop_gen.ll | ||
instr_entry_bb.ll | ||
irreducible.ll | ||
landingpad.ll | ||
large_count_remarks.ll | ||
loop1.ll | ||
loop2.ll | ||
memcpy.ll | ||
memop_clone.ll | ||
memop_hash.ll | ||
memop_profile_funclet.ll | ||
memop_size_annotation.ll | ||
memop_size_from_strlen.ll | ||
memop_size_opt.ll | ||
memop_size_opt_skip_ranges_promote_three.ll | ||
memop_size_opt_zero.ll | ||
multiple_hash_profile.ll | ||
noprofile.ll | ||
noreturncall.ll | ||
not_promote_ret_exit.ll | ||
preinline.ll | ||
remap.ll | ||
select1.ll | ||
select2.ll | ||
select_hash_conflict.ll | ||
single_bb.ll | ||
split-indirectbr-critical-edges.ll | ||
statics_counter_naming.ll | ||
suppl-profile.ll | ||
switch.ll | ||
thinlto_cspgo_gen.ll | ||
thinlto_cspgo_use.ll | ||
thinlto_indirect_call_promotion.ll | ||
thinlto_samplepgo_icp.ll | ||
thinlto_samplepgo_icp2.ll | ||
thinlto_samplepgo_icp3.ll | ||
thinlto_samplepgo_icp_droppeddead.ll | ||
unreachable_bb.ll |