We can now use this to decide whether to emit a verneed during the final
pass over the symbols. We were previously wrongly creating a verneed entry
in the case where all references to a DSO's symbols were weak.
In a future change we may also want to use the used bit to control whether
shared symbols are preemptible and appear in the dynsym. This seems a little
tricky to do at the moment because isNeeded() is templated.
The only other functional change here is that we emit a DT_NEEDED for DSOs
whose symbols are all preempted by objects that appear later in the link. But
that doesn't seem too important to me.
Differential Revision: http://reviews.llvm.org/D21171
llvm-svn: 272282
MergedInputSection::getOffset is the busiest function in LLD if string
merging is enabled and input files have lots of mergeable sections.
It is usually the case when creating executable with debug info,
so it is pretty common.
The reason why it is slow is because it has to do faily complex
computations. For non-mergeable sections, section contents are
contiguous in output, so in order to compute an output offset,
we only have to add the output section's base address to an input
offset. But for mergeable strings, section contents are split for
merging, so they are not contigous. We've got to do some lookups.
We used to do binary search on the list of section pieces.
It is slow because I think it's hostile to branch prediction.
This patch replaces it with hash table lookup. Seems it's working
pretty well. Below is "perf stat -r10" output when linking clang
with debug info. In this case this patch speeds up about 4%.
Before:
6584.153205 task-clock (msec) # 1.001 CPUs utilized ( +- 0.09% )
238 context-switches # 0.036 K/sec ( +- 6.59% )
0 cpu-migrations # 0.000 K/sec ( +- 50.92% )
1,067,675 page-faults # 0.162 M/sec ( +- 0.15% )
18,369,931,470 cycles # 2.790 GHz ( +- 0.09% )
9,640,680,143 stalled-cycles-frontend # 52.48% frontend cycles idle ( +- 0.18% )
<not supported> stalled-cycles-backend
21,206,747,787 instructions # 1.15 insns per cycle
# 0.45 stalled cycles per insn ( +- 0.04% )
3,817,398,032 branches # 579.786 M/sec ( +- 0.04% )
132,787,249 branch-misses # 3.48% of all branches ( +- 0.02% )
6.579106511 seconds time elapsed ( +- 0.09% )
After:
6312.317533 task-clock (msec) # 1.001 CPUs utilized ( +- 0.19% )
221 context-switches # 0.035 K/sec ( +- 4.11% )
1 cpu-migrations # 0.000 K/sec ( +- 45.21% )
1,280,775 page-faults # 0.203 M/sec ( +- 0.37% )
17,611,539,150 cycles # 2.790 GHz ( +- 0.19% )
10,285,148,569 stalled-cycles-frontend # 58.40% frontend cycles idle ( +- 0.30% )
<not supported> stalled-cycles-backend
18,794,779,900 instructions # 1.07 insns per cycle
# 0.55 stalled cycles per insn ( +- 0.03% )
3,287,450,865 branches # 520.799 M/sec ( +- 0.03% )
72,259,605 branch-misses # 2.20% of all branches ( +- 0.01% )
6.307411828 seconds time elapsed ( +- 0.19% )
Differential Revision: http://reviews.llvm.org/D20645
llvm-svn: 270999
scanReloc and the functions on which scanReloc depends is in total
more than 600 lines of code. Since scanReloc does not depend on Writer,
it is better to move it into a separate file.
Differential Revision: http://reviews.llvm.org/D20554
llvm-svn: 270606
Previously, we created a .bss section when needed. We had a function
ensureBss() for that purpose. Turned out that was error-prone
because it was easy to forget to call that function before accessing
the .bss section.
This patch always make the BSS section. The section is added to the
output when it's not empty.
llvm-svn: 270527
Copy relocations are relocations to copy data from DSOs to
executable's .bss segment at runtime. It doesn't make sense to
create such relocations for zero-sized symbols.
GNU linkers don't agree with each other. ld rejects such
relocation/symbol pair. gold don't reject that but do not create
copy relocations as well. I took the former approach because
I don't think the latter is what user wants.
llvm-svn: 270525
.eh_frame_hdr assumes that there is only one .eh_frame and
ensures it by assertions. This patch makes .eh_frame a real
singleton object to simplify.
llvm-svn: 270445
This patch adds Size member to SectionPiece so that getRangeAndSize
can just return a SectionPiece instead of a std::pair<SectionPiece *, uint_t>.
Also renamed the function.
llvm-svn: 270346
We were using std::pair to represents pieces of splittable section
contents. It hurt readability because "first" and "second" are not
meaningful. This patch give them names.
One more thing is that piecewise liveness information is stored to
the second element of the pair as a special value of output section
offset. It was confusing, so I defiend a new bit, "Live", in the
new struct.
llvm-svn: 270340
This makes it explicit that each R_RELAX_TLS_* is equivalent to some
other expression.
With this I think we are at a sweet spot for how much is done in
Target.cpp. I did experiment with moving *all* the value math out of it.
It has the advantage that we know the final value in target independent
code, but it gets quite verbose.
llvm-svn: 270277
Lazy binding is quite important for use case like a shared build of
llvm. Also, if someone wants to disable it, it is better done in the
compiler (disable plt generation).
The only reason to keep it is to make it easier to add a new
architecture. But it doesn't really help much as it is possible to start
with non lazy relocation and plt code but still let the generic part
create a dedicated .got.plt and .rela.plt.
llvm-svn: 269982
New names reflect purpose of corresponding GOT entries better.
Both expression types related to entries allocated in the 'local'
part of MIPS GOT. R_MIPS_GOT_LOCAL_PAGE is for entries contain 'page'
addresses. R_MIPS_GOT_LOCAL is for entries contain 'full' address.
llvm-svn: 269597
If you specify the option in the form of --build-id=0x<hexstring>,
that hexstring is set as a build ID. We observed that the feature
is actually in use in some builds, so we want this feature.
llvm-svn: 269495
The Elf_Rela has an explicit addend. It doesn't need the addend to be
written to the section being relocated.
Since relative relocations are very common in the output, this is a
noticeable speedup. The results I got were
chromium
master 4.778149487
patch 4.761120792 0.996436131802
chromium fast
master 1.896253636
patch 1.840990582 0.970856718241
the gold plugin
master 0.399337811
patch 0.392279276 0.982324401032
clang
master 0.666873675
patch 0.665895708 0.998533504865
llvm-as
master 0.037101095
patch 0.037123149 1.00059442989
the gold plugin fsds
master 0.422473396
patch 0.414192879 0.980399909016
clang fsds
master 0.747302008
patch 0.744843964 0.996710775599
llvm-as fsds
master 0.033146245
patch 0.033064531 0.997534743377
scylla
master 4.08857525
patch 4.082245184 0.998451767275
llvm-svn: 269417
Just do not allow to link shared library if there are
undefined symbols.
This fixes PR27447
Differential revision: http://reviews.llvm.org/D20169
llvm-svn: 269183
This is the option which sorts relocs to optimize dynamic linker performance.
-z combelocs is the default in gold, also it ignores -z nocombreloc,
this patch do the same.
Patch sorts relocations by symbols only and do not create any
DT_REL[A]COUNT entries. That is different with what gold/bfd do.
More information about option is here:
http://www.airs.com/blog/archives/186http://people.redhat.com/jakub/prelink.pdf, p.2
Differential revision: http://reviews.llvm.org/D19528
llvm-svn: 269066
In case of MIPS ABI relocation has R_GOTREL expression's type iif the
relocation type is either R_MIPS_GPREL16 or R_MIPS_GPREL32. So it is
enough to check expression's type only.
llvm-svn: 268741
We were creating the copy relocations just fine, but then thinking that
the .bss position could be preempted and creating a dynamic relocation
to it, which would crash at runtime since that memory is read only.
llvm-svn: 268668
This allows the combined LTO object to provide a definition with the same
name as a symbol that was internalized without causing a duplicate symbol
error. This normally happens during parallel codegen which externalizes
originally-internal symbols, for example.
In order to make this work, I needed to relax the undefined symbol error to
only report an error for symbols that are used in regular objects.
Differential Revision: http://reviews.llvm.org/D19954
llvm-svn: 268649
We were previously using an output offset of -1 for both GC'd and tail
merged pieces. We need to distinguish these two cases in order to filter
GC'd symbols from the symbol table -- we were previously asserting when we
asked for the VA of a symbol pointing into a dead piece, which would end
up asking the tail merging string table for an offset even though we hadn't
initialized it properly.
This patch fixes the bug by using an offset of -1 to exclusively mean GC'd
pieces, using 0 for tail merges, and distinguishing the tail merge case from
an offset of 0 by asking the output section whether it is tail merge.
Differential Revision: http://reviews.llvm.org/D19953
llvm-svn: 268604
We were already checking for non relative relocations.
If we ever decide to add support for rw text segments this means we will
have a single spot to add the flag.
llvm-svn: 268558
MIPS N64 ABI introduces .MIPS.options section which specifies miscellaneous
options to be applied to an object/shared/executable file. LLVM as well as
modern versions of GNU tools read and write the only type of the options -
ODK_REGINFO. It is exact copy of .reginfo section used by O32 ABI.
llvm-svn: 268485
Weak undefined symbols resolve to the image base. This is a little strange,
but it allows us to link function calls to such symbols. Normally such a
call will be guarded with a comparison, which will load a zero from the GOT.
There's one example of such a function call in crti.o in Linux's CRT.
As part of this change, I also needed to make the synthetic start and end
symbols image base relative in the case where their sections were empty,
so that PC-relative references to those symbols would continue to work.
Differential Revision: http://reviews.llvm.org/D19844
llvm-svn: 268350
This change simplifies the BuildId classes by removing a few member
functions and variables from them. It should also make it easy to
parallelize hash computation in future because now each BuildId object
see all inputs rather than one at a time.
llvm-svn: 268333