Now that getSectionPiece is fast (uses a hash) it is probably OK to
split merge sections early.
The reason I want to do this is to split eh_frame sections in the same
place.
This does mean that we have to decompress early. Given that the only
compressed sections are debug info, I don't think we are missing much.
It is a small improvement: 0.5% on the geometric mean.
llvm-svn: 331058
PPC64 V2 ABI describes two entry points to a function. The global entry point
sets up the TOC base pointer. When calling a local function, the call should
branch to the local entry point rather than the global entry point.
Section 3.4.1 describes using the 3 most significant bits of the st_other
field to find out how many instructions there are between the local and global
entry point. This patch adds the correct offset required to branch to the local
entry point of a function.
Differential Revision: https://reviews.llvm.org/D45729
llvm-svn: 331046
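The st_other decoding mentioned in r331046 can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the helper names localEntryOffset and localCallTarget are made up, and the mapping of encoded values to instruction counts is my reading of the ABI text.

```cpp
#include <cassert>
#include <cstdint>

// Decode the distance in bytes between the global and the local entry point
// from the three most significant bits of a symbol's st_other field.
// Encoded values 0 and 1 mean both entry points coincide; values 2..6 mean
// 1, 2, 4, 8 and 16 instructions of 4 bytes each. Value 7 is reserved.
inline uint32_t localEntryOffset(uint8_t StOther) {
  uint8_t Val = StOther >> 5; // keep only the top three bits
  assert(Val != 7 && "reserved encoding");
  return ((1u << Val) >> 2) << 2; // yields 0, 0, 4, 8, 16, 32, 64
}

// Hypothetical helper: the address a call to a local function should use.
inline uint64_t localCallTarget(uint64_t GlobalEntry, uint8_t StOther) {
  return GlobalEntry + localEntryOffset(StOther);
}
```

With st_other set to 0x60 (encoded value 3), a function whose global entry point is at 0x1000 would be called at 0x1008, skipping the two instructions that set up the TOC pointer.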
The PPC64 V2 ABI restores the toc base by loading from an offset of 24 from r1.
This patch fixes the offset and updates the testcases from V1 to V2. It also
issues an error when a nop is missing after a call to an external function.
Differential Revision: https://reviews.llvm.org/D45892
llvm-svn: 330600
Now that we don't ICF synthetic sections, we can go back to the old
logic on whose responsibility it is to check Repl.
The idea is that Sec->something() will not check Repl. It is the
responsibility of the caller to find the correct Sec.
llvm-svn: 330346
We had a single symbol using -1 with a synthetic section. It is
simpler to just update its value.
This is not a big win by itself, but will allow having a simple
getOffset for InputSection.
llvm-svn: 330340
Using getOffset here was a bit of overkill. This is being
written and has relocations. This implies it is a .eh_frame or regular
section.
llvm-svn: 330307
This is similar to r329219, but for the entire section. Like r329219 I
don't expect this to have any real impact, it is just more consistent
and simpler.
llvm-svn: 329367
We were ignoring the addend if the piece was dead. I don't expect this
to make a difference in any real world situations, but it is simpler
anyway.
llvm-svn: 329219
In the lld perf builder r328686 had a negative impact in
stalled-cycles-frontend. Somehow that stat is not showing on my
machine, but the attached patch shows an improvement on cache-misses,
which is probably a reasonable proxy.
My working theory is that given a large input the pieces vector is out
of cache by the time initOffsetMap runs.
Both finalizeContents implementations have a convenient location for
initializing the OffsetMap, so this seems the best solution.
llvm-svn: 329117
OffsetMap maps to a SectionPiece index, but we were not taking
advantage of that in getSectionPiece.
With this patch both getOffset and getSectionPiece use OffsetMap and
the binary search is moved to findSectionPiece.
llvm-svn: 329044
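A rough sketch of the lookup scheme described in r329044, assuming a simplified Piece struct and using std::unordered_map as a stand-in for whatever map type lld uses. All names here are illustrative: the hash map serves offsets that point at the start of a piece, and the binary search handles offsets that land inside a piece.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Piece {
  uint32_t InputOff; // start of the piece within the input section
};

struct MergeSection {
  std::vector<Piece> Pieces;                        // sorted by InputOff
  std::unordered_map<uint32_t, uint32_t> OffsetMap; // piece start -> index

  void initOffsetMap() {
    for (uint32_t I = 0; I < Pieces.size(); ++I)
      OffsetMap[Pieces[I].InputOff] = I;
  }

  // Slow path: binary search for the piece that contains Offset.
  size_t findPiece(uint32_t Offset) const {
    assert(!Pieces.empty() && Offset >= Pieces.front().InputOff);
    auto It = std::upper_bound(
        Pieces.begin(), Pieces.end(), Offset,
        [](uint32_t Off, const Piece &P) { return Off < P.InputOff; });
    return static_cast<size_t>(It - Pieces.begin()) - 1;
  }

  // Fast path: most references point at the start of a piece.
  size_t getPieceIndex(uint32_t Offset) const {
    auto It = OffsetMap.find(Offset);
    return It != OffsetMap.end() ? It->second : findPiece(Offset);
  }
};
```

With pieces starting at 0, 8 and 20, offset 8 is answered from the map and offset 10 falls back to the binary search; both resolve to piece 1.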
Since SectionBase::getOutputSection handles ICF replaces and
SectionBase::getOffset was handling it in some cases, it is more
consistent to have getOffset always handle it.
llvm-svn: 328391
When looking for the output section and the output offset the
expectation was that the caller had looked at Repl. That works fine
for InputSections, but in the case of MergeInputSections the caller
doesn't have the section that is actually replaced.
The original testcase was failing because getOutputSection was
returning null. The slightly extended testcase also checks that
getOffset also checks Repl.
I will send a refactoring separately.
llvm-svn: 328332
Our code assumes that all input sections in an output SHF_LINK_ORDER
section have the SHF_LINK_ORDER flag. We do not check that, which can cause a crash.
That happens because we call
std::stable_sort(Sections.begin(), Sections.end(), compareByFilePosition);,
where the compareByFilePosition predicate does not expect to see
null when it calls getLinkOrderDep.
The same might happen when sections refer to non-regular sections.
The test cases demonstrate the issues and the patch fixes them.
Differential revision: https://reviews.llvm.org/D44193
llvm-svn: 327006
Previously we would crash because we did not mark .rel[a] sections
as dead, and they tried to access a parent which was not live
after ICF and was therefore null.
Differential revision: https://reviews.llvm.org/D43241
llvm-svn: 325877
Summary:
This follows up on r321889 where writing of Elf_Rel addends was partially
moved to RelocationBaseSection. This patch ensures that the addends are
always written to the output section when an input section uses RELA but the
output is REL.
Differential Revision: https://reviews.llvm.org/D42843
llvm-svn: 325328
Even though it doesn't make sense, there seem to be multiple programs
in the wild that create PC-relative relocations in non-ALLOC sections.
I believe this is caused by the GNU linkers neglecting to report
any errors for such relocations.
Currently, lld reports errors for such relocations and exits.
So, you cannot link any program that contains wrong relocations until
you fix the issue in the program that generates the wrong ELF files. It's often
impractical to fix that program because it's not always easy.
This patch relaxes the error checking and emits a warning instead.
Differential Revision: https://reviews.llvm.org/D43351
llvm-svn: 325307
In order to identify a compressed section, we check if a section name
starts with ".zdebug" or the section has SHF_COMPRESSED flag. We already
use the knowledge in this function. So hiding that check in
isCompressedELFSection doesn't make sense.
llvm-svn: 324951
When decompressing a compressed debug section, we drop SHF_COMPRESSED
flag but we didn't drop "z" in ".zdebug" section name. This patch does
that for consistency.
This change also fixes the issue that .zdebug_gnu_pubnames are not
dropped when we are creating a .gdb_index section.
llvm-svn: 324949
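The two checks discussed in r324951 and r324949 can be sketched together as below. This is only an illustration with std::string and made-up helper names (isCompressed, decompressedName); the constant is the SHF_COMPRESSED value from the ELF specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr uint64_t kShfCompressed = 0x800; // value of SHF_COMPRESSED in ELF

inline bool startsWith(const std::string &S, const std::string &Prefix) {
  return S.compare(0, Prefix.size(), Prefix) == 0;
}

// A section is compressed if it carries the flag (gABI style) or if its
// name starts with ".zdebug" (GNU style).
inline bool isCompressed(const std::string &Name, uint64_t Flags) {
  return (Flags & kShfCompressed) != 0 || startsWith(Name, ".zdebug");
}

// After decompression, drop the "z": ".zdebug_info" becomes ".debug_info".
// Other names are returned unchanged.
inline std::string decompressedName(const std::string &Name) {
  if (!startsWith(Name, ".zdebug"))
    return Name;
  return "." + Name.substr(2);
}
```

Renaming at decompression time means later passes that match on ".debug_gnu_pubnames" also see sections that arrived as ".zdebug_gnu_pubnames".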
Initially LLD generated Elf_Rel relocations for the O32 ABI and Elf_Rela
relocations for the N32 / N64 ABIs. In other words, the format of input and
output relocations was always the same. Now LLD generates all output
relocations using the Elf_Rel format only. This conforms to the ABI requirements.
The patch was suggested by Alexander Richardson.
llvm-svn: 324064
We normally avoid "switch (Config->EKind)", but in this case I think
it is worth it.
It is only executed when there is an error and it allows detemplating
a lot of code.
llvm-svn: 321404
It is currently in InputSectionBase. Only InputSections are used in
ICF, so Repl should be moved to InputSection to clear the class
hierarchy or, like this patch does, to SectionBase for convenience.
The convenience of having it on the base class is that we can just
access the replacement without having to first check if it is an
InputSection. It is a bit less code and a bit faster as some of this
code is very hot.
I got up to 1.77% improvement in clang-gdb-index and no regressions
according to lnt.
llvm-svn: 320654
Having a SectionBase method check Repl is inconsistent with how we
handle other section information.
For example, if a section is replaced, Sec->Live is false and it is
natural for Sec->getOutputSection() to be null.
It is the symbol that is moved to the replacement section.
llvm-svn: 320599
Since MarkLive.cpp is the place where we set Live flags for
other sections, it looks correct to do that there.
The benefit is that we stop spreading GC logic outside of MarkLive.cpp.
Differential revision: https://reviews.llvm.org/D40454
llvm-svn: 319435
Previously the relocations we rewrote were broken for that case.
We emitted an incorrect addend and a broken relocation info field
because we did not produce a section symbol for mergeable synthetic sections.
Differential revision: https://reviews.llvm.org/D40070
llvm-svn: 318394
Now that DefinedRegular is the only remaining derived class of
Defined, we can merge the two classes.
Differential Revision: https://reviews.llvm.org/D39667
llvm-svn: 317448
Now that SymbolBody is the only symbol class, "SymbolBody"
is a bit of a strange name. This is a mechanical change generated by
perl -i -pe s/SymbolBody/Symbol/g $(git grep -l SymbolBody lld/ELF lld/COFF)
and clang-format-diff.
Differential Revision: https://reviews.llvm.org/D39459
llvm-svn: 317370
This is PR34826.
Currently LLD is unable to report a line number when reporting
a duplicate declaration of a variable.
That happens because for extracting line information we always use
the .debug_line section content, which describes the mapping from machine
instructions to source file locations. That does not help for
variables because it does not describe them.
In this patch I am taking the appropriate information about
variable locations from the .debug_info section.
Differential revision: https://reviews.llvm.org/D38721
llvm-svn: 317080
This is for PR34852.
GCC 8.0 and earlier have a bug where they emit R_386_GOTPC relocations
against _GLOBAL_OFFSET_TABLE for .debug_info. The bug seems to have
been fixed in 2017: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82630,
but we do not want LLD to report errors for such inputs.
In this patch we ignore such relocations.
Differential revision: https://reviews.llvm.org/D38625
llvm-svn: 316761
Summary:
The COFF linker and the ELF linker have long had similar but separate
Error.h and Error.cpp files to implement error handling. This change
introduces new error handling code in Common/ErrorHandler.h, changes the
COFF and ELF linkers to use it, and removes the old, separate
implementations.
Reviewers: ruiu
Reviewed By: ruiu
Subscribers: smeenai, jyknight, emaste, sdardis, nemanjai, nhaehnle, mgorny, javed.absar, kbarton, fedor.sergeev, llvm-commits
Differential Revision: https://reviews.llvm.org/D39259
llvm-svn: 316624
We used to have a map from section piece offsets to section pieces
as a cache for binary search. But I found that the map took quite a
large amount of memory and didn't make linking faster. So, in this
patch, I removed the map.
This patch saves 566 MiB of RAM (2.019 GiB -> 1.453 GiB) when linking
clang with debug info, and the link time is 4% faster in that test case.
Thanks to Sean Silva for pointing this out.
llvm-svn: 316305
By assuming that mergeable input sections are smaller than 4 GiB,
lld's memory usage when linking clang with debug info drops from
2.788 GiB to 2.019 GiB (measured by valgrind, and that does not include
memory space for mmap'ed files). I think that's a reasonable assumption
given such a large RAM savings, hence this patch.
According to valgrind, gold needs 3.54 GiB of RAM to do the same thing.
NB: This patch does not introduce a limitation on the size of
output sections. You can still create sections larger than 4 GiB.
llvm-svn: 316280
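To illustrate why the 4 GiB assumption in r316280 saves memory, here is a sketch of two hypothetical layouts for a per-piece record. Neither struct is copied from lld; the field names and the "before" layout are assumptions chosen only to show the effect of 32-bit input offsets and bit-fields on the record size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical "before" layout with pointer-sized fields.
struct WidePiece {
  size_t InputOff;
  size_t OutputOff;
  size_t Hash;
  bool Live;
};

// Hypothetical "after" layout, assuming an input section is below 4 GiB.
struct CompactPiece {
  uint32_t InputOff;  // fits because the input section is below 4 GiB
  uint32_t Live : 1;  // liveness packed next to the hash
  uint32_t Hash : 31;
  uint64_t OutputOff; // output sections may still exceed 4 GiB
};

// The compact layout is never larger; on common 64-bit ABIs it is half the size.
static_assert(sizeof(CompactPiece) <= sizeof(WidePiece),
              "compact layout should not be larger");
```

Only the input offset is narrowed, so an output offset beyond 4 GiB is still representable. That matches the note that the patch puts no limit on output section size.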
A section was passed to getRelExpr just to create an error message.
But if there's an invalid relocation, we would eventually report it
in relocateOne. So we don't have to pass a section to getRelExpr.
llvm-svn: 315552
We were using uint32_t as the type of relocation kind. It has a
readability issue because what Type really means in `uint32_t Type`
is not obvious. It could be a section type, a symbol type or a
relocation type.
Since we do not do any arithmetic operations on relocation types
(e.g. adding one to R_X86_64_PC32 doesn't make sense), it would be
more natural if they are represented as enums. Unfortunately, that
is not doable because relocation type definitions are spread into
multiple header files.
So I decided to use typedef. This still should be better than the
plain uint32_t because the intended type is now obvious.
llvm-svn: 315525
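A tiny sketch of the readability point made in r315525. The typedef name RelType and the function isPcRelative are illustrative; the constant 2 is the numeric value of R_X86_64_PC32 in the x86-64 psABI.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

typedef uint32_t RelType; // a relocation type such as R_X86_64_PC32

constexpr RelType kPc32 = 2; // numeric value of R_X86_64_PC32

// Before: bool isPcRelative(uint32_t Type);  // "Type" of what?
// After: the parameter type documents what kind of value is expected.
inline bool isPcRelative(RelType Type) { return Type == kPc32; }

// A typedef is only an alias, so no conversions or ABI changes are involved.
static_assert(std::is_same<RelType, uint32_t>::value, "alias, not a new type");
```

The compiler still accepts any uint32_t here, so the gain is purely in documentation, which is the trade-off the message describes.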
The condition whether a section is alive or not by default
is becoming increasingly complex, so the decision of garbage
collection is spreading over InputSection.h and MarkLive.cpp,
which is not a good state.
This moves the code to MarkLive.cpp, to keep the file the central
place to make decisions about garbage collection.
llvm-svn: 315384
When reporting a symbol conflict, LLD parses the debug info to report
source location information. Sections have not been decompressed at this
point, so if an object file contains zlib compressed debug info, LLD
ends up passing this compressed debug info to the DWARF parser, which
causes debug info parsing failures and can trigger assertions in the
parser (as the test case demonstrates).
Decompress debug sections when constructing the LLDDwarfObj to avoid
this issue. This doesn't handle GNU-style compressed debug info sections
(.zdebug_*), which at present are simply ignored by LLDDwarfObj; those
can be done in a follow-up.
Differential Revision: https://reviews.llvm.org/D38491
llvm-svn: 314866
The result of hash_value(StringRef) depends on sizeof(size_t).
That causes lld to create different mergeable table contents on
32-bit machines.
This patch is to use xxHash64 so that we get the same hash values
on 32-bit machines.
llvm-svn: 314603
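The point of r314603 is that the hash must not depend on the host word size. The sketch below uses 64-bit FNV-1a purely as a stand-in, since it is short enough to show inline. It is not xxHash64 and not what the patch uses, but it has the same property of producing a fixed-width 64-bit result on every host.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a. All arithmetic is done in uint64_t, so the result is the
// same whether size_t is 32 or 64 bits wide.
inline uint64_t stableHash64(const std::string &S) {
  uint64_t H = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL; // FNV-1a 64-bit prime
  }
  return H;
}
```

Because string placement in a mergeable table is derived from the hash, a host-independent hash gives the same table contents on 32-bit and 64-bit machines.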
This is "Bug 34688 - lld much slower than bfd when linking the linux kernel"
Inside copyRelocations() we have an O(N*M) algorithm, where N is the number of
relocations and M is the number of symbols in the symbol table. It is incredibly slow
for linking the Linux kernel.
The patch creates local search tables to speed this up.
With this fix, link time goes for me from 12.95s to 0.55s, which is almost 23x
faster (using a release build of LLD).
Differential revision: https://reviews.llvm.org/D38129
llvm-svn: 314282
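The complexity change in r314282 can be sketched as follows. The Symbol struct and both function names are hypothetical; the point is only the difference between scanning the symbol table for every relocation and building an index once.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Symbol {
  const char *Name;
};

// O(M) per query; calling this for each of N relocations is O(N*M).
inline size_t indexLinear(const std::vector<const Symbol *> &Table,
                          const Symbol *S) {
  for (size_t I = 0; I < Table.size(); ++I)
    if (Table[I] == S)
      return I;
  return 0; // not found; index 0 is the null symbol in ELF symbol tables
}

// Built once in O(M); each later lookup is O(1) on average.
inline std::unordered_map<const Symbol *, size_t>
buildIndex(const std::vector<const Symbol *> &Table) {
  std::unordered_map<const Symbol *, size_t> Map;
  for (size_t I = 0; I < Table.size(); ++I)
    Map.emplace(Table[I], I);
  return Map;
}
```

With the index in place, the total cost becomes roughly O(N + M), which is where a speedup of the reported order would come from on inputs with many relocations and symbols.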
EhSectionPiece used to have a pointer to a section, but that pointer was
mostly redundant because we almost always know what the section is without
using that pointer. This patch removes the pointer from the struct.
This patch also uses uint32_t/int32_t instead of size_t to represent
offsets that can hardly be larger than 4 GiB. At the moment, I think it is
OK even if we cannot handle .eh_frame sections larger than 4 GiB.
Differential Revision: https://reviews.llvm.org/D38012
llvm-svn: 313697
We crashed when --emit-relocs was used
and the relocated section was collected by GC.
Differential revision: https://reviews.llvm.org/D37561
llvm-svn: 313620
This patch removes a lot of static Instances arrays from different input file
classes and introduces global arrays for access instead, similar to the arrays we
have for InputSections/OutputSectionCommands.
This allows iterating over input files in non-templated code.
Differential revision: https://reviews.llvm.org/D35987
llvm-svn: 313619
The patch implements initial support of microMIPS code linking:
- Handle microMIPS specific relocations.
- Emit both R1-R5 and R6 microMIPS PLT records.
For now, linking a mixed set of regular and microMIPS object files is not
supported. Also, the patch does not handle (set up and clear) the
least-significant bit of an address, which is used as the ISA mode
bit and allows jumps between regular and microMIPS code without
any thunks.
Differential revision: https://reviews.llvm.org/D37335
llvm-svn: 313028
It is a bit more convenient and helps to simplify the logic
of program header allocation a little.
Differential revision: https://reviews.llvm.org/D34956
llvm-svn: 312711
Previously it was called twice for the .comment synthetic section.
That created 2 pieces of data, which were deduplicated anyway,
but it was not clean.
llvm-svn: 312327
We had a lock to guard BAlloc from being used concurrently, but that
is not very easy to understand. This patch replaces it with a
std::unique_ptr.
llvm-svn: 311056
This is PR33889.
The patch adds support for combining a linker script with the
-symbol-ordering-file option.
If no sorting commands are present in the script inside a section declaration
and no --sort-section option is specified, the code uses the sorting from the
ordering file, if one exists.
Differential revision: https://reviews.llvm.org/D35843
llvm-svn: 310045
We were not looking at Repl and so thinking there was no output
section associated with the merged symbol. Because of that it was
produced as absolute.
This was found by an internal round of testing.
llvm-svn: 308681
The get{ARM,AArch64}UndefinedRelativeWeakVA() functions should only be
called for PC-relative relocations. Complete the supported pc-relative
relocations in the switch statement and make the default case unreachable.
The R_ARM_TARGET relocation can be evaluated as R_ARM_REL32 but it is only
used in the context of exception tables, and is never output with respect
to a weak reference so it does not appear in the switch statement.
Differential Revision: https://reviews.llvm.org/D34138
llvm-svn: 305673
Given
.weak target
.global _start
_start:
b target
The intention is that the branch goes to the instruction after the
branch, effectively turning it into a nop. The branch adds the runtime
PC, but we were adding it statically too.
I noticed the oddity by inspection, but llvm-objdump seems to agree,
since it now prints things like:
b #-4 <_start+0x4>
llvm-svn: 305212
SHF_GROUP bit doesn't make sense in executables or DSOs, so linkers are
expected to remove that bit from section flags. We did that when we create
output sections.
This patch is to do that earlier than before. Now the flag is dropped when
we instantiate input section objects.
This change improves ICF. Previously, two sections that differ only in
SHF_GROUP flag were not merged, because when the control reached ICF,
the flag was still there. Now the flag is dropped before reaching ICF,
so the difference is ignored naturally.
This issue was found by pcc.
Differential Revision: https://reviews.llvm.org/D34074
llvm-svn: 305134
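The effect of r305134 on ICF can be sketched like this. The Sec struct and the identical() comparison are simplified stand-ins for the real ICF equality check; the constant is the SHF_GROUP value from the ELF specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr uint64_t kShfGroup = 0x200; // value of SHF_GROUP in ELF

// Applied when the input section object is created, before ICF runs.
inline uint64_t normalizeFlags(uint64_t Flags) { return Flags & ~kShfGroup; }

struct Sec {
  uint64_t Flags;
  std::string Data;
};

// Simplified stand-in for the ICF comparison: flags and contents must match.
inline bool identical(const Sec &A, const Sec &B) {
  return A.Flags == B.Flags && A.Data == B.Data;
}
```

Two sections with the same contents, one of them in a group, compare as different with raw flags but as identical once both have been normalized.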
Before InputSectionBase had an OutputSection pointer, but that was not
always valid. For example, if it was a merge section one actually had
to look at MergeSec->OutSec.
This was brittle and caused bugs like the one fixed by r304260.
We now have a single Parent pointer that points to an OutputSection
for InputSection, but to a SyntheticSection for merge sections and
.eh_frame. This makes it impossible to accidentally access an invalid
OutSec.
llvm-svn: 304338
We would crash if a SHF_LINK_ORDER section pointed to a non
InputSection section. Since those sections are not merged in order,
SHF_LINK_ORDER is pretty meaningless and we can error on that case.
llvm-svn: 304327
This is PR33052, "Bug 33052 - -r eats comdats ".
To fix it I stop removing the group section from the output when -r is given
and fix up the SHT_GROUP content when writing it, just like we do some
other fixups, e.g. for Rel[a] (the section indices that are in the group
need fixing).
Differential revision: https://reviews.llvm.org/D33485
llvm-svn: 304140
In this way, the content and the flag are always consistent, which I
think is better than removing the bit when input sections reach the Writer.
llvm-svn: 303926
Summary:
This is required on some platforms, as GNU libstdc++ std::call_once is known to be buggy.
This fixes operation of LLD on at least NetBSD and perhaps OpenBSD and Linux PowerPC.
The same change has been introduced to LLVM and LLDB.
Reviewers: ruiu
Reviewed By: ruiu
Subscribers: emaste, #lld
Tags: #lld
Differential Revision: https://reviews.llvm.org/D33508
llvm-svn: 303788
GetSection is a template because write calls relocate.
relocate has two parts. The non alloc code really has to be a
template, as it is looking at raw input file data.
The alloc part is only a template because of getSize.
This patch folds the value of getSize early, detemplates
getRelocTargetVA and splits relocate into a templated non alloc case
and a regular function for the alloc case. This has the nice advantage
of making sure we collect all the information we need for relocations
before getting to InputSection::relocateNonAlloc.
Since we know got is alloc, it can just call the function directly and
avoid the template.
llvm-svn: 303355
This change adds support for the R_ARM_SBREL32 relocation. The relocation
is a base relative relocation that is produced by clang/llvm when -frwpi
is used. The use case for the -frwpi option is position independent data
for embedded systems that do not have a GOT. With -frwpi all data is
accessed via an offset from a base register (usually r9), where r9 is set
at run time to where the data has been loaded. The base of the data is
known as the static base.
The ARM ABI defines the static base as:
B(S) is the addressing origin of the output segment defining the symbol S.
The origin is not required to be the base address of the segment. For
simplicity we choose to use the base address of the segment.
The ARM procedure call standard only defines a read write variant using
R_ARM_SBREL32 relocations. The read-only data is accessed via pc-relative
offsets from the code; this is implemented in clang as -fropi.
Fixes PR32924
Differential Revision: https://reviews.llvm.org/D33280
llvm-svn: 303337
We generally want to use uint64_t instead of uintX_t if the 64-bit
type works for both 32-bit and 64-bit because it is simpler than
the variable-size type.
llvm-svn: 300293
Previously we silently produced broken output for R_386_GOT32X/R_386_GOT32
relocations if they were used to compute the address of the symbol’s global
offset table entry without a base register when position-independent code is disabled.
This situation arose because of recent ABI changes. The released ABI mentions that
R_386_GOT32X can be calculated in two different ways (so we did not follow the ABI here
before this patch), but the draft ABI also mentions the R_386_GOT32 relocation here.
We should use the same calculations for both relocations.
The problem is that we always calculated them as G + A - GOT (offset from end of GOT),
but when PIC is disabled, according to the i386 ABI the calculation should be G + A,
which should produce just an address in the GOT.
ABI: https://github.com/hjl-tools/x86-psABI/wiki/intel386-psABI-draft.pdf (p36, p60).
llvm-svn: 299812
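The two calculations named in r299812 can be written down as below. This is only a sketch of the arithmetic. The function name and the boolean parameter are made up, and G is treated here as the address of the symbol's GOT entry, which is an assumption consistent with "G + A" producing an address in the GOT.

```cpp
#include <cassert>
#include <cstdint>

// G:   address of the symbol's GOT entry (assumption, see lead-in)
// A:   addend
// Got: the GOT address that G is measured against
// UsesBaseRegister: true for the base-register form, false for the
// non-PIC form that addresses the GOT entry directly.
inline uint64_t got32Value(uint64_t G, int64_t A, uint64_t Got,
                           bool UsesBaseRegister) {
  if (UsesBaseRegister)
    return G + A - Got; // offset relative to the GOT
  return G + A;         // plain address of the GOT entry
}
```

For an entry at 0x3008 with a GOT address of 0x3000 and a zero addend, the first form yields 8 and the second yields 0x3008.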
Previously, undefined symbol errors were printed on one line like this
and weren't easy to read.
/ssd/clang/bin/ld.lld: error: /ssd/llvm-project/lld/ELF/Writer.cpp:207: undefined symbol 'lld::elf::EhFrameSection<llvm::object::ELFType<(llvm::support::endianness)0, true> >::addSection(lld::elf::InputSectionBase*)'
This patch makes it more structured, like this.
bin/ld.lld: error: undefined symbol: lld::elf::EhFrameSection<llvm::object::ELFType<(llvm::support::endianness)0, true>
>>> Referenced by Writer.cpp:207 (/ssd/llvm-project/lld/ELF/Writer.cpp:207)
>>> Writer.cpp.o in archive lib/liblldELF.a
Discussion thread:
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111459.html
Differential Revision: https://reviews.llvm.org/D31481
llvm-svn: 299097
This is a shorthand for Config->Wordsize == 8. So this is not strictly
necessary but seems handy. "Is 64 bit?" is easier to read than "Is
wordsize 8 byte?"
llvm-svn: 298463
The patch introduces two new relocation expressions R_MIPS_GOT_GP and
R_MIPS_GOT_GP_PC. The first one represents the current value of the `_gp`
pointer and is used to calculate relocations against the `__gnu_local_gp`
symbol. The second one represents the offset between the beginning of
the function and the `_gp` pointer's value.
There are two motivations for introducing new expressions:
- It's better to keep all non-trivial relocation calculations in the
single place - `getRelocTargetVA` function.
- Relocations against both `_gp_disp` and `__gnu_local_gp` symbols
depend on the `_gp` value. It's a magical value that points to the "middle"
of the GOT. Now all relocations use a common `_gp` value. But in fact,
under some conditions each input file might require its own `_gp`
value. I'm going to implement it in future patches. So it's
better to make `MipsGotSection` responsible for calculation of
the `_gp` value.
llvm-svn: 298306
We had a few Config member functions that return configuration values.
For example, we had is64() which returns true if the target is 64-bit.
The return values of these functions are constant and never change.
This patch is to compute them only once to make it clear that they'll
never change.
llvm-svn: 298168
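A minimal sketch of the idea behind r298463 and r298168: derive the value once and store it as a plain member instead of recomputing it in a member function. The struct and the makeConfig helper are illustrative and are not lld's actual Configuration.

```cpp
#include <cassert>

struct Configuration {
  unsigned Wordsize = 0; // in bytes
  bool Is64 = false;     // shorthand for Wordsize == 8
};

// Compute derived values once, right after the target is known.
inline Configuration makeConfig(unsigned WordsizeBytes) {
  assert(WordsizeBytes == 4 || WordsizeBytes == 8);
  Configuration C;
  C.Wordsize = WordsizeBytes;
  C.Is64 = (WordsizeBytes == 8);
  return C;
}
```

Call sites then read `Config.Is64`, which makes it visible that the value is fixed for the whole link.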
Synthetic sections don't belong to any input file, but they are
still input sections. Whenever a problem occurs with relocations in
these sections, lld crashes in error reporting while trying to print
the input file name.
Differential revision: https://reviews.llvm.org/D30889
llvm-svn: 297711
With this we have a single section hierarchy. It is a bit less code,
but the main advantage will be in a future patch being able to handle
foo = symbol_in_obj;
in a linker script. Currently that fails since we try to find the
output section of symbol_in_obj. With this we should be able to just
return an InputSection from the expression.
llvm-svn: 297313
The list of all input sections was defined in SymbolTable class for a
historical reason. The list itself is not a template. However, because
SymbolTable class is a template, we needed to pass around ELFT to access
the list. This patch moves the list out of the class so that it doesn't
need ELFT.
llvm-svn: 296309
With this we complete the transition out of special output sections,
and with the previous patches it should be possible to merge
OutputSectionBase and OutputSection.
llvm-svn: 296023
With the current design an InputSection is basically anything that
goes directly in an OutputSection. That includes plain input sections
but also synthetic sections, so this should probably not be a
template.
llvm-svn: 295993
We shouldn't report an error for R_*_NONE relocs since we're emitting
them when writing relocations to discarded sections.
Differential Revision: https://reviews.llvm.org/D30279
llvm-svn: 295936
In the target dependent code we already always return an int64_t. In
the target independent code we carefully use uintX_t, which has the
same result given two's complement rules.
This just simplifies the code to use int64_t everywhere.
llvm-svn: 295263
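The two's complement argument in r295263 can be checked with a small sketch. The function names are illustrative and S + A - P is used only as a typical relocation expression. The signed version assumes addresses below 2^63 so that the conversions to int64_t are value-preserving.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit target, computed with wrapping unsigned arithmetic.
inline uint32_t viaUnsigned32(uint32_t S, uint32_t A, uint32_t P) {
  return S + A - P;
}

// Same expression computed once in int64_t for every target.
inline int64_t viaSigned64(uint64_t S, int64_t A, uint64_t P) {
  return static_cast<int64_t>(S) + A - static_cast<int64_t>(P);
}
```

For S = 0x1000, A = -4 and P = 0x2000 the signed result is -4100. Truncating it to 32 bits gives 0xFFFFEFFC, the same bit pattern the unsigned computation produces.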
This is a really horrible case. If a .eh_frame points to a discarded
section, it is not clear what is the correct thing to do.
It looks like ld.bfd discards the entire .eh_frame content and gold
discards the second relocation, leaving one frame with an fde that
refers to a bogus location. This is similar to what gold does.
llvm-svn: 295133
This reverts commit r295102.
In the link of seabios the assumption seems to be that the section has
an actual address, so this is not sufficient. Changing the assembly
code to add a "a" flag seems like the correct thing to do instead of
extending this hack.
Sorry about the noise.
Original message:
Relax the restriction on what relocations can be in a non-alloc section.
The main thing that they can't have is relocations that require the
creation of gots or plt. For now also accept R_PC.
Found while linking seabios.
llvm-svn: 295130
The main thing that they can't have is relocations that require the
creation of gots or plt. For now also accept R_PC.
Found while linking seabios.
llvm-svn: 295102
Unfortunately some consumers of our .o files produced with -r expect
only one section symbol per section. That is true of at least Go's
own linker.
Combining them is a somewhat convoluted process. We have to create a
symbol for every section since we don't know which ones will be
needed. The relocation sections also have to be written first to
handle the Elf_Rel addend.
I did consider a completely different approach:
We could remove the -r special case of relocation sections when
reading. We would instead have a copyRelocs function that is used
instead of scanRelocs. It would create a DynamicReloc for each
relocation and a RelocationSection for each input relocation section.
A complication of such a change is that DynamicReloc would have to take
a section index and an input section instead of a symbol, since with
-emit-relocs some DynamicRelocs would hold relocations referring to the
dynamic symbol table and others to the static symbol table.
That would be a pretty big change, and if we do it it is probably
better to do it as a refactoring.
llvm-svn: 294816
with temporary file name fix in testcase.
Original commit message:
-q, --emit-relocs - Generate relocations in output
Simplest implementation:
* no GC case,
* no "/DISCARD/" linkerscript command support.
This patch is extracted from D28612 / D29636,
Relative to PR31579.
Differential revision: https://reviews.llvm.org/D29663
llvm-svn: 294469
-q, --emit-relocs - Generate relocations in output
Simplest implementation:
* no GC case,
* no "/DISCARD/" linkerscript command support.
This patch is extracted from D28612 / D29636,
Relative to PR31579.
Differential revision: https://reviews.llvm.org/D29663
llvm-svn: 294464
With a synthetic merge section we can have, for example, a single
.rodata section with strings, fixed-size constants and non-merge
constants.
It can be simplified further by not setting Entsize, but that is
probably better done in a followup patch.
This should allow some cleanup in the linker script code now that
every output section command maps to just one output section.
llvm-svn: 294005
Thunks are now implemented by redirecting the relocation to the
symbol S, to a symbol TS in a Thunk. The Thunk will transfer control
to S. This has the following implications:
- All the side-effects of Thunks happen within createThunks()
- Thunks are no longer stored in InputSections and Symbols no longer
need to hold a pointer to a Thunk
- The synthetic Thunk sections need to be merged into OutputSections
This implementation is almost a direct conversion of the existing
Thunks with the following exceptions:
- Mips LA25 Thunks are placed before the InputSection that defines
the symbol that needs a Thunk.
- All ARM Thunks are placed at the end of the OutputSection of the
first caller to the Thunk.
Range extension Thunks are not supported yet so it is optimistically
assumed that all Thunks can be reused.
This is a recommit of r293283 with a fixed comparison predicate as
std::merge requires a strict weak ordering.
Differential revision: https://reviews.llvm.org/D29327
llvm-svn: 293757
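The recommit note about std::merge can be illustrated with a sketch. The Sec struct, the tie-break rule (a thunk sorts before a regular section at the same offset) and the function names are assumptions for illustration, not the predicate from the patch. The property that matters is that the comparison is a strict weak ordering, so it returns false when an element is compared with itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

struct Sec {
  uint64_t OutSecOff; // offset within the output section
  bool IsThunk;
};

// Strict weak ordering: built from '<' only, never '<='.
inline bool byOffset(const Sec &A, const Sec &B) {
  if (A.OutSecOff != B.OutSecOff)
    return A.OutSecOff < B.OutSecOff;
  // Assumed tie-break: a thunk goes before a regular section.
  return A.IsThunk && !B.IsThunk;
}

// Merge the sorted thunk list into the sorted section list.
inline std::vector<Sec> mergeThunks(const std::vector<Sec> &Sections,
                                    const std::vector<Sec> &Thunks) {
  assert(std::is_sorted(Sections.begin(), Sections.end(), byOffset));
  assert(std::is_sorted(Thunks.begin(), Thunks.end(), byOffset));
  std::vector<Sec> Out;
  Out.reserve(Sections.size() + Thunks.size());
  std::merge(Sections.begin(), Sections.end(), Thunks.begin(), Thunks.end(),
             std::back_inserter(Out), byOffset);
  return Out;
}
```

A predicate written with `<=` would claim that an element precedes itself, which violates the requirements std::merge places on its comparator and gives undefined behavior.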
Thunks are now implemented by redirecting the relocation to the
symbol S, to a symbol TS in a Thunk. The Thunk will transfer control
to S. This has the following implications:
- All the side-effects of Thunks happen within createThunks()
- Thunks are no longer stored in InputSections and Symbols no longer
need to hold a pointer to a Thunk
- The synthetic Thunk sections need to be merged into OutputSections
This implementation is almost a direct conversion of the existing
Thunks with the following exceptions:
- Mips LA25 Thunks are placed before the InputSection that defines
the symbol that needs a Thunk.
- All ARM Thunks are placed at the end of the OutputSection of the
first caller to the Thunk.
Range extension Thunks are not supported yet so it is optimistically
assumed that all Thunks can be reused.
Differential Revision: https://reviews.llvm.org/D29129
llvm-svn: 293283
Previously we stored kept locals in KeptLocalSyms arrays
that belonged to files.
This patch makes SymbolTableSection store locals in the Symbols member,
which is already present and was used for globals.
SymbolTableSection already had a NumLocals counter member, so the change
itself is trivial.
That allows simplifying the handling of -r:
Body::DynsymIndex is no longer used as the "symbol table index" for relocatable
output.
The change was suggested during review of D28773 and opens the road for D28612.
Differential revision: https://reviews.llvm.org/D29021
llvm-svn: 292789
Previously we just crashed when there was a user-defined
section, .shstrtab for example, whose name equals a synthetic one
but which has a different type.
The testcase reveals the issue.
Differential revision: https://reviews.llvm.org/D28559
llvm-svn: 291765
The intention of this change is to get rid of code duplication.
Decompressor was introduced in D28105.
The change allows getting rid of a few methods related to decompression.
Differential revision: https://reviews.llvm.org/D28106
llvm-svn: 291758
I thought for a while about how to remove it, but it looks like we
can just copy the file for now. Of course I'm not happy about that,
but it's just less than 50 lines of code, and we already have
duplicate code in Error.h and some other places. I want to solve
them all at once later.
Differential Revision: https://reviews.llvm.org/D27819
llvm-svn: 290062
This change seems to make LLD 0.6% faster when linking Clang with
debug info. I don't want us to have lots of local optimizations,
but this function is very hot, and the improvement is small but
not negligible, so I think it's worth doing.
llvm-svn: 288757
Some ELF producers (dtrace) put this flag in relocation sections and
some (MC) don't. If we don't ignore the flag we end up with multiple
relocation sections pointing to the same section, which we don't
support.
llvm-svn: 288585
When -O0 is specified, we do not do section merging.
Before this patch, however, several sections were generated instead
of a single one, which is useless.
Differential revision: https://reviews.llvm.org/D27041
llvm-svn: 288151
MipsGotSection::getPageEntryOffset calculates the index of a GOT entry
holding a "page" address. Previously this method changed the state
of MipsGotSection because it modified the PageIndexMap field. That led
to unpredictable results if getPageEntryOffset was called from multiple threads.
The patch makes getPageEntryOffset constant. To do so it calculates the GOT
entry index but does not update the PageIndexMap field. Later, in the
MipsGotSection::writeTo method, the linker calculates the "page" addresses and
writes them to the output.
llvm-svn: 288129
They return new vectors, but at the same time they mutate other vectors,
so returning values doesn't make much sense. We should just mutate two
vectors.
llvm-svn: 287979
Uncompressing section contents and splitting mergeable section contents
into smaller chunks are heavy tasks. They scan entire section contents
and do CPU-intensive tasks such as uncompressing zlib-compressed data
or computing a hash value for each section piece.
Luckily, these tasks are independent of each other, so we can do that
in parallel_for_each. The number of input sections is large (as opposed
to the number of output sections), so there's a large parallelism here.
Actually the current design to call uncompress() and splitIntoPieces()
in batch was chosen with doing this in mind. Basically what we need to
do here is to replace `for` with `parallel_for_each`.
It seems this patch improves latency significantly if linked programs
contain debug info (which in turn contain lots of mergeable strings.)
For example, the latency to link Clang (debug build) improved by 20% on
my machine as shown below. Note that ld.gold took 19.2 seconds to do
the same thing.
Before:
30801.782712 task-clock (msec) # 3.652 CPUs utilized ( +- 2.59% )
104,084 context-switches # 0.003 M/sec ( +- 1.02% )
5,063 cpu-migrations # 0.164 K/sec ( +- 13.66% )
2,528,130 page-faults # 0.082 M/sec ( +- 0.47% )
85,317,809,130 cycles # 2.770 GHz ( +- 2.62% )
67,352,463,373 stalled-cycles-frontend # 78.94% frontend cycles idle ( +- 3.06% )
<not supported> stalled-cycles-backend
44,295,945,493 instructions # 0.52 insns per cycle
# 1.52 stalled cycles per insn ( +- 0.44% )
8,572,384,877 branches # 278.308 M/sec ( +- 0.66% )
141,806,726 branch-misses # 1.65% of all branches ( +- 0.13% )
8.433424003 seconds time elapsed ( +- 1.20% )
After:
35523.764575 task-clock (msec) # 5.265 CPUs utilized ( +- 2.67% )
159,107 context-switches # 0.004 M/sec ( +- 0.48% )
8,123 cpu-migrations # 0.229 K/sec ( +- 23.34% )
2,372,483 page-faults # 0.067 M/sec ( +- 0.36% )
98,395,342,152 cycles # 2.770 GHz ( +- 2.62% )
79,294,670,125 stalled-cycles-frontend # 80.59% frontend cycles idle ( +- 3.03% )
<not supported> stalled-cycles-backend
46,274,151,813 instructions # 0.47 insns per cycle
# 1.71 stalled cycles per insn ( +- 0.47% )
8,987,621,670 branches # 253.003 M/sec ( +- 0.60% )
148,900,624 branch-misses # 1.66% of all branches ( +- 0.27% )
6.747548004 seconds time elapsed ( +- 0.40% )
llvm-svn: 287946
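As a rough illustration of the batch structure described in the commit above, the sketch below runs the per-section work (uncompress, then split into hashed pieces) through a thread pool. It is a Python sketch, not LLD code: the `(name, raw, compressed)` tuple, the function names, and the use of `zlib.crc32` as the piece hash are all assumptions made for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_pieces(data):
    """Split NUL-terminated strings; return (offset, hash) per piece."""
    pieces = []
    start = 0
    while start < len(data):
        end = data.find(b"\0", start)
        if end == -1:  # unterminated tail: treat the rest as one piece
            end = len(data) - 1
        piece = data[start:end + 1]
        pieces.append((start, zlib.crc32(piece)))  # hash choice is hypothetical
        start = end + 1
    return pieces


def prepare(section):
    """Per-section work; it touches no shared state, so it can run in parallel."""
    name, raw, compressed = section
    data = zlib.decompress(raw) if compressed else raw
    return name, split_into_pieces(data)


def prepare_all(sections, workers=4):
    """Same result as a plain `for` loop, but spread over worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(prepare, sections))
```

Because each task only reads its own section, swapping the serial loop for the pool does not change the result, only how the work is scheduled.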
The function was used only within Relocations.cpp, but now we are
using it in many places, so this patch moves it to a file that better
fits its functionality.
llvm-svn: 287943
The offset between the beginning of the .got section and the _gp symbol is used
in MIPS GOT relocation calculations. Usually the expression looks like
VA + Offset - GP, where VA is the .got section address, Offset is the offset
of the GOT entry, and GP is the offset between .got and _gp. There are also two
"magic" symbols, _gp_disp and __gnu_local_gp, which hold the offset mentioned above.
These symbols might be referenced by MIPS relocations.
Currently the linker always defines the _gp symbol and uses a hardcoded value for
its initialization, so the offset between .got and _gp is 0x7ff0. The _gp_disp
and __gnu_local_gp symbols are defined if required and initialized with 0x7ff0.
That is not correct, because the _gp symbol might be defined by a linker
script and hold an arbitrary value. In that case we need to use this value
in relocation calculations and initialize _gp_disp and __gnu_local_gp
properly.
The patch fixes the problem and completes the fix for bug #30311.
https://llvm.org/bugs/show_bug.cgi?id=30311
Differential revision: https://reviews.llvm.org/D27036
llvm-svn: 287832
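The arithmetic in the _gp commit above can be sketched as follows. This is a Python illustration under stated assumptions: it treats GP as the address of _gp (one reading of the VA + Offset - GP expression), and the function names and the `script_gp` parameter are hypothetical, not LLD identifiers.

```python
DEFAULT_GP_OFFSET = 0x7FF0  # hardcoded distance between .got and _gp


def gp_value(got_va, script_gp=None):
    """Address of _gp: the linker-script value if given, else .got + 0x7ff0."""
    return script_gp if script_gp is not None else got_va + DEFAULT_GP_OFFSET


def got_relative(got_va, entry_offset, gp):
    """VA + Offset - GP for one GOT entry."""
    return got_va + entry_offset - gp
```

With the default _gp the result depends only on the entry offset; once a script moves _gp, the same entry yields a different value, which is why the hardcoded 0x7ff0 cannot be assumed.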
We have different functions to stringize objects to construct
error messages. For InputFile, we have getFilename, and for
InputSection, we have getName. You had to memorize them.
I think this is the case where the function overloading comes in handy.
This patch defines toString() functions that are overloaded for all these
types, so that you just call it in error().
Differential Revision: https://reviews.llvm.org/D27030
llvm-svn: 287787
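The overloading idea in the toString() commit above can be mimicked in Python with `functools.singledispatch`. The sketch below is only an analogy: the two dataclasses are stand-ins for the real types, and the `file:(section)` output format is an assumption chosen for the example.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class InputFile:  # stand-in for the real class
    filename: str


@dataclass
class InputSection:  # stand-in for the real class
    file: InputFile
    name: str


@singledispatch
def to_string(obj):
    """One name for every type; dispatch picks the right implementation."""
    raise TypeError("no to_string for " + type(obj).__name__)


@to_string.register
def _(f: InputFile):
    return f.filename


@to_string.register
def _(s: InputSection):
    # Output format is an assumption made for this sketch.
    return to_string(s.file) + ":(" + s.name + ")"


def error(msg):
    """Callers no longer need to remember getFilename vs. getName."""
    return "error: " + msg
```

A caller would then write `error(to_string(sec) + ": bad relocation")` regardless of what kind of object `sec` is.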
MergeOutputSection class was a bit hard to use because it provided
a series of finalize functions that have to be called in the right way
at the right time. It also interacted with MergeInputSection, and the
logic was somewhat entangled between the two classes.
This patch simplifies it by providing only one finalize function.
Now, all you have to do is to call MergeOutputSection::finalize
when you have added all sections to the output section. Then, it
internally merges strings and initializes StringPiece objects.
I think this is much easier to understand.
This patch also adds comments.
llvm-svn: 287314
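The "add everything, then call one finalize" protocol from the MergeOutputSection commit above might look like the following Python sketch. The class body is hypothetical: it only deduplicates identical pieces and records their output offsets, and it does no tail merging.

```python
class MergeOutputSection:
    """Toy model: collect pieces from input sections, merge them in finalize()."""

    def __init__(self):
        self.sections = []  # one list of byte-string pieces per input section
        self.offsets = {}   # piece -> offset in the merged output
        self.data = b""

    def add_section(self, pieces):
        self.sections.append(list(pieces))

    def finalize(self):
        """Single entry point: merge duplicates and assign output offsets."""
        out = bytearray()
        for pieces in self.sections:
            for piece in pieces:
                if piece not in self.offsets:
                    self.offsets[piece] = len(out)
                    out += piece
        self.data = bytes(out)
```

The caller has only one ordering rule to remember: add all sections first, then call `finalize()` once.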
MIPS GOT handling is very different from other targets so it is better
to keep the code in the separate section class MipsGotSection. This
patch introduces the new section and moves all MIPS specific code from
GotSection to the new class. I did not rename fields and methods in the
MipsGotSection class to reduce the diff and plan to do that in a
separate commit.
Differential revision: https://reviews.llvm.org/D26733
llvm-svn: 287150
Relocations are the last thing that we were storing a raw section
pointer to and parsing on demand.
With this patch we parse it only once and store a pointer to the
actual data.
The patch also changes where we store it. It is now in
InputSectionBase. Not all sections have relocations, but most do and
this simplifies the logic. It also means that we now only support one
relocation section per section. Given that that constraint is
maintained even with -r with gold, bfd and lld, I think it is OK.
llvm-svn: 286459
Previously, we had both input and output sections for .MIPS.abiflags.
Now we have only one class for .MIPS.abiflags, which is MipsAbiFlagsSection.
This class is a synthetic input section.
.MIPS.abiflags sections are handled as regular sections until
the control reaches Writer. Writer then aggregates all sections
whose type is SHT_MIPS_ABIFLAGS to create a single synthesized
input section. The synthesized section is then processed normally
as if it came from an input file.
llvm-svn: 286398
Previously, we had both input and output sections for .reginfo and
.MIPS.options. Now for each such section we have one synthetic input
section: MipsReginfoSection and MipsOptionsSection respectively.
Both sections are handled as regular sections until the control reaches
Writer. Writer then aggregates all sections whose type is SHT_MIPS_REGINFO
or SHT_MIPS_OPTIONS to create a single synthesized input section. At that
point Writer also saves the GP0 value to the MipsGp0 field of the corresponding
ObjectFile. This value is required for calculating R_MIPS_GPREL16 and
R_MIPS_GPREL32 relocations.
Differential revision: https://reviews.llvm.org/D26444
llvm-svn: 286397
The ARM 32-bit and 64-bit ABIs do not use 0 for undefined weak references
that are used in PC-relative relocations. In particular:
- A branch relocation to an undefined weak resolves to the next
instruction, effectively making the branch a no-op.
- In all other cases the symbol resolves to the place so that S + A - P
resolves to A.
Differential Revision: https://reviews.llvm.org/D26240
llvm-svn: 286353
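The two rules in the undefined-weak commit above reduce to a choice of S before the usual S + A - P computation. The Python sketch below shows that choice; the function names are hypothetical, and the 4-byte instruction size is an assumption (Thumb encodings differ).

```python
def undefined_weak_value(place, is_branch, insn_size=4):
    """Pick S for a PC-relative relocation against an undefined weak symbol."""
    if is_branch:
        return place + insn_size  # next instruction: the branch becomes a no-op
    return place                  # S = P, so S + A - P collapses to A


def pc_relative(s, a, p):
    """The generic PC-relative expression S + A - P."""
    return s + a - p
```

For a non-branch relocation at place `P` with addend `A`, `pc_relative(undefined_weak_value(P, False), A, P)` is just `A`, matching the second bullet.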
This is similar to what was done for InputSection.
With this the various fields are stored in host order and only
converted to target order when writing.
llvm-svn: 286327
A CommonInputSection is a section containing all common symbols.
That was an input section but was abstracted in a different way
than the synthetic input sections because it was written before
the synthetic input section was invented.
This patch rewrites CommonInputSection as a synthetic input section
so that it behaves better with other sections.
llvm-svn: 286053
We are going to have many more classes for linker-synthesized
input sections, so it's worth putting them in a separate file
rather than in the file for regular input sections.
llvm-svn: 285740
The example reported in PR30793 shows a case where gc reclaims
a SHF_TLS section, but it doesn't reclaim the section containing
the debug info for it.
This is expected, as we do not reclaim non-alloc sections
during the garbage collection phase (and this is not going to
change anytime soon, at least this is what I gathered last I
talked with Rafael about it).
So, we end up with a pending reference, thinking that the input
was invalid (which is not true, as it's GC that removed the
SHF_TLS section, and therefore didn't create the PT_TLS *segment*
for it). In cases like this, just assign a VA of zero at relocation
time instead of error'ing out (this is what gold does as well, FWIW).
Differential Revision: https://reviews.llvm.org/D26201
llvm-svn: 285735
Instead of storing a pointer, store the members we need.
The reason for doing this is that it makes it far easier to create
synthetic sections. It also avoids reading data from files multiple
times, which might help with cross endian linking and host
architectures with slow unaligned access.
There are obvious compacting opportunities, but this already has mixed
results even on native x86_64 linking.
There is also the possibility of better refactoring the code for
handling common symbols, but this already shows that a custom class is
not necessary.
llvm-svn: 285148
We were fairly inconsistent as to what information should be accessed
with getSectionHdr and what information (like alignment) was stored
elsewhere.
Now all section info has a dedicated getter. The code is also a bit
more compact.
llvm-svn: 285079
Some MIPS relocations used to access GOT entries can only handle a
16-bit index. Others, like R_MIPS_CALL_HI16/LO16, can handle
32-bit indexes. 16-bit relocations are generated by default. The 32-bit
relocations are generated when the -mxgot flag is passed to the compiler. Usually
these relocations are not mixed in the same code, but files like crt*.o
contain 16-bit relocations, so even if all "user" code is compiled with
the -mxgot flag a few 16-bit relocations might reach the linking phase.
Currently LLD does not differentiate between local GOT entries accessed via 16-bit
and 32-bit indexes. That might lead to relocation overflow if 16-bit
entries are allocated too far from the beginning of the GOT.
The patch introduces a new "part" of the MIPS GOT dedicated to the local GOT
entries accessed by 32-bit relocations. That allows us to put local GOT
entries accessed via a 16-bit index first and avoid relocation overflow.
Differential revision: https://reviews.llvm.org/D25833
llvm-svn: 284809
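The ordering idea in the MIPS GOT commit above can be sketched in a few lines of Python. This is a simplified model, not the LLD layout: it ignores GOT header entries, assumes 4-byte entries, assumes _gp sits 0x7ff0 past the start of the GOT, and uses made-up names.

```python
def layout_local_got(entries, entry_size=4):
    """entries: list of (symbol, needs_32bit). Returns {symbol: GOT offset}.

    Entries reached through 16-bit relocations are placed first, so they
    stay close to the start of the GOT; 32-bit entries follow.
    """
    small = [sym for sym, wide in entries if not wide]
    large = [sym for sym, wide in entries if wide]
    return {sym: i * entry_size for i, sym in enumerate(small + large)}


def fits_16bit(offset, gp_offset=0x7FF0):
    """True if the gp-relative offset fits a signed 16-bit field."""
    return -0x8000 <= offset - gp_offset <= 0x7FFF
```

Without the partition, a 16-bit entry could land behind a long run of 32-bit entries and fail the `fits_16bit` check even though it would have fit near the front.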
The R_ARM_PREL31 and R_ARM_NONE relocations should not be faulted in
shared libraries. In the case of R_ARM_NONE, we have moved the TLS
relaxation hint instruction to R_TLSDESC_CALL so that R_HINT can be used
without side-effects. In the case of R_ARM_PREL31 we permit it to be used
against PLT entries as the personality routines are imported when used in
shared libraries.
Differential Revision: https://reviews.llvm.org/D25721
llvm-svn: 284710
Even with the hash table cache, binary search was still pretty
hot. This can be made even faster with prefetching.
Idea from http://cglab.ca/~morin/misc/arraylayout-v2/
I will suggest moving this to llvm.
llvm-svn: 284594
Summary:
Reclaiming the name 'CachedHashString' will let us add a type with that
name that owns its value.
Reviewers: timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25644
llvm-svn: 284434
Previously, we supported only SHF_COMPRESSED sections because it's
new and it's the ELF standard. But there are object files compressed
in the GNU style out there, so we had to support it.
Sections compressed in the GNU style start with ".zdebug_" and
contain a different header than the ELF standard's one. In this
patch, getRawCompressedData is responsible for handling it.
A tricky thing about GNU-style compressed sections is that we have
to rename them when creating output sections. The ".zdebug_" prefix
implies the section is compressed. We need to rename ".zdebug_" to
".debug_" because our output sections are not compressed.
We do that in this patch.
llvm-svn: 284068
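A minimal Python sketch of the two steps in the GNU-style compression commit above, reading the header and renaming the section. It assumes the commonly described GNU layout (the 4-byte magic "ZLIB" followed by an 8-byte big-endian uncompressed size, then the zlib stream); the function names are hypothetical and unrelated to getRawCompressedData's real signature.

```python
import struct
import zlib


def decompress_gnu(data):
    """Decode a GNU-style (.zdebug_*) compressed section body."""
    if len(data) < 12 or data[:4] != b"ZLIB":
        raise ValueError("not a GNU-style compressed section")
    (size,) = struct.unpack(">Q", data[4:12])  # uncompressed size, big-endian
    out = zlib.decompress(data[12:])
    if len(out) != size:
        raise ValueError("uncompressed size mismatch")
    return out


def output_name(name):
    """Output sections are not compressed, so drop the 'z' from the prefix."""
    if name.startswith(".zdebug_"):
        return ".debug_" + name[len(".zdebug_"):]
    return name
```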
The .ARM.exidx sections contain a table. Each entry has two fields:
- PREL31 offset to the function the table entry describes
- Action to take, either cantunwind, inline unwind, or PREL31 offset to
.ARM.extab section
The table entries must be sorted in order of the virtual addresses that the
first field of each entry describes. Traditionally this is implemented by
the SHF_LINK_ORDER dependency. Instead of implementing this directly we
sort the table entries post relocation.
The .ARM.exidx OutputSection is described by the PT_ARM_EXIDX program
header.
Differential revision: https://reviews.llvm.org/D25127
llvm-svn: 283730
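Sorting the .ARM.exidx table after relocation, as the commit above describes, means that place-relative PREL31 fields have to be re-encoded once an entry moves. The Python sketch below shows one way to do that; it assumes little-endian 8-byte entries, treats an action word of 1 as cantunwind and a set top bit as an inline unwind entry, and all names are made up for the example.

```python
import struct

EXIDX_CANTUNWIND = 1


def prel31_decode(word, place):
    """Sign-extend the low 31 bits and add the address of the field."""
    off = word & 0x7FFFFFFF
    if off & 0x40000000:
        off -= 0x80000000
    return place + off


def prel31_encode(target, place):
    return (target - place) & 0x7FFFFFFF


def sort_exidx(table, table_va):
    """Sort entries by function address, fixing up place-relative fields."""
    entries = []
    for i in range(0, len(table), 8):
        fn_word, action = struct.unpack_from("<II", table, i)
        fn = prel31_decode(fn_word, table_va + i)
        extab = None
        if action != EXIDX_CANTUNWIND and not (action & 0x80000000):
            extab = prel31_decode(action, table_va + i + 4)  # .ARM.extab link
        entries.append((fn, action, extab))
    entries.sort(key=lambda e: e[0])
    out = bytearray()
    for i, (fn, action, extab) in enumerate(entries):
        place = table_va + i * 8
        if extab is not None:
            action = prel31_encode(extab, place + 4)
        out += struct.pack("<II", prel31_encode(fn, place), action)
    return bytes(out)
```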
I found that this check may still be useful in some cases.
In fact, since we use uint32_t for alignment, the maximum value
that is valid for us is 0x80000000. But some broken files,
for example the file from the testcase, may have a greater value.
Because of that, the offset calculation overflows and a crash happens.
Differential revision: https://reviews.llvm.org/D25324
llvm-svn: 283544
This spreads out computing the hash and using it in a hash table. The
speedups are:
firefox
master 6.811232891
patch 6.559280249 1.03841162939x faster
chromium
master 4.369323666
patch 4.33171853 1.00868134338x faster
chromium fast
master 1.856679971
patch 1.850617741 1.00327578725x faster
the gold plugin
master 0.32917962
patch 0.325711944 1.01064645023x faster
clang
master 0.558015452
patch 0.550284165 1.01404962652x faster
llvm-as
master 0.032563515
patch 0.032152077 1.01279662275x faster
the gold plugin fsds
master 0.356221362
patch 0.352772162 1.00977741549x faster
clang fsds
master 0.635096494
patch 0.627249229 1.01251060127x faster
llvm-as fsds
master 0.030183188
patch 0.029889544 1.00982430511x faster
scylla
master 3.071448906
patch 2.938484138 1.04524944215x faster
This seems to be because we don't stall as much. When linking firefox
stalled-cycles-frontend goes from 57.56% to 55.55%.
With -O2 the difference is even more significant since we avoid
recomputing the hash. For firefox we go from 9.990295265 to
9.149627521 seconds (1.09x faster).
llvm-svn: 283367
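The idea of computing the hash when a piece is created and reusing it at lookup time, as in the commit above, can be modelled in Python as below. This is only an analogy to the C++ change: the `Piece` class, the `crc32` hash, and the call counter are all invented for the illustration, and `split` assumes the input is NUL-terminated.

```python
import zlib


class Piece:
    """A string piece that carries its hash, computed once at split time."""

    hash_calls = 0  # counts how often the (pretend-expensive) hash runs

    def __init__(self, data):
        self.data = data
        Piece.hash_calls += 1
        self._hash = zlib.crc32(data)

    def __hash__(self):
        return self._hash  # table lookups reuse the stored value

    def __eq__(self, other):
        return isinstance(other, Piece) and self.data == other.data


def split(data):
    """Split NUL-terminated strings; hashing happens here, up front."""
    return [Piece(s + b"\0") for s in data.split(b"\0")[:-1]]


def intern_all(pieces, table):
    """Insert pieces into the string table without rehashing them."""
    for piece in pieces:
        table.setdefault(piece, len(table))
```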