This case should be possible to handle, but it is hard:
* In order to create program headers correctly, we have to scan the
sections in the order they are in the file.
* To find that order, we have to "execute" the linker script.
* The linker script can contain SIZEOF_HEADERS.
So to support this we have to start with a guess of how many headers
we need (3), run the linker script and try to create the program
headers. If it turns out we need more headers, we run the script again
with a larger SIZEOF_HEADERS.
Also, running the linker script depends on knowing the size of the
sections, so we have to finalize them. But creating the program
headers can change the value stored in some sections, so we have to
split size finalization and content finalization.
Looks like the last part is also needed for range extension thunks, so
we might support this at some point. For now just report an error
instead of producing broken files.
llvm-svn: 295458
SHF_LINK_ORDER sections adds special ordering requirements.
Such sections references other sections. Previously we would crash
if section that other were referenced to was discarded by script.
Patch fixes that by discarding all dependent sections in that case.
It supports chained dependencies, testcase is provided.
Differential revision: https://reviews.llvm.org/D30033
llvm-svn: 295332
Unfortunately, the common way of writing linker scripts seems to be
to get the output of ld.bfd --verbose and edit it a bit.
Also unfortunately, the bfd default script contains things like
.rela.dyn : { *(... .rela.data ...) }
but bfd actually ignores that for -emit-relocs, so we have to do the
same.
llvm-svn: 295324
The linker script lexer is context-sensitive. In the regular context,
arithmetic operator characters are regular characters, but in the
expression context, they are independent tokens. This afects how the
lexer tokenizes "3*4", for example. (This kind of expression is real;
the Linux kernel uses it.)
This patch defines function `maybeSplitExpr`. This function splits the
current token into multiple expression tokens if the lexer is in the
expression context.
Differential Revision: https://reviews.llvm.org/D29963
llvm-svn: 295225
OUTPUT_ARCH command can contain architecture values separated with ":", like:
OUTPUT_ARCH(i386:x86-64)
We did not support that, because got 3 lexer tokens here after recent changes.
This trivial patch fixes the issue, now whole expression inside
OUTPUT_ARCH is just ignored.
Differential revision: https://reviews.llvm.org/D29640
llvm-svn: 294432
LLD already parses ALIGN expression to specifiy alignment for output
sections in linker scripts but it never applies the alignment to the
output section. This change handles that.
Differential Revision: https://reviews.llvm.org/D29689
llvm-svn: 294374
DefinedSynthetic symbols are attached to sections,
for the case when such symbol was attached to non-allocated section,
we calculated its value incorrectly.
We subtracted Body->Section->Addr, but non-allocatable sections
should have zero VA in output and therefore result value was wrong.
And at the same time we have Body->Section->Addr != 0 for them
internally because use it for calculation of section size.
Patch fixes calculation of such symbols values.
Differential revision: https://reviews.llvm.org/D29653
llvm-svn: 294322
We had assignSymbol and assignSectionSymbol methods which has similar functionality.
Patch removes one of copy and reuses another in code.
Differential revision: https://reviews.llvm.org/D29582
llvm-svn: 294290
We now create a dummy section with index 1 before processing the
linker script.
Thanks to George Rimar for finding the bug and providing the initial
testcase.
llvm-svn: 294252
This is a fix for Bugzilla 31813.
The problem is that the tokenizer does not create a separate token for
":" unless there's white space before it. Changed it to always create
a token for ":" and reworked some logic that relied on ":" being
attached to some tokens like "global:" and "local:".
llvm-svn: 294006
This is alternative to D28857 which was incorrect.
One of linux scripts contains:
vvar_start = . - 2 * (1 << 12);
vvar_page = vvar_start;
vvar_vsyscall_gtod_data = vvar_page + 128;
Previously we did not mark first expression as non-absolute,
though it contains location counter.
And LLD failed with error:
relocation R_X86_64_PC32 cannot refer to absolute symbol
This patch should fix the issue, and opens road for doing the same for other operators
(though not clear if that is needed).
Differential revision: https://reviews.llvm.org/D29332
llvm-svn: 293748
Linux kernel linkerscript contains additional semicolon (last line):
.apicdrivers : AT(ADDR(.apicdrivers) - LOAD_OFFSET) {
__apicdrivers = .;
*(.apicdrivers);
I checked that both gold and bfd are able to parse something like:
.text : { ;;*(.text);;S = 0;; } }
Patch do the same.
Differential revision: https://reviews.llvm.org/D29276
llvm-svn: 293612
[ELF] Fixed formatting. NFC
and
[ELF] Bypass section type check
Differential revision: https://reviews.llvm.org/D28761
They do the opposite of what was asked for in the code review.
llvm-svn: 293320
As specified here:
* https://sourceware.org/binutils/docs/ld/MEMORY.html#MEMORY
There are two deviations from what is specified for GNU ld:
1. Only integer constants and *not* constant expressions
are allowed in `LENGTH` and `ORIGIN` initializations.
2. The `I` and `L` attributes are *not* implemented.
With (1) there is currently no easy way to evaluate integer
only constant expressions. This can be enhanced in the
future.
With (2) it isn't clear how these flags map to the `SHF_*`
flags or if they even make sense for an ELF linker.
Differential Revision: https://reviews.llvm.org/D28911
llvm-svn: 292875
Found that during attempts of linking linux kernel,
previously we partially duplicated code from getOutputSection(),
and it missed commons symbol case.
Differential revision: https://reviews.llvm.org/D28903
llvm-svn: 292594
This actually simplifies the code a bit as now all local symbols are
handled uniformly.
This should fix the build of www/webkit2-gtk3.
llvm-svn: 291569
This patch allows for linker scripts to assign a new value
to a symbol that is already defined (either in an object file
or the linker script itself).
llvm-svn: 291459
Previously, files added using INCLUDE directive weren't added
to reproduce archives. In this patch, I defined a function to
open a file and use that from Driver and LinkerScript.
llvm-svn: 291413
After Mark's patch I was wondering what was the rationale for the ELF
spec requiring us to merge only sections with matching flags and
types. I tried emailing
https://groups.google.com/forum/#!forum/generic-abi, but looks like my
emails are not being posted (the list is probably moderated). I
emailed Cary Coutant instead.
Cary pointed out that the section was a late addition and didn't got
the scrutiny it deserved. Given that and the problems found by
implementing the letter of the standard, I propose changing lld to
merge all sections with the same name and issue errors if the types or
some critical flags are different.
This should allow an unmodified firefox linked with lld to run.
This also merges some code with the linkerscript path.
llvm-svn: 291107
DefinedSynthetic is not created for a real ELF object, so it doesn't
have to be a template function. It has a virtual st_value, which is
either 32 bit or 64 bit, but we can simply use 64 bit.
llvm-svn: 290241
It was revealed by D27831.
If we have linkerscript that includes another one that sets OUTPUT for example:
RUN: echo "INCLUDE \"foo.script\"" > %t.script
RUN: echo "OUTPUT(\"%t.out\")" > %T/foo.script
then we do:
void ScriptParser::readInclude() {
...
std::unique_ptr<MemoryBuffer> &MB = *MBOrErr;
tokenize(MB->getMemBufferRef());
OwningMBs.push_back(std::move(MB));
}
void ScriptParser::readOutput() {
...
Config->OutputFile = unquote(Tok);
...
}
Problem is that OwningMBs are destroyed after script parser do its job.
So all Toks are dead and Config->OutputFile points to destroyed data.
Patch suggests to save all included scripts into using string Saver.
Differential revision: https://reviews.llvm.org/D27987
llvm-svn: 290238
This handles all the corner cases if setting a section address:
- If the address is too low, we cannot allocate the program headers.
- If the load address is lowered, we have to do that before finalize
This also shares some code with the linker script since it was already
hitting similar cases.
This is used by the freebsd boot loader. It is not clear if we need to
support this with a non binary output, but it is not as bad as I was
expecting.
llvm-svn: 290136
I thought for a while about how to remove it, but it looks like we
can just copy the file for now. Of course I'm not happy about that,
but it's just less than 50 lines of code, and we already have
duplicate code in Error.h and some other places. I want to solve
them all at once later.
Differential Revision: https://reviews.llvm.org/D27819
llvm-svn: 290062
PR31335 shows that we do that in next case:
SECTIONS { .text 0x2000 : {. = 0x100 ; *(.text) } }
though documentations says that "If . is used inside a section
description however, it refers to the byte offset from the start
of that section, not an absolute address. " looks does not work
as documented in bfd (as mentioned in comments for PR31335).
Until we find out the expected behavior was suggested at least not
to 'crash', what we do after trying to generate huge file.
Differential revision: https://reviews.llvm.org/D27712
llvm-svn: 289782
The feature is documented as
-----------------------------
The format of the dynamic list is the same as the version node
without scope and node name. See *note VERSION:: for more
information.
--------------------------------
And indeed qt uses a dynamic list with an 'extern "C++"' in it. With
this patch we support that
The change to gc-sections-shared makes us match bfd. Just because we
kept bar doesn't mean it has to be in the dynamic symbol table.
The changes to invalid-dynamic-list.test and reproduce.s are because
of the new parser.
The changes to version-script.s are the only case where we change
behavior with regards to bfd, but I would like to see a mix of
--version-script and --dynamic-list used in the wild before
complicating the code.
llvm-svn: 289082
This change continues what was started by D27040
Now all allocatable synthetics should be available from script side.
Differential revision: https://reviews.llvm.org/D27131
llvm-svn: 288150
Previously Config->SingleRoRx was set in
createFiles() and used HasSections.
This change moves it to readConfigs at place of
common flags handling, and adds logic that sets
this flag separatelly from ScriptParser if SECTIONS present.
llvm-svn: 288021
Unfortunatelly some scripts look like
kernphys = ...
. = ....
and the expectation in that every orphan section is after the
assignment.
llvm-svn: 287996
This is an horrible special case, but seems to match bfd's behaviour
and is important for avoiding placing an orphan section before the
expected start of the file.
llvm-svn: 287994
GNU LD allows `ASSERT` commands to be in output section descriptions.
Note that LD also mandates that `ASSERT` commands in this context must
end with a semicolon.
llvm-svn: 287677
If the linker script has SECTIONS, the address computation is now
always done in LinkerScript::assignAddresses, like for any other
section.
Before fixHeaders would do a tentative computation that
assignAddresses would sometimes override.
This patch also splits the cases where assignAddresses needs to add
the headers to the first PT_LOAD and the address computation. The net
effect is that we no longer create an empty page for no reason in the
included test case, which matches bfd behavior.
llvm-svn: 287565
Previously, we set (uintptr_t)-1 to InputSectionBase::OutSec to record
that a section has already been set to be assigned to some output section
by linker scripts. Later, we restored nullptr to the pointer to use
the field for the original purpose. That overloading is not very easy to
understand.
This patch adds a bit flag for that purpose, so that we don't need
to piggyback the flag on an unrelated pointer.
llvm-svn: 287508
readVersionDeclaration was to read anonymous version definition and
named version definition. Splitting it into two functions should
improve readability as the two cases are different enough.
I also changed a few helper functions to return values instead of
mutating given references.
llvm-svn: 287319
Linker script doesn't create a section if it has no content. So the following
script doesn't create .norelocs section if it doesn't have any .rel* sections.
.norelocs : { *(.rel*) }
Later, if you assert that the size of .norelocs is 0, LLD printed out
an error message, because it didn't allow calling SIZEOF() on nonexistent
sections.
This patch allows SIZEOF() on nonexistent sections, so that you can do
something like this.
ASSERT(SIZEOF(.norelocs), "shouldn't contain .rel sections!")
Note that this behavior is compatible with GNU.
Differential Revision: https://reviews.llvm.org/D26810
llvm-svn: 287257
This change separates all versioned locals to be a separate list in config,
that was suggested by Rafael and simplifies the logic a bit.
Differential revision: https://reviews.llvm.org/D26754
llvm-svn: 287132
Patch adds a filename to that error message.
I faced next error when debugged one of FreeBSD port:
error: relocation R_X86_64_PLT32 cannot refer to absolute symbol __tls_get_addr
error message was poor and this patch improves it to show the locations
of symbol declaration and using.
Differential revision: https://reviews.llvm.org/D26508
llvm-svn: 286940
Propagate program headers by walking the commands, not the
sections. This allows us to propagate program headers even from
sections that don't end up in the output.
Fixes pr30997.
llvm-svn: 286837
Previously we did not support anything except "local: *", patch changes that.
Actually GNU rules of proccessing wildcards are more complex than that (http://www.airs.com/blog/archives/300):
There are 2 iteration for wildcards, at first iteration "*" wildcards are ignored and handled at second iteration.
Since we previously decided not to implement such complex rules,
I suggest solution that is implemented in this patch. So for "local: *" case nothing changes,
but if we have wildcarded locals,
they are processed before wildcarded globals.
This should fix several FreeBSD ports, one of them is jpeg-turbo-1.5.1 and
currently blocks about 5k of ports.
Differential revision: https://reviews.llvm.org/D26395
llvm-svn: 286713
The disadvantage is that we use uint64_t instad of uint32_t for some
value in 32 bit files. The advantage is a substantially simpler code,
faster builds and less code duplication.
llvm-svn: 286414
This is similar to what was done for InputSection.
With this the various fields are stored in host order and only
converted to target order when writing.
llvm-svn: 286327
A CommonInputSection is a section containing all common symbols.
That was an input section but was abstracted in a different way
than the synthetic input sections because it was written before
the synthetic input section was invented.
This patch rewrites CommonInputSection as a synthetic input section
so that it behaves better with other sections.
llvm-svn: 286053
Previously, we do this piece of code to iterate over all input sections.
for (elf::ObjectFile<ELFT> *F : Symtab.getObjectFiles())
for (InputSectionBase<ELFT> *S : F->getSections())
It turned out that this mechanisms doesn't work well with synthetic
input sections because synthetic input sections don't belong to any
input file.
This patch defines a vector that contains all input sections including
synthetic ones.
llvm-svn: 286051
Previously, it didn't support the character class, so we couldn't
eliminate the use fo llvm::Regex. Now that it is supported, we
can remove compileGlobPattern, which converts a glob pattern to
a regex.
This patch contains optimization for exact/prefix/suffix matches.
Differential Revision: https://reviews.llvm.org/D26284
llvm-svn: 285949
This can speed up lld up to 5 times when linking applications
with large number of sections and using linker script.
Differential revision: https://reviews.llvm.org/D26241
llvm-svn: 285895
With this patch we keep track of the fact that . is a position in the
file and therefore not absolute. This allow us to compute relative
relocations that involve symbol that are defined in linker scripts
with '.'.
This fixes https://llvm.org/bugs/show_bug.cgi?id=30406
There is still more work to track absoluteness over the various
expressions, but this should unblock linking the EFI bootloader.
llvm-svn: 285641
We parse linker scripts very early, but whether an expression is
absolute or not can depend on a symbol defined in a .o. Given that, we
have to delay the computation of IsAbsolute. We can do that by storing
an AST when parsing or by also making IsAbsolute a function like we do
for the expression value. This patch implements the second option.
llvm-svn: 285628
And as a token of the new feature, make ALIGNOF always absolute.
This is a step in making it possible to have non absolute symbols out
of output sections.
llvm-svn: 285608
Previously, we have a lot of BumpPtrAllocators, but all these
allocators virtually have the same lifetime because they are
not freed until the linker finishes its job. This patch aggregates
them into a single allocator.
Differential revision: https://reviews.llvm.org/D26042
llvm-svn: 285452
Instead of storing a pointer, store the members we need.
The reason for doing this is that it makes it far easier to create
synthetic sections. It also avoids reading data from files multiple
times., which might help with cross endian linking and host
architectures with slow unaligned access.
There are obvious compacting opportunities, but this already has mixed
results even on native x86_64 linking.
There is also the possibility of better refactoring the code for
handling common symbols, but this already shows that a custom class is
not necessary.
llvm-svn: 285148
We were fairly inconsistent as to what information should be accessed
with getSectionHdr and what information (like alignment) was stored
elsewhere.
Now all section info has a dedicated getter. The code is also a bit
more compact.
llvm-svn: 285079
This script below shouldn't include file and program headers
to PT_LOAD segment, because it doesn't have PHDRS and FILEHDR
attributes:
PHDRS { all PT_LOAD; }
SECTIONS { /* list of sections here */ }
Differential revision: https://reviews.llvm.org/D25774
llvm-svn: 284709
Linker scripts may specify PHDRS, but not specify section to
segment assignments, i.e:
PHDRS { seg PT_LOAD; }
SECTIONS {
.sec1 {} : seg
.sec2 {}
}
In such case linker should still choose some segment for .sec2 section.
This patch will add .sec2 to previously opened segments (seg) or to the
very first PT_LOAD segment, if no section-to-segment assignments has been
made
Differential revision: https://reviews.llvm.org/D24795
llvm-svn: 284600
Both gold and ld accepts integers instead of named constants
for PHDRS.
Patch adds support for that.
Differential revision: https://reviews.llvm.org/D25549
llvm-svn: 284470
skip() and skip(StringRef) were overloaded functions that
have different semantics. This patch rename one of the functions
to avoid function overloading.
llvm-svn: 284396
Most functions that return StringRef should check their return values,
so I'm planning on marking StringRef [[nodiscard]]. This requires
splitting up functions like next() that are sometimes just used for
side effects.
llvm-svn: 284363
While the toStringRef API almost certainly ends up populating the
SmallString here, the correct way to use this API is to use the return
value.
llvm-svn: 284361
This is 30646.
PT_OPENBSD_RANDOMIZE
The array element specifies the location and size of a part of the memory image of the program that must be filled with random data before any code in the object is executed. The memory region specified by a segment of this type may overlap the region specified by a PT_GNU_RELRO segment, in which case the intersection will be filled with random data before being marked read-only.
Reference links:
http://man.openbsd.org/OpenBSD-current/man5/elf.5c494713c45
Differential revision: https://reviews.llvm.org/D25469
llvm-svn: 284234
-z wxneeded creates a PHDR PT_OPENBSD_WXNEEDED.
PT_OPENBSD_WXNEEDED
The array element specifies that a process executing this file may need to be able to map or protect memory regions as simultaneously executable and writable. If the system is unable or unwilling to permit that for this executable then it may fail immediately. This segment type is meaningful only for executable files and is ignored in other objects.
http://man.openbsd.org/OpenBSD-current/man5/elf.5
Differential revision: https://reviews.llvm.org/D25472
llvm-svn: 284226
Previously, we supported only SHF_COMPRESSED sections because it's
new and it's the ELF standard. But there are object files compressed
in the GNU style out there, so we had to support it.
Sections compressed in the GNU style start with ".zdebug_" and
contain different headers than the ELF standard's one. In this
patch, getRawCompressedData is responsible to handle it.
A tricky thing about GNU-style compressed sections is that we have
to rename them when creating output sections. ".zdebug_" prefix
implies the section is compressed. We need to rename ".zdebug_"
".debug" because our output sections are not compressed.
We do that in this patch.
llvm-svn: 284068
r283984 introduced a problem of too many warning messages being shown
when -ffunction-sections and -fdata-sections were used in conjunction
with --gc-sections linker flag and debugging information present. This
happens because lot of relocations from .debug_line section may become
invalid in such case. The newer fix doesn't show any warning message but
zeroes OutSec pointer in createInputSectionList() to avoid crash, when
relocations are written
llvm-svn: 284010
Sometimes the very first PT_LOAD segment, created by lld, can be empty.
This happens when (all conditions met):
- Linker script is used
- First section in ELF image is not RO
- Not enough space for program headers.
Differential revision: https://reviews.llvm.org/D25330
llvm-svn: 283760
We were implicitly creating space for the headers. That is not the
behaviour of bfd, which requires the script to use SIZEOF_HEADERS. The
difference is important for scripts that don't use SIZEOF_HEADERS and
expect the first section to be at 0.
llvm-svn: 282818
If there is not sufficient address space, just give up and don't put
the header in the PT_LOAD.
This matches bfd behaviour and I found at least one script that
depends on having a section at address 0.
llvm-svn: 282750
The BYTE, SHORT, LONG, and QUAD commands store one, two, four, and eight bytes (respectively).
After storing the bytes, the location counter is incremented by the number of bytes
stored.
Previously our scripts handles these commands incorrectly. For example:
SECTIONS {
.foo : {
*(.foo.1)
BYTE(0x11)
...
We accepted the script above treating BYTE as input section description.
These commands are used in the wild though.
Differential revision: https://reviews.llvm.org/D24830
llvm-svn: 282429
We were counting the size of the bss section holding common symbols twice:
Dot += CurOutSec->getSize();
flush();
The new code is also simpler as now flush is the only function that
inserts in AlreadyOutputOS, which makes sense since the set hold fully
output sections.
llvm-svn: 282285
Previously we failed to parse next scripts because disallowed
a space between filler value and '=':
.text : {
...
} :text = 0x9090
Differential revision: https://reviews.llvm.org/D24831
llvm-svn: 282248
DEFINED(symbol)
Return 1 if symbol is in the linker global symbol table and is defined before
the statement using DEFINED in the script, otherwise return 0.
Can be used to define default values for symbols. Found it in the wild.
Differential revision: https://reviews.llvm.org/D24858
llvm-svn: 282245
Found this operators used in the wild scripts, for example:
__got2_entries = (_FIXUP_TABLE_ - _GOT2_TABLE_) >>2;
__fixup_entries = (. - _FIXUP_TABLE_)>>2;
Differential revision: https://reviews.llvm.org/D24860
llvm-svn: 282243