Executable sections should not be padded with zero by default. On some
architectures, 0x00 is the start of a valid instruction sequence, so zero padding
can confuse disassembly between InputSections (and indeed at the start of the next
InputSection in some situations). Further, if execution mistakenly jumps into
padding, the padding may be executed silently.
On x86, the "0xcc" byte represents the int3 trap instruction. It is a single
byte long so can serve well as padding. This change switches x86 (and x86_64) to
use this value for padding in executable sections, if no linker script directive
overrides it. It also puts the infrastructure in place to make it easy to change the
padding used by other targets when desired. However, I do not know the relevant trap
instruction sequences for other targets, so somebody should add them
separately.
Because the old behaviour simply wrote padding in the whole section before
overwriting most of it, this change also modifies the padding algorithm to write
padding only where needed. This in turn has caused a small behaviour change with
regard to the values written via Fill commands in linker scripts, bringing
it into line with ld.bfd. The fill value is now written starting from the end of
the previous block, which means that it always starts from the first byte of the
fill, whereas the old behaviour meant that the padding sometimes started mid-way
through the fill value. See the test changes for more details.
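The gap-only padding described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `Block`, `fillRange` and `padGaps` are hypothetical names, and the blocks are assumed to be sorted by offset and to lie within the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one already-placed block in an output section.
struct Block {
  std::size_t Offset; // start of the block within the section
  std::size_t Size;   // number of bytes the block occupies
};

// Writes Fill repeatedly into [Begin, End). The pattern restarts at Begin,
// so every gap begins with the first byte of the fill value.
static void fillRange(std::vector<uint8_t> &Buf, std::size_t Begin,
                      std::size_t End, const std::vector<uint8_t> &Fill) {
  if (Fill.empty())
    return;
  for (std::size_t I = Begin; I < End; ++I)
    Buf[I] = Fill[(I - Begin) % Fill.size()];
}

// Pads only the gaps between blocks and the tail of the section, instead of
// filling the whole section and overwriting most of it afterwards.
void padGaps(std::vector<uint8_t> &Buf, const std::vector<Block> &Blocks,
             const std::vector<uint8_t> &Fill) {
  std::size_t PrevEnd = 0;
  for (const Block &B : Blocks) {
    fillRange(Buf, PrevEnd, B.Offset, Fill);
    PrevEnd = B.Offset + B.Size;
  }
  fillRange(Buf, PrevEnd, Buf.size(), Fill);
}
```

In this sketch, passing a one-byte fill of 0xcc would correspond to the x86 trap padding, while a multi-byte fill shows the restart-at-gap behaviour.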
Reviewed by: ruiu
Differential Revision: https://reviews.llvm.org/D30886
Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32227
llvm-svn: 299635
Symbols referenced by linker scripts are not necessarily undefined,
so the previous name didn't convey the meaning of the variable.
llvm-svn: 299573
LinkerScript.cpp contains both the linker script processor and the
linker script parser. I put both into a single file, but the file has grown
too large, so it's time to split them into two different files.
llvm-svn: 299515
This requires collecting all symbols referenced in the linker script
and adding them to the symbol table as undefined symbols.
Differential Revision: https://reviews.llvm.org/D31147
llvm-svn: 298577
LinkerScript used to be a template class, so we couldn't instantiate
that class in elf::link. We instantiated ScriptConfig class earlier
instead so that the linker script parser can store configurations to
the object.
Now that LinkerScript is not a template, it doesn't make sense to
separate ScriptConfig from LinkerScript. This patch merges them.
llvm-svn: 298457
This fixes pr32031 by representing expression results as a
SectionBase and offset. This allows us to use an input section
directly instead of getting lost trying to compute an offset in an
output section when not all the information is available yet.
This also creates a struct to represent the *value* of an expression,
allowing the expression itself to be a simple typedef. I think this is
easier to read and will make it easier to extend the expression
computation to handle more complicated cases.
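As a rough illustration of the idea (a sketch only; the member names and the use of std::function are assumptions, not necessarily what this patch uses):

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a section; only its address matters here.
struct SectionBase {
  uint64_t Addr = 0; // may be assigned late in the link
};

// The *value* of an expression: an optional section plus an offset.
struct ExprValue {
  const SectionBase *Sec; // nullptr means an absolute value
  uint64_t Offset;

  // Resolves to a final address. Only meaningful once Sec->Addr is known.
  uint64_t getValue() const { return (Sec ? Sec->Addr : 0) + Offset; }
};

// The expression itself is then just a callable that produces a value.
using Expr = std::function<ExprValue()>;
```

Because the value keeps a pointer to the section rather than a precomputed address, it can be created before addresses are assigned and resolved afterwards.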
llvm-svn: 298079
This also requires postponing the assignment of
symbols defined in input linker scripts since those can refer to
output sections and in case we don't have a SECTIONS command, we
need to wait until all output sections have been created and
assigned addresses.
Differential Revision: https://reviews.llvm.org/D30851
llvm-svn: 297802
This moves all the members that it is possible to move for now (all those that
do not depend on ELFT templating).
After this change LinkerScript contains only 8 methods in total,
and I believe it is possible to move them all after tweaking other
parts of the linker. We will then be able to have a single class for
the linker script in the end.
llvm-svn: 297735
We can move all non-templated functionality to LinkerScriptBase.
This patch does that for hasPhdrsCommands() and shows how it helps to detemplate
things in other places.
We should probably be able to merge these 2 classes into a single one after such steps.
Even if not, it still looks like a reasonable cleanup to me.
Differential revision: https://reviews.llvm.org/D30895
llvm-svn: 297714
With this we have a single section hierarchy. It is a bit less code,
but the main advantage will be in a future patch being able to handle
foo = symbol_in_obj;
in a linker script. Currently that fails since we try to find the
output section of symbol_in_obj. With this we should be able to just
return an InputSection from the expression.
llvm-svn: 297313
With the current design an InputSection is basically anything that
goes directly in an OutputSection. That includes plain input sections
but also synthetic sections, so this should probably not be a
template.
llvm-svn: 295993
Previously we evaluated LMA values incorrectly in the following cases:
.text : AT(ADDR(.text) - 0xffffffff80000000) { ... }
.data : AT(ADDR(.data) - 0xffffffff80000000) { ... }
.init.begin : AT(ADDR(.init.begin) - 0xffffffff80000000) { ... }
The reason was that we evaluated the offset before the VA was assigned. For the
case above we ended up with 3 loads that had similar LMAs, which was incorrect.
This is critical for the Linux kernel.
This patch updates the offset after the VA calculation, which fixes the issue.
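The ordering problem can be illustrated with a small sketch. The names here (`OutputSectionSketch`, `assignAddress`, `getLMA`) are hypothetical and only model the idea of computing the LMA offset once the VA is known:

```cpp
#include <cstdint>
#include <functional>

// Hypothetical model of an output section with an AT(...) expression.
struct OutputSectionSketch {
  uint64_t VA = 0;                   // virtual address, assigned during layout
  std::function<uint64_t()> LMAExpr; // e.g. ADDR(sec) - 0xffffffff80000000
  uint64_t LMAOffset = 0;            // LMA - VA, in wrapping 64-bit arithmetic
};

// Assigns the VA first and only then evaluates the AT(...) expression, so
// that ADDR(sec) inside the expression sees the final address.
void assignAddress(OutputSectionSketch &Sec, uint64_t VA) {
  Sec.VA = VA;
  if (Sec.LMAExpr)
    Sec.LMAOffset = Sec.LMAExpr() - Sec.VA;
}

uint64_t getLMA(const OutputSectionSketch &Sec) {
  return Sec.VA + Sec.LMAOffset;
}
```

If the expression were evaluated while VA was still zero, every section using the same constant would end up with the same offset, which matches the symptom described above.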
Differential revision: https://reviews.llvm.org/D30163
llvm-svn: 295722
Previously LLD would error out with just "ld.lld: error: unable to move location counter backward",
which does not reveal where the issue is.
This patch adds the location to the output.
Differential revision: https://reviews.llvm.org/D30187
llvm-svn: 295720
This case should be possible to handle, but it is hard:
* In order to create program headers correctly, we have to scan the
sections in the order they are in the file.
* To find that order, we have to "execute" the linker script.
* The linker script can contain SIZEOF_HEADERS.
So to support this we have to start with a guess of how many headers
we need (3), run the linker script and try to create the program
headers. If it turns out we need more headers, we run the script again
with a larger SIZEOF_HEADERS.
Also, running the linker script depends on knowing the size of the
sections, so we have to finalize them. But creating the program
headers can change the value stored in some sections, so we have to
split size finalization and content finalization.
Looks like the last part is also needed for range extension thunks, so
we might support this at some point. For now just report an error
instead of producing broken files.
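For reference, the retry loop described above could look roughly like the following sketch. It is not implemented by this change (which only reports an error); `settleHeaderCount` and the `Layout` callback are hypothetical names, and the iteration cap is an assumption added to guarantee termination.

```cpp
#include <functional>

// Layout is a hypothetical callback: given the number of program headers
// assumed for SIZEOF_HEADERS, it runs the linker script and returns the
// number of program headers actually needed.
// Returns the settled header count, or 0 if the guess did not converge.
unsigned settleHeaderCount(const std::function<unsigned(unsigned)> &Layout,
                           unsigned Guess = 3, unsigned MaxIters = 16) {
  for (unsigned I = 0; I < MaxIters; ++I) {
    unsigned Needed = Layout(Guess);
    if (Needed <= Guess)
      return Guess; // the guess was large enough
    Guess = Needed; // run the script again with a larger SIZEOF_HEADERS
  }
  return 0;
}
```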
llvm-svn: 295458
As specified here:
* https://sourceware.org/binutils/docs/ld/MEMORY.html#MEMORY
There are two deviations from what is specified for GNU ld:
1. Only integer constants and *not* constant expressions
are allowed in `LENGTH` and `ORIGIN` initializations.
2. The `I` and `L` attributes are *not* implemented.
With (1) there is currently no easy way to evaluate integer
only constant expressions. This can be enhanced in the
future.
With (2) it isn't clear how these flags map to the `SHF_*`
flags or if they even make sense for an ELF linker.
Differential Revision: https://reviews.llvm.org/D28911
llvm-svn: 292875
The feature is documented as
-----------------------------
The format of the dynamic list is the same as the version node
without scope and node name. See *note VERSION:: for more
information.
--------------------------------
And indeed Qt uses a dynamic list with an 'extern "C++"' in it. With
this patch we support that.
The change to gc-sections-shared makes us match bfd. Just because we
kept bar doesn't mean it has to be in the dynamic symbol table.
The changes to invalid-dynamic-list.test and reproduce.s are because
of the new parser.
The changes to version-script.s are the only case where we change
behavior with regard to bfd, but I would like to see a mix of
--version-script and --dynamic-list used in the wild before
complicating the code.
llvm-svn: 289082