MinGW uses these kind of list terminator symbols for traversing
the constructor/destructor lists. These list terminators are
actual pointers entries in the lists, with the values 0 and
(uintptr_t)-1 (instead of just symbols pointing to the start/end
of the list).
(This mechanism exists in both the mingw-w64 crt startup code and
in libgcc; normally the mingw-w64 one is used, but a DLL build of
libgcc uses the libgcc one. Therefore it's not trivial to change
the mechanism without lots of cross-project synchronization and
potentially invalidating some combinations of old/new versions
of them.)
When mingw-w64 has been used with lld so far, the CRT startup object
files have so far provided these symbols, ending up with different,
incompatible builds of the CRT startup object files depending on
whether binutils or lld are going to be used.
In order to avoid the need of different configuration of the CRT startup
object files depending on what linker to be used, provide these symbols
in lld instead. (Mingw-w64 checks at build time whether the linker
provides these symbols or not.) This unifies this particular detail
between the two linkers.
This does disallow the use of the very latest lld with older versions
of mingw-w64 (the configure check for the list was added recently;
earlier it simply checked whether the CRT was built with gcc or clang),
and requires rebuilding the mingw-w64 CRT. But the number of users of
lld+mingw still is low enough that such a change should be tolerable,
and unifies this aspect of the toolchains, easing interoperability
between the toolchains for the future.
The actual test for this feature is added in ctors_dtors_priority.s,
but a number of other tests that checked absolute output addresses
are updated.
Differential Revision: https://reviews.llvm.org/D52053
llvm-svn: 342294
After fixing up the runtime pseudo relocation, the .refptr.<var>
will be a plain pointer with the same value as the IAT entry itself.
To save a little binary size and reduce the number of runtime pseudo
relocations, redirect references to the IAT entry (via the __imp_<var>
symbol) itself and discard the .refptr.<var> chunk (as long as the
same section chunk doesn't contain anything else than the single
pointer).
As there are now cases for both setting the Live variable to true
and false externally, remove the accessors and setters and just make
the variable public instead.
Differential Revision: https://reviews.llvm.org/D51456
llvm-svn: 341175
Normally, in order to reference exported data symbols from a different
DLL, the declarations need to have the dllimport attribute, in order to
use the __imp_<var> symbol (which contains an address to the actual
variable) instead of the variable itself directly. This isn't an issue
in the same way for functions, since any reference to the function without
the dllimport attribute will end up as a reference to a thunk which loads
the actual target function from the import address table (IAT).
GNU ld, in MinGW environments, supports automatically importing data
symbols from DLLs, even if the references didn't have the appropriate
dllimport attribute. Since the PE/COFF format doesn't support the kind
of relocations that this would require, the MinGW's CRT startup code
has an custom framework of their own for manually fixing the missing
relocations once module is loaded and the target addresses in the IAT
are known.
For this to work, the linker (originall in GNU ld) creates a list of
remaining references needing fixup, which the runtime processes on
startup before handing over control to user code.
While this feature is rather controversial, it's one of the main features
allowing unix style libraries to be used on windows without any extra
porting effort.
Some sort of automatic fixing of data imports is also necessary for the
itanium C++ ABI on windows (as clang implements it right now) for importing
vtable pointers in certain cases, see D43184 for some discussion on that.
The runtime pseudo relocation handler supports 8/16/32/64 bit addresses,
either PC relative references (like IMAGE_REL_*_REL32*) or absolute
references (IMAGE_REL_AMD64_ADDR32, IMAGE_REL_AMD64_ADDR32,
IMAGE_REL_I386_DIR32). On linking, the relocation is handled as a
relocation against the corresponding IAT slot. For the absolute references,
a normal base relocation is created, to update the embedded address
in case the image is loaded at a different address.
The list of runtime pseudo relocations contains the RVA of the
imported symbol (the IAT slot), the RVA of the location the relocation
should be applied to, and a size of the memory location. When the
relocations are fixed at runtime, the difference between the actual
IAT slot value and the IAT slot address is added to the reference,
doing the right thing for both absolute and relative references.
With this patch alone, things work fine for i386 binaries, and mostly
for x86_64 binaries, with feature parity with GNU ld. Despite this,
there are a few gotchas:
- References to data from within code works fine on both x86 architectures,
since their relocations consist of plain 32 or 64 bit absolute/relative
references. On ARM and AArch64, references to data doesn't consist of
a plain 32 or 64 bit embedded address or offset in the code. On ARMNT,
it's usually a MOVW+MOVT instruction pair represented by a
IMAGE_REL_ARM_MOV32T relocation, each instruction containing 16 bit of
the target address), on AArch64, it's usually an ADRP+ADD/LDR/STR
instruction pair with an even more complex encoding, storing a PC
relative address (with a range of +/- 4 GB). This could theoretically
be remedied by extending the runtime pseudo relocation handler with new
relocation types, to support these instruction encodings. This isn't an
issue for GCC/GNU ld since they don't support windows on ARMNT/AArch64.
- For x86_64, if references in code are encoded as 32 bit PC relative
offsets, the runtime relocation will fail if the target turns out to be
out of range for a 32 bit offset.
- Fixing up the relocations at runtime requires making sections writable
if necessary, with the VirtualProtect function. In Windows Store/UWP apps,
this function is forbidden.
These limitations are addressed by a few later patches in lld and
llvm.
Differential Revision: https://reviews.llvm.org/D50917
llvm-svn: 340726
For this relocation, which applies to two consecutive instructions,
it's plausible that the second instruction might not actually be
the right one.
Differential Revision: https://reviews.llvm.org/D50998
llvm-svn: 340715
In most of these cases, it's easy to go on despite the error,
printing as many valuable error messages as possible from one run
as possible.
Differential Revision: https://reviews.llvm.org/D51087
llvm-svn: 340399
Summary:
When reporting an unsupported relocation type, let's also report the
file we encountered it in to aid diagnosis.
Reviewers: ruiu, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45911
llvm-svn: 334154
Now only IMAGE_REL_ARM64_ABSOLUTE and IMAGE_REL_ARM64_TOKEN
are unhandled.
Also add range checks for the existing BRANCH26 relocation.
Differential Revision: https://reviews.llvm.org/D46354
llvm-svn: 331505
Summary:
ubsan found that we sometimes pass nullptr to memcpy in
SectionChunk::writeTo(). This change adds a check that avoids that.
Reviewers: ruiu
Reviewed By: ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45789
llvm-svn: 330490
In an upcoming change I will need to make a distinction between section
type (code, data, bss) and permissions. The term that I use for both
of these things is "output characteristics".
Differential Revision: https://reviews.llvm.org/D45799
llvm-svn: 330361
In COFF, duplicate string literals are merged by placing them in a
comdat whose leader symbol name contains a specific prefix followed
by the hash and partial contents of the string literal. This gives
us an easy way to identify sections containing string literals in
the linker: check for leader symbol names with the given prefix.
Any sections that are identified in this way as containing string
literals may be tail merged. We do so using the StringTableBuilder
class, which is also used to tail merge string literals in the ELF
linker. Tail merging is enabled only if ICF is enabled, as this
provides a signal as to whether the user cares about binary size.
Differential Revision: https://reviews.llvm.org/D44504
llvm-svn: 327668
Summary:
This patch adds some initial support for Windows control flow guard. At
the end of the day, the linker needs to synthesize a table of RVAs very
similar to the structured exception handler table (/safeseh).
Both /safeseh and /guard:cf take sections of symbol table indices
(.sxdata and .gfids$y) and turn them into RVA tables referenced by the
load config struct in the CRT through special symbols.
Reviewers: ruiu, amccarth
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42592
llvm-svn: 324306
This is similar to what was added in SVN r277838 for 24 bit
branch instructions.
Differential Revision: https://reviews.llvm.org/D41163
llvm-svn: 320677
If /debug was not specified, readSection will return a null
pointer for debug sections. If the debug section is associative with
another section, we need to make sure that the section returned from
readSection is not a null pointer before adding it as an associative
section.
Differential Revision: https://reviews.llvm.org/D40533
llvm-svn: 319133
With this change, instead of creating a SectionChunk for each section
in the object file, we only create them when we encounter a prevailing
comdat section.
Also change how symbol resolution occurs between comdat symbols. Now
only the comdat leader participates in comdat resolution, and not any
other external associated symbols. This is more in line with how COFF
semantics are defined, and should allow for a more straightforward
implementation of non-ANY comdat types.
On my machine, this change reduces our runtime linking a release
build of chrome_child.dll with /nopdb from 5.65s to 4.54s (median of
50 runs).
Differential Revision: https://reviews.llvm.org/D40238
llvm-svn: 319090
Don't crash if we encounter a reference to an early discarded section
(such as .drectve). Instead, handle them the same way as sections
discarded by comdat merging, i.e. either print an error message or
(for debug sections) silently ignore the relocation.
Differential Revision: https://reviews.llvm.org/D40235
llvm-svn: 318689
I never ran into this until lld-link started enabling debug output
by default for the mingw mode. I haven't been able to verify that
this actually behaves correctly, but this relocation is handled
identically on all other architectures so far.
Differential Revision: https://reviews.llvm.org/D39754
llvm-svn: 317669
Now that we have only SymbolBody as the symbol class. So, "SymbolBody"
is a bit strange name now. This is a mechanical change generated by
perl -i -pe s/SymbolBody/Symbol/g $(git grep -l SymbolBody lld/ELF lld/COFF)
nd clang-format-diff.
Differential Revision: https://reviews.llvm.org/D39459
llvm-svn: 317370
Summary:
The COFF linker and the ELF linker have long had similar but separate
Error.h and Error.cpp files to implement error handling. This change
introduces new error handling code in Common/ErrorHandler.h, changes the
COFF and ELF linkers to use it, and removes the old, separate
implementations.
Reviewers: ruiu
Reviewed By: ruiu
Subscribers: smeenai, jyknight, emaste, sdardis, nemanjai, nhaehnle, mgorny, javed.absar, kbarton, fedor.sergeev, llvm-commits
Differential Revision: https://reviews.llvm.org/D39259
llvm-svn: 316624
This is implemented in the same way as the other ADDR32NB relocations
for ARM and X64.
Differential Revision: https://reviews.llvm.org/D38815
llvm-svn: 315561
According to Microsoft's PE/COFF documentation, a SECREL relocation is
"The 32-bit offset of the target from the beginning of its section". By
my reading, the "from the beginning of its section" implies that the
offset is unsigned.
Change from an assertion to an error, since it's possible to trigger
this condition normally for input files with very large sections, and we
should fail gracefully for those instead of asserting.
Differential Revision: https://reviews.llvm.org/D38020
llvm-svn: 313703
These are emitted for comm symbols in object files, when targeting
a GNU environment.
Alternatively, just ignore them since we already align CommonChunk
to the natural size of the content (up to 32 bytes). That would only
trade away the possibility to overalign small symbols, which doesn't
sound like something that might not need to be handled?
Differential Revision: https://reviews.llvm.org/D36304
llvm-svn: 310871
Also handle overflow correctly in LDR/STR relocations. Even if the
offset range of a 8 byte LDR instruction is 15 bit (even if the immediate
itself is 12 bit) due to a 3 bit shift, only include up to 12 bits of offset
after doing the relocation, by limiting the range of the immediate by the
number of shifted bits.
Differential Revision: https://reviews.llvm.org/D35792
llvm-svn: 309175
Also extend the tests for IMAGE_REL_ARM64_PAGEOFFSET_12L to test
all 8/16/32/64 bit GPR and 8/16/32/64/128 SIMD/FP bit ldr/str variants,
and a ldr with an existing offset.
Differential revision: https://reviews.llvm.org/D35647
llvm-svn: 308631
Fix issues found in existing code, while reviewing other changes.
Change the data type of a variable to uint32_t, to avoid potential issues
with signedness in shifts.
Differential revision: https://reviews.llvm.org/D35646
llvm-svn: 308586
This fixes cases on ARM64 when importing from more than one DLL,
in case the imports from the first DLL ended up unaligned.
When fixing up a IMAGE_REL_ARM64_PAGEOFFSET_12L, which shifts the
offset by the load/store size, check that the shift doesn't discard
any bits. (This would also detect if the import address chunks were
unaligned.)
Differential revision: https://reviews.llvm.org/D35640
llvm-svn: 308585
DWARF debug sections can also contain relocations against symbols in
discared segments. LLD should accept such relocations.
Differential Revision: https://reviews.llvm.org/D35526
llvm-svn: 308315
Summary:
This would have caught the invalid object file I used in my test case in
r307726. The OOB was only caught by ASan later, which is slow and
doesn't work on some platforms. LLD should do some basic input
validation itself. This check isn't perfect, so relocations can reach
OOB by up to seven bytes, but it's better than what we had and probably
cheap.
Reviewers: ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35371
llvm-svn: 307948
This is enough to link a working hello world executable, with
a call to an imported function, a string constant passed to
the imported function, and loads from a global variable.
Differential Revision: https://reviews.llvm.org/D34964
llvm-svn: 307629
Summary:
In order to do this without switching on the symbol kind multiple times,
I created Defined::getChunkAndOffset and use that instead of
SymbolBody::getRVA in the inner relocation loop.
Now we get the symbol's chunk before switching over relocation types, so
we can test if it has been discarded outside the inner relocation type
switch. This also simplifies application of section relative
relocations. Previously we would switch on symbol kind to compute the
RVA, then the relocation type, and then the symbol kind again to get the
output section so we could subtract that from the symbol RVA. Now we
*always* have an OutputSection, so applying SECREL and SECTION
relocations isn't as much of a special case.
I'm still not quite happy with the cleanliness of this code. I'm not
sure what offsets and bases we should be using during the relocation
processing loop: VA, RVA, or OutputSectionOffset.
Reviewers: ruiu, pcc
Reviewed By: ruiu
Subscribers: majnemer, inglorion, llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D34650
llvm-svn: 306566
Summary:
For SECTION relocations against absolute symbols, MSVC emits the largest
output section index plus one. I've implemented that by threading a
global variable through DefinedAbsolute that is filled in by the Writer.
A more library-oriented approach would be to thread the Writer through
Chunk::writeTo and SectionChunk::applyRel*, but Rui seems to prefer
doing it this way.
MSVC rejects SECREL relocations against absolute symbols, but only when
the relocation is in a real output section. When the relocation is in a
CodeView debug info section destined for the PDB, it seems that this
relocation error is suppressed, and absolute symbols become zeros in the
object file. This is easily implemented by checking the input section
from which we're applying relocations.
This should fix errors about __safe_se_handler_table and
__guard_fids_table when linking the CRT and generating a PDB.
Reviewers: ruiu
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D34541
llvm-svn: 306071
Summary:
Adds a "Discarded" bool to SectionChunk to indicate if the section was
discarded by COMDAT deduplication. The Writer still just checks
`isLive()`.
Fixes PR33446
Reviewers: ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34288
llvm-svn: 305582
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.
Differential Revision: https://reviews.llvm.org/D33843
llvm-svn: 304864
I thought I fixed the page size, but there were still errors.
This patch also contains fixes for grammatical errors.
Thanks pcc for proofreading!
llvm-svn: 301454