llvm-project/lld/MachO/InputSection.cpp

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

108 lines
3.5 KiB
C++
Raw Normal View History

//===- InputSection.cpp ---------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#include "InputSection.h"
#include "InputFiles.h"
#include "OutputSegment.h"
#include "Symbols.h"
#include "SyntheticSections.h"
#include "Target.h"
#include "Writer.h"
#include "lld/Common/Memory.h"
#include "llvm/Support/Endian.h"
using namespace llvm;
using namespace llvm::MachO;
using namespace llvm::support;
using namespace lld;
using namespace lld::macho;
std::vector<InputSection *> macho::inputSections;
uint64_t InputSection::getFileOffset() const {
return parent->fileOff + outSecFileOff;
}
uint64_t InputSection::getFileSize() const {
return isZeroFill(flags) ? 0 : getSize();
}
uint64_t InputSection::getVA() const { return parent->addr + outSecOff; }
static uint64_t resolveSymbolVA(uint8_t *loc, const Symbol &sym, uint8_t type) {
const RelocAttrs &relocAttrs = target->getRelocAttrs(type);
if (relocAttrs.hasAttr(RelocAttrBits::BRANCH)) {
if (sym.isInStubs())
return in.stubs->addr + sym.stubsIndex * target->stubSize;
[lld-macho] Fix semantics & add tests for ARM64 GOT/TLV relocs I've adjusted the RelocAttrBits to better fit the semantics of the relocations. In particular: 1. *_UNSIGNED relocations are no longer marked with the `TLV` bit, even though they can occur within TLV sections. Instead the `TLV` bit is reserved for relocations that can reference thread-local symbols, and *_UNSIGNED relocations have their own `UNSIGNED` bit. The previous implementation caused TLV and regular UNSIGNED semantics to be conflated, resulting in rebase opcodes being incorrectly emitted for TLV relocations. 2. I've added a new `POINTER` bit to denote non-relaxable GOT relocations. This distinction isn't important on x86 -- the GOT relocations there are either relaxable or non-relaxable loads -- but arm64 has `GOT_LOAD_PAGE21` which loads the page that the referent symbol is in (regardless of whether the symbol ends up in the GOT). This relocation must reference a GOT symbol (so must have the `GOT` bit set) but isn't itself relaxable (so must not have the `LOAD` bit). The `POINTER` bit is used for relocations that *must* reference a GOT slot. 3. A similar situation occurs for TLV relocations. 4. ld64 supports both a pcrel and an absolute version of ARM64_RELOC_POINTER_TO_GOT. But the semantics of the absolute version are pretty weird -- it results in the value of the GOT slot being written, rather than the address. (That means a reference to a dynamically-bound slot will result in zeroes being written.) The programs I've tried linking don't use this form of the relocation, so I've dropped our partial support for it by removing the relevant RelocAttrBits. Reviewed By: alexshap Differential Revision: https://reviews.llvm.org/D97031
2021-02-24 10:41:54 +08:00
} else if (relocAttrs.hasAttr(RelocAttrBits::GOT)) {
if (sym.isInGot())
return in.got->addr + sym.gotIndex * WordSize;
[lld-macho] Fix semantics & add tests for ARM64 GOT/TLV relocs I've adjusted the RelocAttrBits to better fit the semantics of the relocations. In particular: 1. *_UNSIGNED relocations are no longer marked with the `TLV` bit, even though they can occur within TLV sections. Instead the `TLV` bit is reserved for relocations that can reference thread-local symbols, and *_UNSIGNED relocations have their own `UNSIGNED` bit. The previous implementation caused TLV and regular UNSIGNED semantics to be conflated, resulting in rebase opcodes being incorrectly emitted for TLV relocations. 2. I've added a new `POINTER` bit to denote non-relaxable GOT relocations. This distinction isn't important on x86 -- the GOT relocations there are either relaxable or non-relaxable loads -- but arm64 has `GOT_LOAD_PAGE21` which loads the page that the referent symbol is in (regardless of whether the symbol ends up in the GOT). This relocation must reference a GOT symbol (so must have the `GOT` bit set) but isn't itself relaxable (so must not have the `LOAD` bit). The `POINTER` bit is used for relocations that *must* reference a GOT slot. 3. A similar situation occurs for TLV relocations. 4. ld64 supports both a pcrel and an absolute version of ARM64_RELOC_POINTER_TO_GOT. But the semantics of the absolute version are pretty weird -- it results in the value of the GOT slot being written, rather than the address. (That means a reference to a dynamically-bound slot will result in zeroes being written.) The programs I've tried linking don't use this form of the relocation, so I've dropped our partial support for it by removing the relevant RelocAttrBits. Reviewed By: alexshap Differential Revision: https://reviews.llvm.org/D97031
2021-02-24 10:41:54 +08:00
} else if (relocAttrs.hasAttr(RelocAttrBits::TLV)) {
if (sym.isInGot())
return in.tlvPointers->addr + sym.gotIndex * WordSize;
assert(isa<Defined>(&sym));
}
return sym.getVA();
}
void InputSection::writeTo(uint8_t *buf) {
if (getFileSize() == 0)
return;
memcpy(buf, data.data(), data.size());
for (size_t i = 0; i < relocs.size(); i++) {
const Reloc &r = relocs[i];
uint8_t *loc = buf + r.offset;
uint64_t referentVA = 0;
if (target->hasAttr(r.type, RelocAttrBits::SUBTRAHEND)) {
const Symbol *fromSym = r.referent.get<Symbol *>();
const Symbol *toSym = relocs[++i].referent.get<Symbol *>();
referentVA = toSym->getVA() - fromSym->getVA();
} else if (auto *referentSym = r.referent.dyn_cast<Symbol *>()) {
if (target->hasAttr(r.type, RelocAttrBits::LOAD) &&
!referentSym->isInGot())
target->relaxGotLoad(loc, r.type);
referentVA = resolveSymbolVA(loc, *referentSym, r.type);
if (isThreadLocalVariables(flags)) {
// References from thread-local variable sections are treated as offsets
// relative to the start of the thread-local data memory area, which
// is initialized via copying all the TLV data sections (which are all
// contiguous).
if (isa<Defined>(referentSym))
referentVA -= firstTLVDataSection->addr;
}
} else if (auto *referentIsec = r.referent.dyn_cast<InputSection *>()) {
referentVA = referentIsec->getVA();
}
target->relocateOne(loc, r, referentVA, getVA() + r.offset);
}
}
bool macho::isCodeSection(InputSection *isec) {
[lld-macho][nfc] Remove `MachO::` prefix where possible Previously, SyntheticSections.cpp did not have a top-level `using namespace llvm::MachO` because it caused a naming conflict: `llvm::MachO::Symbol` would collide with `lld::macho::Symbol`. `MachO::Symbol` represents the symbols defined in InterfaceFiles (TBDs). By moving the inclusion of InterfaceFile.h into our .cpp files, we can avoid this name collision in other files where we are only dealing with LLD's own symbols. Along the way, I removed all unnecessary "MachO::" prefixes in our code. Cons of this approach: If TextAPI/MachO/Symbol.h gets included via some other header file in the future, we could run into this collision again. Alternative 1: Have either TextAPI/MachO or BinaryFormat/MachO.h use a different namespace. Most of the benefit of `using namespace llvm::MachO` comes from being able to use things in BinaryFormat/MachO.h conveniently; if TextAPI was under a different (and fully-qualified) namespace like `llvm::tapi` that would solve our problems. Cons: lots of files across llvm-project will need to be updated, and folks who own the TextAPI code need to agree to the name change. Alternative 2: Rename our Symbol to something like `LldSymbol`. I think this is ugly. Personally I think alternative #1 is ideal, but I'm not sure the effort to do it is worthwhile, this diff's halfway solution seems good enough to me. Thoughts? Reviewed By: #lld-macho, oontvoo, MaskRay Differential Revision: https://reviews.llvm.org/D98149
2021-03-12 02:28:08 +08:00
uint32_t type = isec->flags & SECTION_TYPE;
if (type != S_REGULAR && type != S_COALESCED)
return false;
[lld-macho][nfc] Remove `MachO::` prefix where possible Previously, SyntheticSections.cpp did not have a top-level `using namespace llvm::MachO` because it caused a naming conflict: `llvm::MachO::Symbol` would collide with `lld::macho::Symbol`. `MachO::Symbol` represents the symbols defined in InterfaceFiles (TBDs). By moving the inclusion of InterfaceFile.h into our .cpp files, we can avoid this name collision in other files where we are only dealing with LLD's own symbols. Along the way, I removed all unnecessary "MachO::" prefixes in our code. Cons of this approach: If TextAPI/MachO/Symbol.h gets included via some other header file in the future, we could run into this collision again. Alternative 1: Have either TextAPI/MachO or BinaryFormat/MachO.h use a different namespace. Most of the benefit of `using namespace llvm::MachO` comes from being able to use things in BinaryFormat/MachO.h conveniently; if TextAPI was under a different (and fully-qualified) namespace like `llvm::tapi` that would solve our problems. Cons: lots of files across llvm-project will need to be updated, and folks who own the TextAPI code need to agree to the name change. Alternative 2: Rename our Symbol to something like `LldSymbol`. I think this is ugly. Personally I think alternative #1 is ideal, but I'm not sure the effort to do it is worthwhile, this diff's halfway solution seems good enough to me. Thoughts? Reviewed By: #lld-macho, oontvoo, MaskRay Differential Revision: https://reviews.llvm.org/D98149
2021-03-12 02:28:08 +08:00
uint32_t attr = isec->flags & SECTION_ATTRIBUTES_USR;
if (attr == S_ATTR_PURE_INSTRUCTIONS)
return true;
if (isec->segname == segment_names::text)
return StringSwitch<bool>(isec->name)
.Cases("__textcoal_nt", "__StaticInit", true)
.Default(false);
return false;
}
std::string lld::toString(const InputSection *isec) {
return (toString(isec->file) + ":(" + isec->name + ")").str();
}