Fixes https://bugs.llvm.org/show_bug.cgi?id=44903
It is about the following case:
```
SECTIONS {
.foo : { *(.foo) } =0x90909090
/DISCARD/ : { *(.bar) }
}
```
Here while parsing the fill expression we treated the
"/" of "/DISCARD/" as operator.
With this change, suggested by Fangrui Song, we do
not allow expressions with operators (e.g. "0x1100 + 0x22")
that are not wrapped into round brackets. It should not
be an issue for users, but helps to resolve parsing ambiguity.
Differential revision: https://reviews.llvm.org/D74687
Hexagon ABI specifies that call x@gdplt is transformed to call __tls_get_addr.
Example:
call x@gdplt
is changed to
call __tls_get_addr
When x is an external tls variable.
Differential Revision: https://reviews.llvm.org/D74443
Any OUTPUT_FORMAT in a linker script overrides the emulation passed on
the command line, so record the passed bfdname and use that in the error
message about incompatible input files.
This prevents confusing error messages. For example, if you explicitly
pass `-m elf_x86_64` to LLD but accidentally include a linker script
which sets `OUTPUT_FORMAT(elf32-i386)`, LLD would previously complain
about your input files being compatible with elf_x86_64, which isn't the
actual issue, and is confusing because the input files are in fact
x86-64 ELF files.
Interestingly enough, this also prevents a segfault! When we don't pass
`-m` and we have an object file which is incompatible with the
`OUTPUT_FORMAT` set by a linker script, the object file is checked for
compatibility before it's added to the objectFiles vector.
config->emulation, objectFiles, and sharedFiles will all be empty, so
we'll attempt to access bitcodeFiles[0], but bitcodeFiles is also empty,
so we'll segfault. This commit prevents the segfault by adding
OUTPUT_FORMAT as a possible source of machine configuration, and it also
adds an llvm_unreachable to diagnose similar issues in the future.
Differential Revision: https://reviews.llvm.org/D76109
On an internal target,
* Before D74773: time -f '%M' => 18275680
* After D74773: time -f '%M' => 22088964
This patch restores to the status before D74773.
-M output can be useful when diagnosing an "error: output file too large" problem (emitted in openFile()).
I just ran into such a situation where I had to debug an erronerous
Linux kernel linker script. It tried to create a file larger than
INT64_MAX bytes.
This patch could have helped https://bugs.llvm.org/show_bug.cgi?id=44715 as well.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D75966
See `docs/ELF/linker_script.rst` for the new computation for sh_addr and sh_addralign.
`ALIGN(section_align)` now means: "increase alignment to section_align"
(like yet another input section requirement).
The "start of section .foo changes from 0x11 to 0x20" warning no longer
makes sense. Change it to warn if sh_addr%sh_addralign!=0.
To decrease the alignment from the default max_input_align,
use `.output ALIGN(8) : {}` instead of `.output : ALIGN(8) {}`
See linkerscript/section-address-align.test as an example.
When both an output section address and ALIGN are set (can be seen as an
"undefined behavior" https://sourceware.org/ml/binutils/2020-03/msg00115.html),
lld may align more than GNU ld, but it makes a linker script working
with GNU ld hard to break with lld.
This patch can be considered as restoring part of the behavior before D74736.
Differential Revision: https://reviews.llvm.org/D75724
Summary:
Places orphan sections into a unique output section. This prevents the merging of orphan sections of the same name.
Matches behaviour of GNU ld --unique. --unique=pattern is not implemented.
Motivated user case shown in the test has 2 local symbols as they would appear if C++ source has been compiled with -ffunction-sections. The merging of these sections in the case of a partial link (-r) may limit the effectiveness of -gc-sections of a subsequent link.
Reviewers: espindola, jhenderson, bd1976llvm, edd, andrewng, JonChesterfield, MaskRay, grimar, ruiu, psmith
Reviewed By: MaskRay, grimar
Subscribers: emaste, arichardson, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75536
```
createFiles(args)
readDefsym
readerLinkerScript(*mb)
...
readMemory
readMemoryAssignment("ORIGIN", "org", "o") // eagerly evaluated
target = getTarget();
link(args)
writeResult<ELFT>()
...
finalizeSections()
script->processSymbolAssignments()
addSymbol(cmd) // with this patch, evaluated here
```
readMemoryAssignment eagerly evaluates ORIGIN/LENGTH and returns an uint64_t.
This patch postpones the evaluation to make
* --defsym and symbol assignments
* `CONSTANT(COMMONPAGESIZE)` (requires a non-null `lld:🧝:target`)
work. If the expression somehow requires interaction with memory
regions, the circular dependency may cause the expression to evaluate to
a strange value. See the new test added to memory-err.s
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D75763
Added a write method for TimeTrace that takes two strings representing
file names. The first is any file name that may have been provided by the
user via `time-trace-file` flag, and the second is a fallback that should
be configured by the caller. This method makes it cleaner to write the
trace output because there is no longer a need to check file names at the
caller and simplifies future TimeTrace usages.
Reviewed By: modocache
Differential Revision: https://reviews.llvm.org/D74514
llvm::call_once(initDwarfLine, [this]() { initializeDwarf(); });
Though it is not used in all places.
I need that patch for implementing "Remove obsolete debug info" feature
(D74169). But this caching mechanism is useful by itself, and I think it
would be good to use it without connection to "Remove obsolete debug info"
feature. So this patch changes inplace creation of DWARFContext with
its cached version.
Depends on D74308
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D74773
* Delete boilerplate
* Change functions to return `Error`
* Test parsing errors
* Update callers of ARMAttributeParser::parse() to check the `Error` return value.
Since this patch touches nearly everything in the file, I apply
http://llvm.org/docs/Proposals/VariableNames.html and change variable
names to lower case.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D75015
Summary:
LLD has workaround for the times when SectionIndex was not passed properly:
LT->getFileLineInfoForAddress(
S->getOffsetInFile() + Offset, nullptr,
DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath, Info));
S->getOffsetInFile() was added to differentiate offsets between
various sections. Now SectionIndex is properly specified.
Thus it is not necessary to use getOffsetInFile() workaround.
See https://reviews.llvm.org/D58194, https://reviews.llvm.org/D58357.
This patch removes getOffsetInFile() workaround.
Reviewers: ruiu, grimar, MaskRay, espindola
Reviewed By: grimar, MaskRay
Subscribers: emaste, arichardson, llvm-commits
Tags: #llvm, #lld
Differential Revision: https://reviews.llvm.org/D75636
Similar to D63182 [ELF][PPC64] Don't report "relocation refers to a discarded section" for .toc
Reviewed By: Bdragon28
Differential Revision: https://reviews.llvm.org/D75419
Summary:
Extracted from D74773. Currently, errors happened while parsing
debug info are reported as errors. DebugInfoDWARF library treats such
errors as "Recoverable errors". This patch makes debug info errors
to be reported as warnings, to support DebugInfoDWARF approach.
Reviewers: ruiu, grimar, MaskRay, jhenderson, espindola
Reviewed By: MaskRay, jhenderson
Subscribers: emaste, aprantl, arichardson, arphaman, llvm-commits
Tags: #llvm, #debug-info, #lld
Differential Revision: https://reviews.llvm.org/D75234
MC will now output the R_ARM_THM_PC8, R_ARM_THM_PC12 and
R_ARM_THM_PREL_11_0 relocations. These are short-ranged relocations that
are used to implement the adr rd, literal and ldr rd, literal pseudo
instructions.
The instructions use a new RelExpr called R_ARM_PCA in order to calculate
the required S + A - Pa expression, where Pa is AlignDown(P, 4) as the
instructions add their immediate to AlignDown(PC, 4). We also do not want
these relocations to generate or resolve against a PLT entry as the range
of these relocations is so short they would never reach.
The R_ARM_THM_PC8 has a special encoding convention for the relocation
addend, the immediate field is unsigned, yet the addend must be -4 to
account for the Thumb PC bias. The ABI (not the architecture) uses the
convention that the 8-byte immediate of 0xff represents -4.
Differential Revision: https://reviews.llvm.org/D75042
They are purposefully skipped by input section descriptions (rL295324).
Similarly, --orphan-handling= should not warn/error for them.
This behavior matches GNU ld.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D75151
This makes --orphan-handling= less noisy.
This change also improves our compatibility with GNU ld.
GNU ld special cases .symtab, .strtab and .shstrtab . We need output section
descriptions for .symtab, .strtab and .shstrtab to suppress:
<internal>:(.symtab) is being placed in '.symtab'
<internal>:(.shstrtab) is being placed in '.shstrtab'
<internal>:(.strtab) is being placed in '.strtab'
With --strip-all, .symtab and .strtab can be omitted (note, --strip-all is not compatible with --emit-relocs).
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D75149
With this --shuffle-sections=seed produces the same result in every
host.
Reviewed By: grimar, MaskRay
Differential Revision: https://reviews.llvm.org/D74971
When the output section address (addrExpr) is specified, GNU ld warns if
sh_addr is different. This patch implements the warning.
Note, LinkerScript::assignAddresses can be called more than once. We
need to record the changed section addresses, and only report the
warnings after the addresses are finalized.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D74741
Follow-up for D74286.
Notations:
* alignExpr: the computed ALIGN value
* max_input_align: the maximum of input section alignments
This patch changes the following two cases to match GNU ld:
* When ALIGN is present, GNU ld sets output sh_addr to alignExpr, while lld use max(alignExpr, max_input_align)
* When addrExpr is specified but alignExpr is not, GNU ld sets output sh_addr to addrExpr, while lld uses `advance(0, max_input_align)`
Note, sh_addralign is still set to max(alignExpr, max_input_align).
lma-align.test is enhanced a bit to check we don't overalign sh_addr.
fixSectionAlignments() sets addrExpr but not alignExpr for the `!hasSectionsCommand` case.
This patch sets alignExpr as well so that max_input_align will be respected.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D74736
Summary:
This option causes lld to shuffle sections by assigning different
priorities in each run.
The use case for this is to introduce randomization in benchmarks. The
idea is inspired by the paper "Producing Wrong Data Without Doing
Anything Obviously Wrong!"
(https://www.inf.usi.ch/faculty/hauswirth/publications/asplos09.pdf). Unlike
the paper, we shuffle individual sections, not just input files.
Doing this in lld is particularly convenient as the --reproduce option
makes it easy to collect all the necessary bits for relinking the
program being benchmarked. Once that it is done, all that is needed is
to add --shuffle-sections=0 to the response file and relink before each
run of the benchmark.
Differential Revision: https://reviews.llvm.org/D74791
With this patch lld recognizes ARM SBREL relocations.
R_ARM*_MOVW_BREL relocations are not tested because they are not used.
Patch by Tamas Petz
Differential Revision: https://reviews.llvm.org/D74604
Summary:
Generate PAC protected plt only when "-z pac-plt" is passed to the
linker. GNU toolchain generates when it is explicitly requested[1].
When pac-plt is requested then set the GNU_PROPERTY_AARCH64_FEATURE_1_PAC
note even when not all function compiled with PAC but issue a warning.
Harmonizing the warning style for BTI/PAC/IBT.
Generate BTI protected PLT if case of "-z force-bti".
[1] https://www.sourceware.org/ml/binutils/2019-03/msg00021.html
Reviewers: peter.smith, espindola, MaskRay, grimar
Reviewed By: peter.smith, MaskRay
Subscribers: tatyana-krasnukha, emaste, arichardson, kristof.beyls, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74537
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
Fixes https://bugs.llvm.org//show_bug.cgi?id=44878
When --strip-debug is specified, .debug* are removed from inputSections
while .rel[a].debug* (incorrectly) remain.
LinkerScript::addOrphanSections() requires the output section of a relocated
InputSectionBase to be created first.
.debug* are not in inputSections ->
output sections .debug* are not created ->
getOutputSectionName(.rel[a].debug*) dereferences a null pointer.
Fix the null pointer dereference by deleting .rel[a].debug* from inputSections as well.
Reviewed By: grimar, nickdesaulniers
Differential Revision: https://reviews.llvm.org/D74510
Recommit of 0b4a047bfb
(reverted in c29003813a) to incorporate
subsequent fix and add a warning when LLD's interworking behavior has
changed.
D73474 disabled the generation of interworking thunks for branch
relocations to non STT_FUNC symbols. This patch handles the case of BL and
BLX instructions to non STT_FUNC symbols. LLD would normally look at the
state of the caller and the callee and write a BL if the states are the
same and a BLX if the states are different.
This patch disables BL/BLX substitution when the destination symbol does
not have type STT_FUNC. This brings our behavior in line with GNU ld which
may prevent difficult to diagnose runtime errors when switching to lld.
This change does change how LLD handles interworking of symbols that do not
have type STT_FUNC from previous versions including the 10.0 release. This
brings LLD in line with ld.bfd but there may be programs that have not been
linked with ld.bfd that depend on LLD's previous behavior. We emit a warning
when the behavior changes.
A summary of the difference between 10.0 and 11.0 is that for symbols
that do not have a type of STT_FUNC LLD will not change a BL to a BLX or
vice versa. The table below enumerates the changes
| relocation | STT_FUNC | bit(0) | in | 10.0- out | 11.0+ out |
| R_ARM_CALL | no | 1 | BL | BLX | BL |
| R_ARM_CALL | no | 0 | BLX | BL | BLX |
| R_ARM_THM_CALL | no | 1 | BLX | BL | BLX |
| R_ARM_THM_CALL | no | 0 | BL | BLX | BL |
Differential Revision: https://reviews.llvm.org/D73542
D43468+D44380 added INSERT [AFTER|BEFORE] for non-orphan sections. This patch
makes INSERT work for orphan sections as well.
`SECTIONS {...} INSERT [AFTER|BEFORE] .foo` does not set `hasSectionCommands`, so the result
will be similar to a regular link without a linker script. The differences when `hasSectionCommands` is set include:
* image base is different
* -z noseparate-code/-z noseparate-loadable-segments are unavailable
* some special symbols such as `_end _etext _edata` are not defined
The behavior is similar to GNU ld:
INSERT is not considered an external linker script.
This feature makes the section layout more flexible. It can be used to:
* Place .nv_fatbin before other readonly SHT_PROGBITS sections to mitigate relocation overflows.
* Disturb the layout to expose address sensitive application bugs.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D74375
GNU ld has a counterintuitive lang_propagate_lma_regions rule.
```
// .foo's LMA region is propagated to .bar because their VMA region is the same,
// and .bar does not have an explicit output section address (addr_tree).
.foo : { *(.foo) } >RAM AT> FLASH
.bar : { *(.bar) } >RAM
// An explicit output section address disables propagation.
.foo : { *(.foo) } >RAM AT> FLASH
.bar . : { *(.bar) } >RAM
```
In both cases, lld thinks .foo's LMA region is propagated and
places .bar in the same PT_LOAD, so lld diverges from GNU ld w.r.t. the
second case (lma-align.test).
This patch changes Writer<ELFT>::createPhdrs to disable propagation
(start a new PT_LOAD). A user of the first case can make linker scripts
portable by explicitly specifying `AT>`. By contrast, there was no
workaround for the old behavior.
This change uncovers another LMA related bug in assignOffsets() where
`ctx->lmaOffset = 0;` was omitted. It caused a spurious "load address
range overlaps" error for at2.test
The new PT_LOAD rule is complex. For convenience, I listed the origins of some subexpressions:
* rL323449: `sec->memRegion == load->firstSec->memRegion`; linkerscript/at3.test
* D43284: `load->lastSec == Out::programHeaders` (don't start a new PT_LOAD after program headers); linkerscript/at4.test
* D58892: `sec != relroEnd` (start a new PT_LOAD after PT_GNU_RELRO)
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D74297
When lmaRegion is non-null, respect `sec->alignment`
This rule is analogous to `switchTo(sec)` which advances sh_addr (VMA).
This fixes the p_paddr misalignment issue as reported by
https://android-review.googlesource.com/c/trusty/external/trusted-firmware-a/+/1230058
Note, `sec->alignment` is the maximum of ALIGN and input section alignments. We may overalign LMA than GNU ld.
linkerscript/align-lma.s has a FIXME that demonstrates another bug:
`.bss ... >RAM` should be placed in a different PT_LOAD (GNU ld
behavior) because its lmaRegion (nullptr) is different from the previous
section's lmaRegion (ROM).
Reviewed By: psmith
Differential Revision: https://reviews.llvm.org/D74286
There are still problems after the fix in
"[ELF][ARM] Fix regression of BL->BLX substitution after D73542"
so let's revert to get trunk back to green while we investigate.
See https://reviews.llvm.org/D73542
This reverts commit 5461fa2b1f.
This reverts commit 0b4a047bfb.
This adds some of LLD specific scopes and picks up optimisation scopes
via LTO/ThinLTO. Makes use of TimeProfiler multi-thread support added in
77e6bb3c.
Differential Revision: https://reviews.llvm.org/D71060
D73542 made a typo (`rel.type == R_PLT_PC`; should be `rel.expr`) and introduced a regression:
BL->BLX substitution was disabled when the target symbol is preemptible
(expr is R_PLT_PC).
The two added bl instructions in arm-thumb-interwork-shared.s check that
we patch BL to BLX.
Fixes https://bugs.chromium.org/p/chromium/issues/detail?id=1047531
For an out-of-range relocation referencing a non-local symbol, report the symbol name and the object file that defines the symbol. As an example:
```
t.o:(function func: .text.func+0x3): relocation R_X86_64_32S out of range: -281474974609120 is not in [-2147483648, 2147483647]
```
=>
```
t.o:(function func: .text.func+0x3): relocation R_X86_64_32S out of range: -281474974609120 is not in [-2147483648, 2147483647]; references func
>>> defined in t1.o
```
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D73518
D73474 disabled the generation of interworking thunks for branch
relocations to non STT_FUNC symbols. This patch handles the case of BL and
BLX instructions to non STT_FUNC symbols. LLD would normally look at the
state of the caller and the callee and write a BL if the states are the
same and a BLX if the states are different.
This patch disables BL/BLX substitution when the destination symbol does
not have type STT_FUNC. This brings our behavior in line with GNU ld which
may prevent difficult to diagnose runtime errors when switching to lld.
Differential Revision: https://reviews.llvm.org/D73542
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
Similar to R_MIPS_GPREL16 and R_MIPS_GPREL32 (D45972).
If the addend of an R_PPC_PLTREL24 is >= 0x8000, it indicates that r30
is relative to the input section .got2.
```
addis 30, 30, .got2+0x8000-.L1$pb@ha
addi 30, 30, .got2+0x8000-.L1$pb@l
...
bl foo+0x8000@PLT
```
After linking, the relocation will be relative to the output section .got2.
To compensate for the shift `address(input section .got2) - address(output section .got2) = ppc32Got2OutSecOff`, adjust by `ppc32Got2OutSecOff`:
```
addis 30, 30, .got2+0x8000-.L1+ppc32Got2OutSecOff$pb@ha
addi 30, 30, .got2+0x8000-.L1+ppc32Got2OutSecOff$pb@ha$pb@l
...
bl foo+0x8000+ppc32Got2OutSecOff@PLT
```
This rule applys to a relocatable link or a non-relocatable link with --emit-relocs.
Reviewed By: Bdragon28
Differential Revision: https://reviews.llvm.org/D73532
ELF for the ARM architecture requires linkers to provide
interworking for symbols that are of type STT_FUNC. Interworking for
other symbols must be encoded directly in the object file. LLD was always
providing interworking, regardless of the symbol type, this breaks some
programs that have branches from Thumb state targeting STT_NOTYPE symbols
that have bit 0 clear, but they are in fact internal labels in a Thumb
function. LLD treats these symbols as ARM and inserts a transition to Arm.
This fixes the problem for in range branches, R_ARM_JUMP24,
R_ARM_THM_JUMP24 and R_ARM_THM_JUMP19. This is expected to be the vast
majority of problem cases as branching to an internal label close to the
function.
There is at least one follow up patch required.
- R_ARM_CALL and R_ARM_THM_CALL may do interworking via BL/BLX
substitution.
In theory range-extension thunks can be altered to not change state when
the symbol type is not STT_FUNC. I will need to check with ld.bfd to see if
this is the case in practice.
Fixes (part of) https://github.com/ClangBuiltLinux/linux/issues/773
Differential Revision: https://reviews.llvm.org/D73474