llvm-project

Commit Graph

Author	SHA1	Message	Date
Sriraman Tallam	e0bca46b08	Options for Basic Block Sections, enabled in D68063 and D73674. This patch adds clang options: -fbasic-block-sections={all,<filename>,labels,none} and -funique-basic-block-section-names. LLVM Support for basic block sections is already enabled. + -fbasic-block-sections={all, <file>, labels, none} : Enables/Disables basic block sections for all or a subset of basic blocks. "labels" only enables basic block symbols. + -funique-basic-block-section-names: Enables unique section names for basic block sections, disabled by default. Differential Revision: https://reviews.llvm.org/D68049	2020-06-02 00:23:32 -07:00
Fangrui Song	751f18e7d4	[ELF] Refine --export-dynamic-symbol semantics to be compatible GNU ld 2.35 GNU ld from binutils 2.35 onwards will likely support --export-dynamic-symbol but with different semantics. https://sourceware.org/pipermail/binutils/2020-May/111302.html Differences: 1. -export-dynamic-symbol is not supported 2. --export-dynamic-symbol takes a glob argument 3. --export-dynamic-symbol can suppress binding the references to the definition within the shared object if (-Bsymbolic or -Bsymbolic-functions) 4. --export-dynamic-symbol does not imply -u I don't think the first three points can affect any user. For the fourth point, Not implying -u can lead to some archive members unfetched. Add -u foo to restore the previous behavior. Exact semantics: * -no-pie or -pie: matched non-local defined symbols will be added to the dynamic symbol table. * -shared: matched non-local STV_DEFAULT symbols will not be bound to definitions within the shared object even if they would otherwise be due to -Bsymbolic, -Bsymbolic-functions, or --dynamic-list. Reviewed By: psmith Differential Revision: https://reviews.llvm.org/D80487	2020-06-01 11:30:03 -07:00
Fangrui Song	b912b887d8	[ELF] Add --print-archive-stats= gold has an option --print-symbol-counts= which prints: // For each archive archive $archive $members $fetched_members // For each object file symbols $object $defined_symbols $used_defined_symbols In most cases, `$defined_symbols = $used_defined_symbols` unless weak symbols are present. Strangely `$used_defined_symbols` includes symbols defined relative to --gc-sections discarded sections. The `symbols` lines do not appear to be useful. `archive` lines are useful: `$fetched_members=0` lines correspond to unused archives. The information can be used to trim dependencies. This patch implements --print-archive-stats= which prints the number of members and the number of fetched members for each archive. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D78983	2020-04-29 18:04:37 -07:00
Hongtao Yu	964ef8eecc	[lld] Support --lto-emit-asm and --plugin-opt=emit-asm Summary: The switch --plugin-opt=emit-asm can be used with the gold linker to dump the final assembly code generated by LTO in a user-friendly way. Unfortunately it doesn't work with lld. I'm hooking it up with lld. With that switch, lld emits assembly code into the output file (specified by -o) and if there are multiple input files, each of their assembly code will be emitted into a separate file named by suffixing the output file name with a unique number, respectively. The linking then stops after generating those assembly files. Reviewers: espindola, wenlei, tejohnson, MaskRay, grimar Reviewed By: tejohnson, MaskRay, grimar Subscribers: pcc, emaste, inglorion, arichardson, hiraditya, MaskRay, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77231	2020-04-27 11:00:46 -07:00
Fangrui Song	232578804a	[ELF] Add --warn-backrefs-exclude=<glob> D77522 changed --warn-backrefs to not warn for linking sandwich problems (-ldef1 -lref -ldef2). This removed lots of false positives. However, glibc still has some problems. libc.a defines some symbols which are normally in libm.a and libpthread.a, e.g. __isnanl/raise. For a linking order `-lm -lpthread -lc`, I have seen: ``` // different resolutions: GNU ld/gold select libc.a(s_isnan.o) as the definition backward reference detected: __isnanl in libc.a(printf_fp.o) refers to libm.a(m_isnanl.o) // different resolutions: GNU ld/gold select libc.a(raise.o) as the definition backward reference detected: raise in libc.a(abort.o) refers to libpthread.a(pt-raise.o) ``` To facilitate deployment of --warn-backrefs, add --warn-backrefs-exclude= so that certain known issues (which may be impractical to fix) can be whitelisted. Deliberate choices: * Not a comma-separated list (`--warn-backrefs-exclude=liba.a,libb.a`). -Wl, splits the argument at commas, so we cannot use commas. --export-dynamic-symbol is similar. * Not in the style of `--warn-backrefs='*' --warn-backrefs=-liba.a`. We just need exclusion, not inclusion. For easier build system integration, we should avoid order dependency. With the current scheme, we enable --warn-backrefs, and indivial libraries can add --warn-backrefs-exclude=<glob> to their LDFLAGS. Reviewed By: psmith Differential Revision: https://reviews.llvm.org/D77512	2020-04-20 07:52:15 -07:00
Sriraman Tallam	94317878d8	LLD Support for Basic Block Sections This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf This patch adds lld support for basic block sections and performs relaxations after the basic blocks have been reordered. After the linker has reordered the basic block sections according to the desired sequence, it runs a relaxation pass to optimize jump instructions. Currently, the compiler emits the long form of all jump instructions. AMD64 ISA supports variants of jump instructions with one byte offset or a four byte offset. The compiler generates jump instructions with R_X86_64 32-bit PC relative relocations. We would like to use a new relocation type for these jump instructions as it makes it easy and accurate while relaxing these instructions. The relaxation pass does two things: First, it deletes all explicit fall-through direct jump instructions between adjacent basic blocks. This is done by discarding the tail of the basic block section. Second, If there are consecutive jump instructions, it checks if the first conditional jump can be inverted to convert the second into a fall through and delete the second. The jump instructions are relaxed by using jump instruction mods, something like relocations. These are used to modify the opcode of the jump instruction. Jump instruction mods contain three values, instruction offset, jump type and size. While writing this jump instruction out to the final binary, the linker uses the jump instruction mod to determine the opcode and the size of the modified jump instruction. These mods are required because the input object files are memory-mapped without write permissions and directly modifying the object files requires copying these sections. Copying a large number of basic block sections significantly bloats memory. Differential Revision: https://reviews.llvm.org/D68065	2020-04-07 06:55:57 -07:00
Alexandre Ganea	09158252f7	[ThinLTO] Allow usage of all hardware threads in the system Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled. One can now say in LLD: /opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified. /opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency(). /opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows. When an affinity mask is set for the process, threads will be created only for the cores selected by the mask. When N > number-of-hardware-threads-in-the-system, the threads in the thread pool will be dispatched equally on all CPU sockets (tested only on Windows). When N <= number-of-hardware-threads-on-a-CPU-socket, the threads will remain on the CPU socket where the process started (only on Windows). Differential Revision: https://reviews.llvm.org/D75153	2020-03-27 10:20:58 -04:00
Shoaib Meenai	2822852ffc	[ELF] Correct error message when OUTPUT_FORMAT is used Any OUTPUT_FORMAT in a linker script overrides the emulation passed on the command line, so record the passed bfdname and use that in the error message about incompatible input files. This prevents confusing error messages. For example, if you explicitly pass `-m elf_x86_64` to LLD but accidentally include a linker script which sets `OUTPUT_FORMAT(elf32-i386)`, LLD would previously complain about your input files being compatible with elf_x86_64, which isn't the actual issue, and is confusing because the input files are in fact x86-64 ELF files. Interestingly enough, this also prevents a segfault! When we don't pass `-m` and we have an object file which is incompatible with the `OUTPUT_FORMAT` set by a linker script, the object file is checked for compatibility before it's added to the objectFiles vector. config->emulation, objectFiles, and sharedFiles will all be empty, so we'll attempt to access bitcodeFiles[0], but bitcodeFiles is also empty, so we'll segfault. This commit prevents the segfault by adding OUTPUT_FORMAT as a possible source of machine configuration, and it also adds an llvm_unreachable to diagnose similar issues in the future. Differential Revision: https://reviews.llvm.org/D76109	2020-03-12 22:54:53 -07:00
David Bozier	6e2804ce6b	[LLD] Add support for --unique option Summary: Places orphan sections into a unique output section. This prevents the merging of orphan sections of the same name. Matches behaviour of GNU ld --unique. --unique=pattern is not implemented. Motivated user case shown in the test has 2 local symbols as they would appear if C++ source has been compiled with -ffunction-sections. The merging of these sections in the case of a partial link (-r) may limit the effectiveness of -gc-sections of a subsequent link. Reviewers: espindola, jhenderson, bd1976llvm, edd, andrewng, JonChesterfield, MaskRay, grimar, ruiu, psmith Reviewed By: MaskRay, grimar Subscribers: emaste, arichardson, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75536	2020-03-10 12:20:21 +00:00
Rafael Ávila de Espíndola	d48d339156	[lld][ELF] Add --shuffle-sections=seed to shuffle input sections Summary: This option causes lld to shuffle sections by assigning different priorities in each run. The use case for this is to introduce randomization in benchmarks. The idea is inspired by the paper "Producing Wrong Data Without Doing Anything Obviously Wrong!" (https://www.inf.usi.ch/faculty/hauswirth/publications/asplos09.pdf). Unlike the paper, we shuffle individual sections, not just input files. Doing this in lld is particularly convenient as the --reproduce option makes it easy to collect all the necessary bits for relinking the program being benchmarked. Once that it is done, all that is needed is to add --shuffle-sections=0 to the response file and relink before each run of the benchmark. Differential Revision: https://reviews.llvm.org/D74791	2020-02-19 13:44:12 -08:00
Fangrui Song	105a270028	[ELF][AArch64] Rename pacPlt to zPacPlt and forceBti to zForceIbt after D71327. NFC We use config->z* for -z options.	2020-02-13 21:02:54 -08:00
Russell Gallop	e7cb374433	[LLD][ELF] Add time-trace to ELF LLD This adds some of LLD specific scopes and picks up optimisation scopes via LTO/ThinLTO. Makes use of TimeProfiler multi-thread support added in `77e6bb3c`. Differential Revision: https://reviews.llvm.org/D71060	2020-02-06 12:14:13 +00:00
Teresa Johnson	2f63d549f1	Restore "[LTO/WPD] Enable aggressive WPD under LTO option" This restores `59733525d3` (D71913), along with bot fix `19c76989bb`. The bot failure should be fixed by D73418, committed as `af954e441a`. I also added a fix for non-x86 bot failures by requiring x86 in new test lld/test/ELF/lto/devirt_vcall_vis_public.ll.	2020-01-27 07:55:05 -08:00
Teresa Johnson	90e630a95e	Revert "[LTO/WPD] Enable aggressive WPD under LTO option" This reverts commit `59733525d3`. There is a windows sanitizer bot failure in one of the cfi tests that I will need some time to figure out: http://lab.llvm.org:8011/builders/sanitizer-windows/builds/57155/steps/stage%201%20check/logs/stdio	2020-01-23 17:29:24 -08:00
Teresa Johnson	59733525d3	[LTO/WPD] Enable aggressive WPD under LTO option Summary: Third part in series to support Safe Whole Program Devirtualization Enablement, see RFC here: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html This patch adds type test metadata under -fwhole-program-vtables, even for classes without hidden visibility. It then changes WPD to skip devirtualization for a virtual function call when any of the compatible vtables has public vcall visibility. Additionally, internal LLVM options as well as lld and gold-plugin options are added which enable upgrading all public vcall visibility to linkage unit (hidden) visibility during LTO. This enables the more aggressive WPD to kick in based on LTO time knowledge of the visibility guarantees. Support was added to all flavors of LTO WPD (regular, hybrid and index-only), and to both the new and old LTO APIs. Unfortunately it was not simple to split the first and second parts of this part of the change (the unconditional emission of type tests and the upgrading of the vcall visiblity) as I needed a way to upgrade the public visibility on legacy WPD llvm assembly tests that don't include linkage unit vcall visibility specifiers, to avoid a lot of test churn. I also added a mechanism to LowerTypeTests that allows dropping type test assume sequences we now aggressively insert when we invoke distributed ThinLTO backends with null indexes, which is used in testing mode, and which doesn't invoke the normal ThinLTO backend pipeline. Depends on D71907 and D71911. Reviewers: pcc, evgeny777, steven_wu, espindola Subscribers: emaste, Prazek, inglorion, arichardson, hiraditya, MaskRay, dexonsmith, dang, davidxl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71913	2020-01-23 16:09:44 -08:00
Fangrui Song	0fbf28f7aa	[ELF] --no-dynamic-linker: don't emit undefined weak symbols to .dynsym I felt really sad to push this commit for my selfish purpose to make glibc -static-pie build with lld. Some code constructs in glibc require R_X86_64_GOTPCREL/R_X86_64_REX_GOTPCRELX referencing undefined weak to be resolved to a GOT entry not relocated by R_X86_64_GLOB_DAT (GNU ld behavior), e.g. csu/libc-start.c if (__pthread_initialize_minimal != NULL) __pthread_initialize_minimal (); elf/dl-object.c void _dl_add_to_namespace_list (struct link_map new, Lmid_t nsid) { / We modify the list of loaded objects. */ __rtld_lock_lock_recursive (GL(dl_load_write_lock)); Emitting a GLOB_DAT will make the address equal &__ehdr_start (true value) and cause elf/ldconfig to segfault. glibc really should move away from weak references, which do not have defined semantics. Temporarily special case --no-dynamic-linker.	2020-01-23 12:25:15 -08:00
Fangrui Song	7cd429f27d	[ELF] Add -z force-ibt and -z shstk for Intel Control-flow Enforcement Technology This patch is a joint work by Rui Ueyama and me based on D58102 by Xiang Zhang. It adds Intel CET (Control-flow Enforcement Technology) support to lld. The implementation follows the draft version of psABI which you can download from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI. CET introduces a new restriction on indirect jump instructions so that you can limit the places to which you can jump to using indirect jumps. In order to use the feature, you need to compile source files with -fcf-protection=full. * IBT is enabled if all input files are compiled with the flag. To force enabling ibt, pass -z force-ibt. * SHSTK is enabled if all input files are compiled with the flag, or if -z shstk is specified. IBT-enabled executables/shared objects have two PLT sections, ".plt" and ".plt.sec". For the details as to why we have two sections, please read the comments. Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D59780	2020-01-13 23:39:28 -08:00
Rui Ueyama	69da7e29de	Revert an accidental commit `af5ca40b47`	2019-12-13 15:17:40 +09:00
Rui Ueyama	af5ca40b47	temporary	2019-12-13 14:35:03 +09:00
Fangrui Song	f0558f582a	[ELF] Delete unused Configuration::zExecstack after D56554	2019-11-25 14:44:09 -08:00
Nick Terrell	6814232429	[LLD][ELF] Support --[no-]mmap-output-file with F_no_mmap Summary: Add a flag `F_no_mmap` to `FileOutputBuffer` to support `--[no-]mmap-output-file` in ELF LLD. LLD currently explicitly ignores this flag for compatibility with GNU ld and gold. We need this flag to speed up link time for large binaries in certain scenarios. When we link some of our larger binaries we find that LLD takes 50+ GB of memory, which causes memory pressure. The memory pressure causes the VM to flush dirty pages of the output file to disk. This is normally okay, since we should be flushing cold pages. However, when using BtrFS with compression we need to write 128KB at a time when we flush a page. If any page in that 128KB block is written again, then it must be flushed a second time, and so on. Since LLD doesn't write sequentially this causes write amplification. The same 128KB block will end up being flushed multiple times, causing the linker to many times more IO than necessary. We've observed 3-5x faster builds with -no-mmap-output-file when we hit this scenario. The bad scenario only applies to compressed filesystems, which group together multiple pages into a single compressed block. I've tested BtrFS, but the problem will be present for any compressed filesystem on Linux, since it is caused by the VM. Silently ignoring --no-mmap-output-file caused a silent regression when we switched from gold to lld. We pass --no-mmap-output-file to fix this edge case, but since lld silently ignored the flag we didn't realize it wasn't being respected. Benchmark building a 9 GB binary that exposes this edge case. I linked 3 times with --mmap-output-file and 3 times with --no-mmap-output-file and took the average. The machine has 24 cores @ 2.4 GHz, 112 GB of RAM, BtrFS mounted with -compress-force=zstd, and an 80% full disk. \| Mode \| Time \| \|---------\|-------\| \| mmap \| 894 s \| \| no mmap \| 126 s \| When compression is disabled, BtrFS performs just as well with and without mmap on this benchmark. I was unable to reproduce the regression with any binaries in lld-speed-test. Reviewed By: ruiu, MaskRay Differential Revision: https://reviews.llvm.org/D69294	2019-10-29 15:49:08 -07:00
Michał Górny	2a0fcae3d4	[lld] [ELF] Add '-z nognustack' opt to suppress emitting PT_GNU_STACK Add a new '-z nognustack' option that suppresses emitting PT_GNU_STACK segment. This segment is not supported at all on NetBSD (stack is always non-executable), and the option is meant to be used to disable emitting it. Differential Revision: https://reviews.llvm.org/D56554	2019-10-29 17:54:23 +01:00
Nico Weber	5976a3f5aa	Fix a few typos in lld/ELF to cycle bots	2019-10-28 21:41:47 -04:00
Fangrui Song	0264950697	[ELF] Add -z separate-loadable-segments to complement separate-code and noseparate-code D64906 allows PT_LOAD to have overlapping p_offset ranges. In the default R RX RW RW layout + -z noseparate-code case, we do not tail pad segments when transiting to another segment. This can save at most 3*maxPageSize bytes. a) Before D64906, we tail pad R, RX and the first RW. b) With -z separate-code, we tail pad R and RX, but not the first RW (RELRO). In some cases, b) saves one file page. In some cases, b) wastes one virtual memory page. The waste is a concern on Fuchsia. Because it uses compressed binaries, it doesn't benefit from the saved file page. This patch adds -z separate-loadable-segments to restore the behavior before D64906. It can affect section addresses and can thus be used as a debugging mechanism (see PR43214 and ld.so partition bug in crbug.com/998712). Reviewed By: jakehehrlich, ruiu Differential Revision: https://reviews.llvm.org/D67481 llvm-svn: 372807	2019-09-25 03:39:31 +00:00
Peter Smith	ea99ce5e9b	[ELF][ARM] Implement --fix-cortex-a8 to fix erratum 657417 The --fix-cortex-a8 option implements a linker workaround for the coretex-a8 erratum 657417. A summary of the erratum conditions is: - A 32-bit Thumb-2 branch instruction B.w, Bcc.w, BL, BLX spans two 4KiB regions. - The destination of the branch is to the first 4KiB region. - The instruction before the branch is a 32-bit Thumb-2 non-branch instruction. The linker fix is to redirect the branch to a patch not in the first 4KiB region. The patch forwards the branch on to its target. The cortex-a8, is an old CPU, with the first implementation of this workaround in ld.bfd appearing in 2009. The cortex-a8 has been used in early Android Phones and there are some critical applications that still need to run on a cortex-a8 that have the erratum. The patch is applied roughly 10 times on LLD and 20 on Clang when they are built with --fix-cortex-a8 on an Arm system. The formal erratum description is avaliable in the ARM Core Cortex-A8 (AT400/AT401) Errata Notice document. This is available from Arm on request but it seems to be findable via a web search. Differential Revision: https://reviews.llvm.org/D67284 llvm-svn: 371965	2019-09-16 09:38:38 +00:00
Fangrui Song	e28a70daf4	[ELF] Consistently prioritize non-* wildcards overs "" in version scripts We prioritize non- wildcards overs VER_NDX_LOCAL/VER_NDX_GLOBAL "". This patch generalizes the rule to "" of other versions and thus fixes PR40176. I don't feel strongly about this GNU linkers' behavior but the generalization simplifies code. Delete `config->defaultSymbolVersion` which was used to special case VER_NDX_LOCAL/VER_NDX_GLOBAL "*". In `SymbolTable::scanVersionScript`, custom versions are handled the same way as VER_NDX_LOCAL/VER_NDX_GLOBAL. So merge `config->versionScript{Locals,Globals}` into `config->versionDefinitions`. Overall this seems to simplify the code. In `SymbolTable::assign{Exact,Wildcard}Versions`, `sym->verdefIndex == config->defaultSymbolVersion` is changed to `verdefIndex == UINT32_C(-1)`. This allows us to give duplicate assignment diagnostics for `{ global: foo; };` `V1 { global: foo; };` In test/linkerscript/version-script.s: vs_index of an undefined symbol changes from 0 to 1. This doesn't matter (arguably 1 is better because the binding is STB_GLOBAL) because vs_index of an undefined symbol is ignored. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D65716 llvm-svn: 367869	2019-08-05 14:31:39 +00:00
Fangrui Song	5391f158c2	[ELF] Add -z separate-code and pad the last page of last PF_X PT_LOAD with traps only if -z separate-code is specified This patch 1) adds -z separate-code and -z noseparate-code (default). 2) changes the condition that the last page of last PF_X PT_LOAD is padded with trap instructions. Current condition (after D33630): if there is no `SECTIONS` commands. After this change: if -z separate-code is specified. -z separate-code was introduced to ld.bfd in 2018, to place the text segment in its own pages. There is no overlap in pages between an executable segment and a non-executable segment: 1) RX cannot load initial contents from R or RW(or non-SHF_ALLOC). 2) R and RW(or non-SHF_ALLOC) cannot load initial contents from RX. lld's current status: - Between R and RX: in `Writer<ELFT>::fixSectionAlignments()`, the start of a segment is always aligned to maxPageSize, so the initial contents loaded by R and RX do not overlap. I plan to allow overlaps in D64906 if -z noseparate-code is in effect. - Between RX and RW(or non-SHF_ALLOC if RW doesn't exist): we currently unconditionally pad the last page to commonPageSize (defaults to 4096 on all targets we support). This patch will make it effective only if -z separate-code is specified. -z separate-code is a dubious feature that intends to reduce the number of ROP gadgets (which is actually ineffective because attackers can find plenty of gadgets in the text segment, no need to find gadgets in non-code regions). With the overlapping PT_LOAD technique D64906, -z noseparate-code removes two more alignments at segment boundaries than -z separate-code. This saves at most defaultCommonPageSize2 bytes, which are significant on targets with large defaultCommonPageSize (AArch64/MIPS/PPC: 65536). Issues/feedback on alignment at segment boundaries to help understand the implication: binutils PR24490 (the situation on ld.bfd is worse because they have two R-- on both sides of R-E so more alignments.) * In binutils, the 2018-02-27 commit "ld: Add --enable-separate-code" made -z separate-code the default on Linux. `d969dea983` In musl-cross-make, binutils is configured with --disable-separate-code to address size regressions caused by -z separate-code. (lld actually has the same issue, which I plan to fix in a future patch. The ld.bfd x86 status is worse because they default to max-page-size=0x200000). * https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237676 people want smaller code size. This patch will remove one alignment boundary. * Stef O'Rear: I'm opposed to any kind of page alignment at the text/rodata line (having a partial page of text aliased as rodata and vice versa has no demonstrable harm, and I actually care about small systems). So, make -z noseparate-code the default. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D64903 llvm-svn: 367537	2019-08-01 09:58:25 +00:00
Fangrui Song	47cfe8f321	[ELF] Fix variable names in comments after VariableName -> variableName change Also fix some typos. llvm-svn: 366181	2019-07-16 05:50:45 +00:00
Rui Ueyama	3837f4273f	[Coding style change] Rename variables so that they start with a lowercase letter This patch is mechanically generated by clang-llvm-rename tool that I wrote using Clang Refactoring Engine just for creating this patch. You can see the source code of the tool at https://reviews.llvm.org/D64123. There's no manual post-processing; you can generate the same patch by re-running the tool against lld's code base. Here is the main discussion thread to change the LLVM coding style: https://lists.llvm.org/pipermail/llvm-dev/2019-February/130083.html In the discussion thread, I proposed we use lld as a testbed for variable naming scheme change, and this patch does that. I chose to rename variables so that they are in camelCase, just because that is a minimal change to make variables to start with a lowercase letter. Note to downstream patch maintainers: if you are maintaining a downstream lld repo, just rebasing ahead of this commit would cause massive merge conflicts because this patch essentially changes every line in the lld subdirectory. But there's a remedy. clang-llvm-rename tool is a batch tool, so you can rename variables in your downstream repo with the tool. Given that, here is how to rebase your repo to a commit after the mass renaming: 1. rebase to the commit just before the mass variable renaming, 2. apply the tool to your downstream repo to mass-rename variables locally, and 3. rebase again to the head. Most changes made by the tool should be identical for a downstream repo and for the head, so at the step 3, almost all changes should be merged and disappear. I'd expect that there would be some lines that you need to merge by hand, but that shouldn't be too many. Differential Revision: https://reviews.llvm.org/D64121 llvm-svn: 365595	2019-07-10 05:00:37 +00:00
Francis Visoiu Mistrih	34667519dc	[Remarks] Extend -fsave-optimization-record to specify the format Use -fsave-optimization-record=<format> to specify a different format than the default, which is YAML. For now, only YAML is supported. llvm-svn: 363573	2019-06-17 16:06:00 +00:00
Peter Collingbourne	0282898586	ELF: Create synthetic sections for loadable partitions. We create several types of synthetic sections for loadable partitions, including: - The dynamic symbol table. This allows code outside of the loadable partitions to find entry points with dlsym. - Creating a dynamic symbol table also requires the creation of several other synthetic sections for the partition, such as the dynamic table and hash table sections. - The partition's ELF header is represented as a synthetic section in the combined output file, and will be used by llvm-objcopy to extract partitions. Differential Revision: https://reviews.llvm.org/D62350 llvm-svn: 362819	2019-06-07 17:57:58 +00:00
Peter Smith	e208208a31	[ELF][AArch64] Support for BTI and PAC Branch Target Identification (BTI) and Pointer Authentication (PAC) are architecture features introduced in v8.5a and 8.3a respectively. The new instructions have been added in the hint space so that binaries take advantage of support where it exists yet still run on older hardware. The impact of each feature is: BTI: For executable pages that have been guarded, all indirect branches must have a destination that is a BTI instruction of the appropriate type. For the static linker, this means that PLT entries must have a "BTI c" as the first instruction in the sequence. BTI is an all or nothing property for a link unit, any indirect branch not landing on a valid destination will cause a Branch Target Exception. PAC: The dynamic loader encodes with PACIA the address of the destination that the PLT entry will load from the .plt.got, placing the result in a subset of the top-bits that are not valid virtual addresses. The PLT entry may authenticate these top-bits using the AUTIA instruction before branching to the destination. Use of PAC in PLT sequences is a contract between the dynamic loader and the static linker, it is independent of whether the relocatable objects use PAC. BTI and PAC are independent features that can be combined. So we can have several combinations of PLT: - Standard with no BTI or PAC - BTI PLT with "BTI c" as first instruction. - PAC PLT with "AUTIA1716" before the indirect branch to X17. - BTIPAC PLT with "BTI c" as first instruction and "AUTIA1716" before the first indirect branch to X17. The use of BTI and PAC in relocatable object files are encoded by feature bits in the .note.gnu.property section in a similar way to Intel CET. There is one AArch64 specific program property GNU_PROPERTY_AARCH64_FEATURE_1_AND and two target feature bits defined: - GNU_PROPERTY_AARCH64_FEATURE_1_BTI -- All executable sections are compatible with BTI. - GNU_PROPERTY_AARCH64_FEATURE_1_PAC -- All executable sections have return address signing enabled. Due to the properties of FEATURE_1_AND the static linker can tell when all input relocatable objects have the BTI and PAC feature bits set. The static linker uses this to enable the appropriate PLT sequence. Neither -> standard PLT GNU_PROPERTY_AARCH64_FEATURE_1_BTI -> BTI PLT GNU_PROPERTY_AARCH64_FEATURE_1_PAC -> PAC PLT Both properties -> BTIPAC PLT In addition to the .note.gnu.properties there are two new command line options: --force-bti : Act as if all relocatable inputs had GNU_PROPERTY_AARCH64_FEATURE_1_BTI and warn for every relocatable object that does not. --pac-plt : Act as if all relocatable inputs had GNU_PROPERTY_AARCH64_FEATURE_1_PAC. As PAC is a contract between the loader and static linker no warning is given if it is not present in an input. Two processor specific dynamic tags are used to communicate that a non standard PLT sequence is being used. DTI_AARCH64_BTI_PLT and DTI_AARCH64_BTI_PAC. Differential Revision: https://reviews.llvm.org/D62609 llvm-svn: 362793	2019-06-07 13:00:17 +00:00
Rui Ueyama	2057f8366a	Read .note.gnu.property sections and emit a merged .note.gnu.property section. This patch also adds `--require-cet` option for the sake of testing. The actual feature for IBT-aware PLT is not included in this patch. This is a part of https://reviews.llvm.org/D59780. Submitting this first should make it easy to work with a related change (https://reviews.llvm.org/D62609). Differential Revision: https://reviews.llvm.org/D62853 llvm-svn: 362579	2019-06-05 03:04:46 +00:00
Ben Dunbobbin	1d16515fb4	[ELF] Implement Dependent Libraries Feature This patch implements a limited form of autolinking primarily designed to allow either the --dependent-library compiler option, or "comment lib" pragmas ( https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in C/C++ e.g. #pragma comment(lib, "foo"), to cause an ELF linker to automatically add the specified library to the link when processing the input file generated by the compiler. Currently this extension is unique to LLVM and LLD. However, care has been taken to design this feature so that it could be supported by other ELF linkers. The design goals were to provide: - A simple linking model for developers to reason about. - The ability to to override autolinking from the linker command line. - Source code compatibility, where possible, with "comment lib" pragmas in other environments (MSVC in particular). Dependent library support is implemented differently for ELF platforms than on the other platforms. Primarily this difference is that on ELF we pass the dependent library specifiers directly to the linker without manipulating them. This is in contrast to other platforms where they are mapped to a specific linker option by the compiler. This difference is a result of the greater variety of ELF linkers and the fact that ELF linkers tend to handle libraries in a more complicated fashion than on other platforms. This forces us to defer handling the specifiers to the linker. In order to achieve a level of source code compatibility with other platforms we have restricted this feature to work with libraries that meet the following "reasonable" requirements: 1. There are no competing defined symbols in a given set of libraries, or if they exist, the program owner doesn't care which is linked to their program. 2. There may be circular dependencies between libraries. The binary representation is a mergeable string section (SHF_MERGE, SHF_STRINGS), called .deplibs, with custom type SHT_LLVM_DEPENDENT_LIBRARIES (0x6fff4c04). The compiler forms this section by concatenating the arguments of the "comment lib" pragmas and --dependent-library options in the order they are encountered. Partial (-r, -Ur) links are handled by concatenating .deplibs sections with the normal mergeable string section rules. As an example, #pragma comment(lib, "foo") would result in: .section ".deplibs","MS",@llvm_dependent_libraries,1 .asciz "foo" For LTO, equivalent information to the contents of a the .deplibs section can be retrieved by the LLD for bitcode input files. LLD processes the dependent library specifiers in the following way: 1. Dependent libraries which are found from the specifiers in .deplibs sections of relocatable object files are added when the linker decides to include that file (which could itself be in a library) in the link. Dependent libraries behave as if they were appended to the command line after all other options. As a consequence the set of dependent libraries are searched last to resolve symbols. 2. It is an error if a file cannot be found for a given specifier. 3. Any command line options in effect at the end of the command line parsing apply to the dependent libraries, e.g. --whole-archive. 4. The linker tries to add a library or relocatable object file from each of the strings in a .deplibs section by; first, handling the string as if it was specified on the command line; second, by looking for the string in each of the library search paths in turn; third, by looking for a lib<string>.a or lib<string>.so (depending on the current mode of the linker) in each of the library search paths. 5. A new command line option --no-dependent-libraries tells LLD to ignore the dependent libraries. Rationale for the above points: 1. Adding the dependent libraries last makes the process simple to understand from a developers perspective. All linkers are able to implement this scheme. 2. Error-ing for libraries that are not found seems like better behavior than failing the link during symbol resolution. 3. It seems useful for the user to be able to apply command line options which will affect all of the dependent libraries. There is a potential problem of surprise for developers, who might not realize that these options would apply to these "invisible" input files; however, despite the potential for surprise, this is easy for developers to reason about and gives developers the control that they may require. 4. This algorithm takes into account all of the different ways that ELF linkers find input files. The different search methods are tried by the linker in most obvious to least obvious order. 5. I considered adding finer grained control over which dependent libraries were ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this is not necessary: if finer control is required developers can fall back to using the command line directly. RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html. Differential Revision: https://reviews.llvm.org/D60274 llvm-svn: 360984	2019-05-17 03:44:15 +00:00
Fangrui Song	e041d15f5e	[LLD][ELF] Add the -z ifunc-noplt option Patch by Mark Johnston! Summary: When the option is configured, ifunc calls do not go through the PLT; rather, they appear as regular function calls with relocations referencing the ifunc symbol, and the resolver is invoked when applying the relocation. This is intended for use in freestanding environments where text relocations are permissible and is incompatible with the -z text option. The option is motivated by ifunc usage in the FreeBSD kernel, where ifuncs are used to elide CPU feature flag bit checks in hot paths. Instead of replacing the cost of a branch with that of an indirect function call, the -z ifunc-noplt option is used to ensure that ifunc calls carry no hidden overhead relative to normal function calls. Test Plan: I added a couple of regression tests and tested the FreeBSD kernel build using the latest lld sources. To demonstrate the effects of the change, I used a micro-benchmark which results in frequent invocations of a FreeBSD kernel ifunc. The benchmark was run with and without IBRS enabled, and with and without -zifunc-noplt configured. The observed speedup is small and consistent, and is significantly larger with IBRS enabled: https://people.freebsd.org/~markj/ifunc-noplt/noibrs.txt https://people.freebsd.org/~markj/ifunc-noplt/ibrs.txt Reviewed By: ruiu, MaskRay Differential Revision: https://reviews.llvm.org/D61613 llvm-svn: 360685	2019-05-14 15:25:21 +00:00
Peter Smith	4e21c770ec	[ELF] Full support for -n (--nmagic) and -N (--omagic) via common page The -n (--nmagic) disables page alignment, and acts as a -Bstatic The -N (--omagic) does what -n does but also marks the executable segment as writeable. As page alignment is disabled headers are not allocated unless explicit in the linker script. To disable page alignment in LLD we choose to set the page sizes to 1 so that any alignment based on the page size does nothing. To set the Target->PageSize to 1 we implement -z common-page-size, which has the side effect of allowing the user to set the value as well. Setting the page alignments to 1 does mean that any use of CONSTANT(MAXPAGESIZE) or CONSTANT(COMMONPAGESIZE) in a linker script will return 1, unlike in ld.bfd. However given that -n and -N disable paging these probably shouldn't be used in a linker script where -n or -N is in use. Differential Revision: https://reviews.llvm.org/D61688 llvm-svn: 360593	2019-05-13 16:01:26 +00:00
Rui Ueyama	432030e843	[ELF] Dump symbols ordered by profiled guided section layout to file. Patch by Tiancong Wang. In D36351, Call-Chain Clustering (C3) heuristic is implemented with option --call-graph-ordering-file <file>. This patch adds a flag --print-symbol-order=<file> to LLD, and when specified, it prints out the symbols ordered by the heuristics to the file. The symbols printout is helpful to those who want to understand the heuristics and want to reproduce the ordering with --symbol-ordering-file in later pass. Differential Revision: https://reviews.llvm.org/D59311 llvm-svn: 357133	2019-03-27 23:52:22 +00:00
Francis Visoiu Mistrih	dd42236c6c	Reland "[Remarks] Add -foptimization-record-passes to filter remark emission" Currently we have -Rpass for filtering the remarks that are displayed as diagnostics, but when using -fsave-optimization-record, there is no way to filter the remarks while generating them. This adds support for filtering remarks by passes using a regex. Ex: `clang -fsave-optimization-record -foptimization-record-passes=inline` will only emit the remarks coming from the pass `inline`. This adds: * `-fsave-optimization-record` to the driver * `-opt-record-passes` to cc1 * `-lto-pass-remarks-filter` to the LTOCodeGenerator * `--opt-remarks-passes` to lld * `-pass-remarks-filter` to llc, opt, llvm-lto, llvm-lto2 * `-opt-remarks-passes` to gold-plugin Differential Revision: https://reviews.llvm.org/D59268 Original llvm-svn: 355964 llvm-svn: 355984	2019-03-12 21:22:27 +00:00
Francis Visoiu Mistrih	1d6c47ad2b	Revert "[Remarks] Add -foptimization-record-passes to filter remark emission" This reverts commit `20fff32b7d`. llvm-svn: 355976	2019-03-12 20:54:18 +00:00
Francis Visoiu Mistrih	20fff32b7d	[Remarks] Add -foptimization-record-passes to filter remark emission Currently we have -Rpass for filtering the remarks that are displayed as diagnostics, but when using -fsave-optimization-record, there is no way to filter the remarks while generating them. This adds support for filtering remarks by passes using a regex. Ex: `clang -fsave-optimization-record -foptimization-record-passes=inline` will only emit the remarks coming from the pass `inline`. This adds: * `-fsave-optimization-record` to the driver * `-opt-record-passes` to cc1 * `-lto-pass-remarks-filter` to the LTOCodeGenerator * `--opt-remarks-passes` to lld * `-pass-remarks-filter` to llc, opt, llvm-lto, llvm-lto2 * `-opt-remarks-passes` to gold-plugin Differential Revision: https://reviews.llvm.org/D59268 llvm-svn: 355964	2019-03-12 20:28:50 +00:00
Rong Xu	f92e59cbba	[PGO] Add options for context-sensitive PGO Add lld options for CSPGO (context-sensitive PGO). Differential Revision: https://reviews.llvm.org/D56675 llvm-svn: 355876	2019-03-11 22:51:38 +00:00
Ed Maste	e9932c103b	Correct "varaible" typo in comment llvm-svn: 353340	2019-02-06 20:36:02 +00:00
Rui Ueyama	7c77044a38	Add comment. llvm-svn: 353323	2019-02-06 18:53:17 +00:00
George Rimar	ae54e58b90	Recommit r353293 "[LLD][ELF] - Set DF_STATIC_TLS flag for i386 target." With the following changes: 1) Compilation fix: std::atomic<bool> HasStaticTlsModel = false; -> std::atomic<bool> HasStaticTlsModel{false}; 2) Adjusted the comment in code. Initial commit message: DF_STATIC_TLS flag indicates that the shared object or executable contains code using a static thread-local storage scheme. Patch checks if IE/LE relocations were used to check if the code uses a static model. If so it sets the DF_STATIC_TLS flag. Differential revision: https://reviews.llvm.org/D57749 ---- Modified : /lld/trunk/ELF/Arch/X86.cpp Modified : /lld/trunk/ELF/Config.h Modified : /lld/trunk/ELF/SyntheticSections.cpp Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model1.s Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model2.s Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model3.s Added : /lld/trunk/test/ELF/Inputs/i386-static-tls-model4.s Added : /lld/trunk/test/ELF/i386-static-tls-model.s Modified : /lld/trunk/test/ELF/i386-tls-ie-shared.s Modified : /lld/trunk/test/ELF/tls-dynamic-i686.s Modified : /lld/trunk/test/ELF/tls-opt-iele-i686-nopic.s llvm-svn: 353299	2019-02-06 14:43:30 +00:00
George Rimar	52fafcb919	Revert r353293 "[LLD][ELF] - Set DF_STATIC_TLS flag for i386 target." It broke BB: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/43450 http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/27891 Error is: tools/lld/ELF/Config.h:84:41: error: copying member subobject of type 'std::atomic<bool>' invokes deleted constructor std::atomic<bool> HasStaticTlsModel = false; llvm-svn: 353297	2019-02-06 13:53:32 +00:00
George Rimar	da60ad220b	[LLD][ELF] - Set DF_STATIC_TLS flag for i386 target. DF_STATIC_TLS flag indicates that the shared object or executable contains code using a static thread-local storage scheme. Patch checks if IE/LE relocations were used to check if the code uses a static model. If so it sets the DF_STATIC_TLS flag. Differential revision: https://reviews.llvm.org/D57749 llvm-svn: 353293	2019-02-06 13:38:10 +00:00
Fangrui Song	b4744d306c	[ELF] Support --{,no-}allow-shlib-undefined Summary: In ld.bfd/gold, --no-allow-shlib-undefined is the default when linking an executable. This patch implements a check to error on undefined symbols in a shared object, if all of its DT_NEEDED entries are seen. Our approach resembles the one used in gold, achieves a good balance to be useful but not too smart (ld.bfd traces all DSOs and emulates the behavior of a dynamic linker to catch more cases). The error is issued based on the symbol table, different from undefined reference errors issued for relocations. It is most effective when there are DSOs that were not linked with -z defs (e.g. when static sanitizers runtime is used). gold has a comment that some system libraries on GNU/Linux may have spurious undefined references and thus system libraries should be excluded (https://sourceware.org/bugzilla/show_bug.cgi?id=6811). The story may have changed now but we make --allow-shlib-undefined the default for now. Its interaction with -shared can be discussed in the future. Reviewers: ruiu, grimar, pcc, espindola Reviewed By: ruiu Subscribers: joerg, emaste, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D57385 llvm-svn: 352826	2019-02-01 02:25:05 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Peter Smith	13d134684f	[ELF] Implement option to force PIC compatible Thunks By default LLD will generate position independent Thunks when the --pie or --shared option is used. Reference to absolute addresses is permitted in other cases. For some embedded systems position independent thunks are needed for code that executes before the MMU has been set up. The option --pic-veneer is used by ld.bfd to force position independent thunks. The patch adds --pic-veneer as the option is needed for the Linux kernel on Arm. fixes pr39886 Differential Revision: https://reviews.llvm.org/D55505 llvm-svn: 351326	2019-01-16 12:09:13 +00:00
Rui Ueyama	9f49990976	Add --plugin-opt=emit-llvm option. `--plugin-opt=emit-llvm` is an option for LTO. It makes the linker to combine all bitcode files and write the result to an output file without doing codegen. Gold LTO plugin has this option. This option is being used for some post-link code analysis tools that have to see a whole program but don't need to see them in the native machine code. Differential Revision: https://reviews.llvm.org/D55717 llvm-svn: 349198	2018-12-14 21:58:49 +00:00
George Rimar	a1b3ddbfec	[ELF] - Implement -z nodefaultlib This is https://bugs.llvm.org//show_bug.cgi?id=38978 Spec says that: "Objects may be built with the -z nodefaultlib option to suppress any search of the default locations at runtime. Use of this option implies that all the dependencies of an object can be located using its runpaths. Without this option, which is the most common case, no matter how you augment the runtime linker's library search path, its last element is always /usr/lib for 32-bit objects and /usr/lib/64 for 64-bit objects." The patch implements this option. Differential revision: https://reviews.llvm.org/D54577 llvm-svn: 347647	2018-11-27 09:48:17 +00:00
Fangrui Song	cc18f8aa0f	[ELF] Add --{,no-}call-graph-profile-sort (enabled by default) Summary: Add an option to disable sorting sections with call graph profile Reviewers: ruiu, Bigcheese, espindola Reviewed By: Bigcheese Subscribers: grimar, emaste, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D53683 llvm-svn: 345332	2018-10-25 23:15:23 +00:00
Sean Fertile	4b5ec7fb80	Reland "[PPC64] Add split - stack support." Recommitting https://reviews.llvm.org/rL344544 after fixing undefined behavior from left-shifting a negative value. Original commit message: This support is slightly different then the X86_64 implementation in that calls to __morestack don't need to get rewritten to calls to __moresatck_non_split when a split-stack caller calls a non-split-stack callee. Instead the size of the stack frame requested by the caller is adjusted prior to the call to __morestack. The size the stack-frame will be adjusted by is tune-able through a new --split-stack-adjust-size option. llvm-svn: 344622	2018-10-16 17:13:01 +00:00
Sean Fertile	831a1336ff	Revert "[PPC64] Add split - stack support." This reverts commit https://reviews.llvm.org/rL344544, which causes failures on a undefined behaviour sanitizer bot --> lld/ELF/Arch/PPC64.cpp:849:35: runtime error: left shift of negative value -1 llvm-svn: 344551	2018-10-15 20:20:28 +00:00
Sean Fertile	795cc9332b	[PPC64] Add split - stack support. This support is slightly different then the X86_64 implementation in that calls to __morestack don't need to get rewritten to calls to __moresatck_non_split when a split-stack caller calls a non-split-stack callee. Instead the size of the stack frame requested by the caller is adjusted prior to the call to __morestack. The size the stack-frame will be adjusted by is tune-able through a new --split-stack-adjust-size option. Differential Revision: https://reviews.llvm.org/D52099 llvm-svn: 344544	2018-10-15 19:05:57 +00:00
Ali Tamur	63830b2794	Introduce a flag to warn when ifunc symbols are used with text relocations. Summary: This patch adds a new flag, --warn-ifunc-textrel, to work around a glibc bug. When a code with ifunc symbols is used to produce an object file with text relocations, lld always succeeds. However, if that object file is linked using an old version of glibc, the resultant binary just crashes with segmentation fault when it is run (The bug is going to be corrected as of glibc 2.19). Since there is no way to tell beforehand what library the object file will be linked against in the future, there does not seem to be a fool-proof way for lld to give an error only in cases where the binary will crash. So, with this change (dated 2018-09-25), lld starts to give a warning, contingent on a new command line flag that does not have a gnu counter part. The default value for --warn-ifunc-textrel is false, so lld behaviour will not change unless the user explicitly asks lld to give a warning. Users that link with a glibc library with version 2.19 or newer, or does not use ifunc symbols, or does not generate object files with text relocations do not need to take any action. Other users may consider to start passing warn-ifunc-textrel to lld to get early warnings. Reviewers: ruiu, espindola Reviewed By: ruiu Subscribers: grimar, MaskRay, markj, emaste, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D52430 llvm-svn: 343628	2018-10-02 20:30:22 +00:00
Sean Fertile	7f3f05e0b7	[PPC64] Optimize redundant instructions in global access sequences. The access sequence for global variables in the medium and large code models use 2 instructions to add an offset to the toc-pointer. If the offset fits whithin 16-bits then the instruction that sets the high 16 bits is redundant. This patch adds the --toc-optimize option, (on by default) and enables rewriting of 2 instruction global variable accesses into 1 when the offset from the TOC-pointer to the variable (or .got entry) fits in 16 signed bits. eg addis %r3, %r2, 0 --> nop addi %r3, %r3, -0x8000 --> addi %r3, %r2, -0x8000 This rewriting can be disabled with the --no-toc-optimize flag Differential Revision: https://reviews.llvm.org/D49237 llvm-svn: 342602	2018-09-20 00:26:44 +00:00
Ed Maste	c0b474f67a	lld: add -z interpose support -z interpose sets the DF_1_INTERPOSE flag, marking the object as an interposer. Via FreeBSD PR 230604, linking Valgrind with lld failed. Differential Revision: https://reviews.llvm.org/D52094 llvm-svn: 342239	2018-09-14 14:25:37 +00:00
George Rimar	27bbe7d0b4	[LLF][ELF] - Support -z global. -z global is a flag used on Android (see D49198). Differential revision: https://reviews.llvm.org/D49374 llvm-svn: 340802	2018-08-28 08:24:34 +00:00
George Rimar	152e3c98ac	[LLD][ELF] - Remove UnresolvedPolicy::IgnoreAll and relative code. NFC. The code involved was simply dead. `IgnoreAll` value is used in `maybeReportUndefined` only which is never called for -r. And at the same time `IgnoreAll` was set only for -r. llvm-svn: 339672	2018-08-14 11:55:31 +00:00
Rui Ueyama	e262bb1afb	Add TARGET(foo) linker script directive. GNU ld's manual says that TARGET(foo) is basically an alias for `--format foo` where foo is a BFD target name such as elf64-x86-64. Unlike GNU linkers, lld doesn't allow arbitrary BFD target name for --format. We accept only "default", "elf" or "binary". This makes situation a bit tricky because we can't simply make TARGET an alias for --target. A quick code search revealed that the usage number of TARGET is very small, and the only meaningful usage is to switch to the binary mode. Thus, in this patch, we handle only TARGET(elf.*) and TARGET(binary). Differential Revision: https://reviews.llvm.org/D48153 llvm-svn: 339060	2018-08-06 21:29:41 +00:00
Peter Smith	70997f9a4e	[ELF][ARM] Implement support for Tag_ABI_VFP_args The Tag_ABI_VFP_args build attribute controls the procedure call standard used for floating point parameters on ARM. The values are: 0 - Base AAPCS (FP Parameters passed in Core (Integer) registers 1 - VFP AAPCS (FP Parameters passed in FP registers) 2 - Toolchain specific (Neither Base or VFP) 3 - Compatible with all (No use of floating point parameters) If the Tag_ABI_VFP_args build attribute is missing it has an implicit value of 0. We use the attribute in two ways: - Detect a clash in calling convention between Base, VFP and Toolchain. we follow ld.bfd's lead and do not error if there is a clash between an implicit Base AAPCS caused by a missing attribute. Many projects including the hard-float (VFP AAPCS) version of glibc contain assembler files that do not use floating point but do not have Tag_ABI_VFP_args. - Set the EF_ARM_ABI_FLOAT_SOFT or EF_ARM_ABI_FLOAT_HARD ELF header flag for Base or VFP AAPCS respectively. This flag is used by some ELF loaders. References: - Addenda to, and Errata in, the ABI for the ARM Architecture for Tag_ABI_VFP_args - Elf for the ARM Architecture for ELF header flags Fixes PR36009 Differential Revision: https://reviews.llvm.org/D49993 llvm-svn: 338377	2018-07-31 13:41:59 +00:00
David Bolvansky	a932cd409b	[AArch64] Support execute-only LOAD segments. Summary: This adds an LLD flag to mark executable LOAD segments execute-only for AArch64 targets. In AArch64 the expectation is that code is execute-only compatible, so this just adds a linker option to enforce this. Patch by: ivanlozano (Ivan Lozano) Reviewers: srhines, echristo, peter.smith, eugenis, javed.absar, espindola, ruiu Reviewed By: ruiu Subscribers: dokyungs, emaste, arichardson, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49456 llvm-svn: 338271	2018-07-30 17:02:46 +00:00
Peter Collingbourne	a327a4c34e	ELF: Implement --icf=safe using address-significance tables. Differential Revision: https://reviews.llvm.org/D48146 llvm-svn: 337429	2018-07-18 22:49:31 +00:00
Yunlian Jiang	496fb3e7a0	Support option -plugin-opt=dwo_dir= Summary: This adds support to option -plugin-opt=dwo_dir=${DIR}. This option is used to specify the directory to store the .dwo files when LTO and debug fission is used at the same time. Reviewers: ruiu, espindola, pcc Reviewed By: pcc Subscribers: eraman, dexonsmith, mehdi_amini, emaste, arichardson, steven_wu, llvm-commits Differential Revision: https://reviews.llvm.org/D47904 llvm-svn: 337195	2018-07-16 17:55:48 +00:00
Rui Ueyama	45192b3746	Factor out code to parse -pack-dyn-relocs. NFC. llvm-svn: 336599	2018-07-09 20:22:28 +00:00
Rui Ueyama	11479daf2f	lld: add experimental support for SHT_RELR sections. Patch by Rahul Chaudhry! This change adds experimental support for SHT_RELR sections, proposed here: https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg Pass '--pack-dyn-relocs=relr' to enable generation of SHT_RELR section and DT_RELR, DT_RELRSZ, and DT_RELRENT dynamic tags. Definitions for the new ELF section type and dynamic array tags, as well as the encoding used in the new section are all under discussion and are subject to change. Use with caution! Pass '--use-android-relr-tags' with '--pack-dyn-relocs=relr' to use SHT_ANDROID_RELR section type instead of SHT_RELR, as well as DT_ANDROID_RELR* dynamic tags instead of DT_RELR*. The generated section contents are identical. '--pack-dyn-relocs=android+relr --use-android-relr-tags' enables both '--pack-dyn-relocs=android' and '--pack-dyn-relocs=relr': lld will encode the relative relocations in a SHT_ANDROID_RELR section, and pack the rest of the dynamic relocations in a SHT_ANDROID_REL(A) section. Differential Revision: https://reviews.llvm.org/D48247 llvm-svn: 336594	2018-07-09 20:08:55 +00:00
Fangrui Song	bd3684f25b	[ELF] Support -z initfirst Summary: glibc uses this option to link libpthread.so glibc/nptl/Makefile: LDFLAGS-pthread.so = -Wl,--enable-new-dtags,-z,nodelete,-z,initfirst Reviewers: ruiu, echristo, espindola Subscribers: emaste, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D48329 llvm-svn: 335090	2018-06-20 02:06:01 +00:00
Simon Atanasyan	ed9ee69ccf	[ELF][MIPS] Multi-GOT implementation Almost all entries inside MIPS GOT are referenced by signed 16-bit index. Zero entry lies approximately in the middle of the GOT. So the total number of GOT entries cannot exceed ~16384 for 32-bit architecture and ~8192 for 64-bit architecture. This limitation makes impossible to link rather large application like for example LLVM+Clang. There are two workaround for this problem. The first one is using the -mxgot compiler's flag. It enables using a 32-bit index to access GOT entries. But each access requires two assembly instructions two load GOT entry index to a register. Another workaround is multi-GOT. This patch implements it. Here is a brief description of multi-GOT for detailed one see the following link https://dmz-portal.mips.com/wiki/MIPS_Multi_GOT. If the sum of local, global and tls entries is less than 64K only single got is enough. Otherwise, multi-got is created. Series of primary and multiple secondary GOTs have the following layout: ``` - Primary GOT Header Local entries Global entries Relocation only entries TLS entries - Secondary GOT Local entries Global entries TLS entries ... ``` All GOT entries required by relocations from a single input file entirely belong to either primary or one of secondary GOTs. To reference GOT entries each GOT has its own _gp value points to the "middle" of the GOT. In the code this value loaded to the register which is used for GOT access. MIPS 32 function's prologue: ``` lui v0,0x0 0: R_MIPS_HI16 _gp_disp addiu v0,v0,0 4: R_MIPS_LO16 _gp_disp ``` MIPS 64 function's prologue: ``` lui at,0x0 14: R_MIPS_GPREL16 main ``` Dynamic linker does not know anything about secondary GOTs and cannot use a regular MIPS mechanism for GOT entries initialization. So we have to use an approach accepted by other architectures and create dynamic relocations R_MIPS_REL32 to initialize global entries (and local in case of PIC code) in secondary GOTs. But ironically MIPS dynamic linker requires GOT entries and correspondingly ordered dynamic symbol table entries to deal with dynamic relocations. To handle this problem relocation-only section in the primary GOT contains entries for all symbols referenced in global parts of secondary GOTs. Although the sum of local and normal global entries of the primary got should be less than 64K, the size of the primary got (including relocation-only entries can be greater than 64K, because parts of the primary got that overflow the 64K limit are used only by the dynamic linker at dynamic link-time and not by 16-bit gp-relative addressing at run-time. The patch affects common LLD code in the following places: - Added new hidden -mips-got-size flag. This flag required to set low maximum size of a single GOT to be able to test the implementation using small test cases. - Added InputFile argument to the getRelocTargetVA function. The same symbol referenced by GOT relocation from different input file might be allocated in different GOT. So result of relocation depends on the file. - Added new ctor to the DynamicReloc class. This constructor records settings of dynamic relocation which used to adjust address of 64kb page lies inside a specific output section. With the patch LLD is able to link all LLVM+Clang+LLD applications and libraries for MIPS 32/64 targets. Differential revision: https://reviews.llvm.org/D31528 llvm-svn: 334390	2018-06-11 07:24:31 +00:00
Rumeet Dhindsa	d2eb089a0e	Add support for ThinLTO plugin option thinlto-object-suffix-replace Differential Revision: https://reviews.llvm.org/D46608 llvm-svn: 332527	2018-05-16 21:04:08 +00:00
Sriraman Tallam	be01d2e3de	New option -z keep-text-section-prefix to keep text sections with prefixes separate. Separate output sections for selected text section prefixes to enable TLB optimizations and for readablilty. Differential Revision: https://reviews.llvm.org/D45841 llvm-svn: 331823	2018-05-08 23:19:50 +00:00
Rumeet Dhindsa	b5b7d6e19c	Add support for LTO plugin option obj-path Differential Revision: https://reviews.llvm.org/D46598 llvm-svn: 331817	2018-05-08 22:37:57 +00:00
Rui Ueyama	d432f216d1	Rename Config::ThinLTOIndexOnlyObjectFiles -> Config::ThinLTOIndexOnlyArg. llvm-svn: 331699	2018-05-07 23:24:16 +00:00
Rumeet Dhindsa	4fb5119215	Add support for thinlto option ( thinlto-emit-imports-files) to emit import files for thinlink. Differential Revision: https://reviews.llvm.org/D46400 llvm-svn: 331696	2018-05-07 23:14:12 +00:00
Rui Ueyama	4454b3d1eb	Parse --thinlto-prefix-replace early so that we don't need to parse it later. NFC. llvm-svn: 331657	2018-05-07 17:59:43 +00:00
Rumeet Dhindsa	d366e36bbf	Added support for ThinLTO plugin options : thinlto-index-only and thinlto-prefix-replace Differential Revision: https://reviews.llvm.org/D46034 llvm-svn: 331405	2018-05-02 21:40:07 +00:00
Rui Ueyama	88fe5c9557	Add -z {combreloc,copyreloc,noexecstack,lazy,relro,text}. Differential Revision: https://reviews.llvm.org/D45902 llvm-svn: 330482	2018-04-20 21:24:08 +00:00
Michael J. Spencer	b842725c1d	[ELF] Add profile guided section layout This adds profile guided layout using the Call-Chain Clustering (C³) heuristic from https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf . RFC: [llvm-dev] [RFC] Profile guided section layout http://lists.llvm.org/pipermail/llvm-dev/2017-June/114178.html Pass `--call-graph-ordering-file <file>` to read a call graph profile where each line has the format: <from symbol> <to symbol> <call count> Differential Revision: https://reviews.llvm.org/D36351 llvm-svn: 330234	2018-04-17 23:30:05 +00:00
Rui Ueyama	1d92aa7380	Add --warn-backrefs to maintain compatibility with other linkers I'm proposing a new command line flag, --warn-backrefs in this patch. The flag and the feature proposed below don't exist in GNU linkers nor the current lld. --warn-backrefs is an option to detect reverse or cyclic dependencies between static archives, and it can be used to keep your program compatible with GNU linkers after you switch to lld. I'll explain the feature and why you may find it useful below. lld's symbol resolution semantics is more relaxed than traditional Unix linkers. Therefore, ld.lld foo.a bar.o succeeds even if bar.o contains an undefined symbol that have to be resolved by some object file in foo.a. Traditional Unix linkers don't allow this kind of backward reference, as they visit each file only once from left to right in the command line while resolving all undefined symbol at the moment of visiting. In the above case, since there's no undefined symbol when a linker visits foo.a, no files are pulled out from foo.a, and because the linker forgets about foo.a after visiting, it can't resolve undefined symbols that could have been resolved otherwise. That lld accepts more relaxed form means (besides it makes more sense) that you can accidentally write a command line or a build file that works only with lld, even if you have a plan to distribute it to wider users who may be using GNU linkers. With --check-library-dependency, you can detect a library order that doesn't work with other Unix linkers. The option is also useful to detect cyclic dependencies between static archives. Again, lld accepts ld.lld foo.a bar.a even if foo.a and bar.a depend on each other. With --warn-backrefs it is handled as an error. Here is how the option works. We assign a group ID to each file. A file with a smaller group ID can pull out object files from an archive file with an equal or greater group ID. Otherwise, it is a reverse dependency and an error. A file outside --{start,end}-group gets a fresh ID when instantiated. All files within the same --{start,end}-group get the same group ID. E.g. ld.lld A B --start-group C D --end-group E A and B form group 0, C, D and their member object files form group 1, and E forms group 2. I think that you can see how this group assignment rule simulates the traditional linker's semantics. Differential Revision: https://reviews.llvm.org/D45195 llvm-svn: 329636	2018-04-09 23:05:48 +00:00
Rumeet Dhindsa	682a417e51	Added support for LTO options: sample_profile, new_pass_manager and debug_pass_manager Differential Revision: https://reviews.llvm.org/D45275 llvm-svn: 329598	2018-04-09 17:56:07 +00:00
Rui Ueyama	db46a62e2b	Implement --cref. This is an option to print out a table of symbols and filenames. The output format of this option is the same as GNU, so that it can be processed by the same scripts as before after migrating from GNU to lld. This option is mildly useful; we can live without it. But it is pretty convenient sometimes, and it can be implemented in 50 lines of code, so I think lld should support this option. Differential Revision: https://reviews.llvm.org/D44336 llvm-svn: 327565	2018-03-14 20:29:45 +00:00
Simon Dardis	cd8758233e	[mips][lld] Spectre variant two mitigation for MIPSR2 This patch provides migitation for CVE-2017-5715, Spectre variant two, which affects the P5600 and P6600. It implements the LLD part of -z hazardplt. Like the Clang part of this patch, I have opted for that specific option name in case alternative migitation methods are required in the future. The mitigation strategy suggested by MIPS for these processors is to use hazard barrier instructions. 'jalr.hb' and 'jr.hb' are hazard barrier variants of the 'jalr' and 'jr' instructions respectively. These instructions impede the execution of instruction stream until architecturally defined hazards (changes to the instruction stream, privileged registers which may affect execution) are cleared. These instructions in MIPS' designs are not speculated past. These instructions are defined by the MIPS32R2 ISA, so this mitigation method is not compatible with processors which implement an earlier revision of the MIPS ISA. For LLD, this changes PLT stubs to use 'jalr.hb' and 'jr.hb'. Reviewers: atanasyan, ruiu Differential Revision: https://reviews.llvm.org/D43488 llvm-svn: 325647	2018-02-20 23:49:17 +00:00
Sam Clegg	3141ddc58d	Consistent (non) use of empty lines in include blocks The profailing style in lld seem to be to not include such empty lines. Clang-tidy/clang-format seem to handle this just fine. Differential Revision: https://reviews.llvm.org/D43528 llvm-svn: 325629	2018-02-20 21:53:18 +00:00
Rui Ueyama	7c9ad29304	Remove "--full-shutdown" and instead use an environment variable LLD_IN_TEST. We are running lld tests with "--full-shutdown" option because we don't want to call _exit() in lld if it is running tests. Regular shutdown is needed for leak sanitizer. This patch changes the way how we tell lld that it is running tests. Now "--full-shutdown" is removed, and LLD_IN_TEST environment variable is used instead. This patch enables full shutdown on all ports, e.g. ELF, COFF and wasm. Previously, we enabled it only for ELF. Differential Revision: https://reviews.llvm.org/D43410 llvm-svn: 325413	2018-02-16 23:41:48 +00:00
James Henderson	de300e66bb	[ELF] Add warnings for various symbols that cannot be ordered There are a number of different situations when symbols are requested to be ordered in the --symbol-ordering-file that cannot be ordered for some reason. To assist with identifying these symbols, and either tidying up the order file, or the inputs, a number of warnings have been added. As some users may find these warnings unhelpful, due to how they use the symbol ordering file, a switch has also been added to disable these warnings. The cases where we now warn are: * Entries in the order file that don't correspond to any symbol in the input * Undefined symbols * Absolute symbols * Symbols imported from shared objects * Symbols that are discarded, due to e.g. --gc-sections or /DISCARD/ linker script sections * Multiple of the same entry in the order file Reviewed by: rafael, ruiu Differential Revision: https://reviews.llvm.org/D42475 llvm-svn: 325125	2018-02-14 13:36:22 +00:00
Rui Ueyama	d42b1c0534	Remove Config->Verbose because we have errorHandler().Verbose. llvm-svn: 324684	2018-02-08 23:52:09 +00:00
Rafael Espindola	64626b344b	Store just argv[0] in Config. Having the full argv there seems in conflict with the desire to parse all command line options in the Driver. llvm-svn: 324418	2018-02-06 22:37:05 +00:00
Rafael Espindola	7a7a81d9d1	Replace ApplyDynamicRelocs with WriteAddends. The difference is that WriteAddends also takes IsRela into consideration. llvm-svn: 324271	2018-02-05 20:55:46 +00:00
Peter Smith	64f65b02d2	[ELF] Implement --[no-]apply-dynamic-relocs option. When resolving dynamic RELA relocations the addend is taken from the relocation and not the place being relocated. Accordingly lld does not write the addend field to the place like it would for a REL relocation. Unfortunately there is some system software, in particlar dynamic loaders such as Bionic's linker64 that use the value of the place prior to relocation to find the offset that they have been loaded at. Both gold and bfd control this behavior with the --[no-]apply-dynamic-relocs option. This change implements the option and defaults it to true for compatibility with gold and bfd. Differential Revision: https://reviews.llvm.org/D42797 llvm-svn: 324221	2018-02-05 10:15:08 +00:00
Rui Ueyama	6a8e79b8e5	Add -{no,}-check-sections flags to enable/disable section overlchecking GNU linkers have this option. Differential Revision: https://reviews.llvm.org/D42858 llvm-svn: 324150	2018-02-02 22:24:06 +00:00
Rui Ueyama	aad2e328b9	Add --no-gnu-unique and --no-undefined-version for completeness. Differential Revision: https://reviews.llvm.org/D42865 llvm-svn: 324145	2018-02-02 21:44:06 +00:00
James Henderson	9c6e2fd5a4	[ELF] Add --print-icf-sections flag Currently ICF information is output through stderr if the "--verbose" flag is used. This differs to Gold for example, which uses an explicit flag to output this to stdout. This commit adds the "--print-icf-sections" and "--no-print-icf-sections" flags and changes the output message format for clarity and consistency with "--print-gc-sections". These messages are still output to stderr if using the verbose flag. However to avoid intermingled message output to console, this will not occur when the "--print-icf-sections" flag is used. Existing tests have been modified to expect the new message format from stderr. Patch by Owen Reynolds. Differential Revision: https://reviews.llvm.org/D42375 Reviewers: ruiu, rafael Reviewed by: llvm-svn: 323976	2018-02-01 16:00:46 +00:00
Alexander Richardson	6b367faa45	[ELF] Make overlapping output sections an error Summary: While trying to make a linker script behave the same way with lld as it did with bfd, I discovered that lld currently doesn't diagnose overlapping output sections. I was getting very strange runtime failures which I tracked down to overlapping sections in the resulting binary. When linking with ld.bfd overlapping output sections are an error unless --noinhibit-exec is passed and I believe lld should behave the same way here to avoid surprising crashes at runtime. The patch also uncovered an errors in the tests: arm-thumb-interwork-thunk was creating a binary where .got.plt was placed at an address overlapping with .got. Reviewers: ruiu, grimar, rafael Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D41046 llvm-svn: 323856	2018-01-31 09:22:44 +00:00
Chandler Carruth	c58f2166ab	Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took in the speculative execution, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` in addition to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile all libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we strongly recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155	2018-01-22 22:05:25 +00:00
Rafael Espindola	b5506e6baf	Rename --icf-data and add a corresponding flag for functions. When we have --icf=safe we should be able to define --icf=all as a shorthand for --icf=safe --ignore-function-address-equality. For now --ignore-function-address-equality is used only to control access to non preemptable symbols in shared libraries. llvm-svn: 322152	2018-01-10 01:37:36 +00:00
Peter Smith	96ca4f5e91	[ELF] Remove Duplicate .ARM.exidx sections The ARM.exidx section contains a table of 8-byte entries with the first word of each entry an offset to the function it describes and the second word instructions for unwinding if an exception is thrown from that function. The SHF_LINK_ORDER processing will order the table in ascending order of the functions described by the exception table entries. As the address range of an exception table entry is terminated by the next table entry, it is possible to merge consecutive table entries that have identical unwind instructions. For this implementation we define a table entry to be identical if: - Both entries are the special EXIDX_CANTUNWIND. - Both entries have the same inline unwind instructions. We do not attempt to establish if table entries that are references to .ARM.extab sections are identical. This implementation works at a granularity of a single .ARM.exidx InputSection. If all entries in the InputSection are identical to the previous table entry we can remove the InputSection. A more sophisticated but more complex implementation would rewrite InputSection contents so that duplicates within a .ARM.exidx InputSection can be merged. Differential Revision: https://reviews.llvm.org/D40967 llvm-svn: 320803	2017-12-15 11:09:41 +00:00
Rafael Espindola	814ece6854	Add an option for ICFing data. An internal linker has support for merging identical data and in some cases it can be a significant win. This is behind an off by default flag so it has to be requested explicitly. llvm-svn: 320448	2017-12-12 01:36:24 +00:00
Peter Smith	732cd8cbef	[ELF] Implement scanner for Cortex-A53 Erratum 843419 Add a new file AArch64ErrataFix.cpp that implements the logic to scan for the Cortex-A53 Erratum 843419. This involves finding all the executable code, disassembling the instructions that might trigger the erratum and reporting a message if the sequence is detected. At this stage we do not attempt to fix the erratum, this functionality will be added in a later patch. See D36749 for proposal. Differential Revision: https://reviews.llvm.org/D36742 llvm-svn: 319780	2017-12-05 15:59:05 +00:00
Peter Smith	57eb046984	[ELF] Read ARM BuildAttributes section to determine supported features. lld assumes some ARM features that are not available in all Arm processors. In particular: - The blx instruction present for interworking. - The movt/movw instructions are used in Thunks. - The J1=1 J2=1 encoding of branch immediates to improve Thumb wide branch range are assumed to be present. This patch reads the ARM Attributes section to check for the architecture the object file was compiled with. If none of the objects have an architecture that supports either of these features a warning will be given. This is most likely to affect armv6 as used in the first Raspberry Pi. Differential Revision: https://reviews.llvm.org/D36823 llvm-svn: 319169	2017-11-28 13:51:48 +00:00
Rui Ueyama	f1f00841d9	Merge SymbolBody and Symbol into one class, SymbolBody. SymbolBody and Symbol were separated classes due to a historical reason. Symbol used to be a pointer to a SymbolBody, and the relationship between Symbol and SymbolBody was n:1. r2681780 changed that. Since that patch, SymbolBody and Symbol are allocated next to each other to improve memory locality, and they have 1:1 relationship now. So, the separation of Symbol and SymbolBody no longer makes sense. This patch merges them into one class. In order to avoid updating too many places, I chose SymbolBody as a unified name. I'll rename it Symbol in a follow-up patch. Differential Revision: https://reviews.llvm.org/D39406 llvm-svn: 317006	2017-10-31 16:07:41 +00:00

1 2 3 4 5 ...

396 Commits