llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	87de9a0786	[X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form ``` // llvm-objdump -d output (before) 400000: e8 0b 00 00 00 callq 11 400005: e8 0b 00 00 00 callq 11 // llvm-objdump -d output (after) 400000: e8 0b 00 00 00 callq 0x400010 400005: e8 0b 00 00 00 callq 0x400015 // GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled 400000: e8 0b 00 00 00 callq 400010 400005: e8 0b 00 00 00 callq 400015 ``` In llvm-objdump, we pass the address of the next MCInst. Ideally we should just thread the address of the current address, unfortunately we cannot call X86MCCodeEmitter::encodeInstruction (X86MCCodeEmitter requires MCInstrInfo and MCContext) to get the length of the MCInst. MCInstPrinter::printInst has other callers (e.g llvm-mc -filetype=asm, llvm-mca) which set Address to 0. They leave MCInstPrinter::PrintBranchImmAsAddress as false and this change is a no-op for them. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D76580	2020-03-26 08:28:59 -07:00
Fangrui Song	71e2ca6e32	[llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier. `.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive. Without `-LABEL`, the CHECK line can match the `Disassembly of section` line and causes the next `CHECK-NEXT:` to fail. ``` Disassembly of section .foo: 0000000000001634 .foo: ``` Bdragon: <> has metalinguistic connotation. it just "feels right" Reviewed By: rupprecht Differential Revision: https://reviews.llvm.org/D75713	2020-03-05 18:05:28 -08:00
Jordan Rupprecht	9d4a6b1bb2	[llvm-objdump] Further rearrange llvm-objdump sections for compatability Summary: rL371826 rearranged some output from llvm-objdump for GNU objdump compatability, but there still seem to be some more. I think this rearrangement is a little closer. Overview of the ordering which matches GNU objdump: * Archive headers * File headers * Section headers * Symbol table * Dwarf debugging * Relocations (if `--disassemble` is not used) * Section contents * Disassembly Reviewers: jhenderson, justice_adams, grimar, ychen, espindola Reviewed By: jhenderson Subscribers: aprantl, emaste, arichardson, jrtc27, atanasyan, seiya, llvm-commits, MaskRay Tags: #llvm Differential Revision: https://reviews.llvm.org/D68066 llvm-svn: 373671	2019-10-03 22:01:08 +00:00
Fangrui Song	9c371309f3	[ELF][X86] Allow PT_LOAD to have overlapping p_offset ranges on EM_386 Ported the D64906 technique to EM_386. If `sh_addralign(.tdata) < sh_addralign(.tbss)`, we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0`. ld.so that are known to have problems if p_vaddr%p_align!=0: * FreeBSD 13.0-CURRENT rtld-elf * glibc https://sourceware.org/bugzilla/show_bug.cgi?id=24606 New test i386-tls-vaddr-align.s checks our workaround makes p_vaddr%p_align = 0. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D65865 llvm-svn: 369347	2019-08-20 08:43:47 +00:00
Fangrui Song	0cfa72eaec	[ELF][X86] Improve tests * Add --no-show-raw-insn to llvm-objdump -d tests * When linking an executable with %t.so, the path %t.so will be recorded in the DT_NEEDED entry if %t.so doesn't have DT_SONAME. .dynstr will have varying lengths on different systems. Add -soname so that the string in .dynstr is of fixed length to make tests more robust. * Rename i386-tls-initial-exec-local.s to i386-tls-ie-local.s * Refactor tls-initial-exec-local.s to x86-64-tls-ie-local.s llvm-svn: 367533	2019-08-01 09:25:34 +00:00
Fangrui Song	5387c2cd17	[llvm-objdump] Print newlines before and after "Disassembly of section ...:" This improves readability and the behavior is consistent with GNU objdump. The new test test/tools/llvm-objdump/X86/disassemble-section-name.s checks we print newlines before and after "Disassembly of section ...:" Differential Revision: https://reviews.llvm.org/D61127 llvm-svn: 359668	2019-05-01 10:40:48 +00:00
Fangrui Song	07f8daf05e	[ELF] Simplify RelRo, TLS, NOBITS section ranks and make RW PT_LOAD start with RelRo Old: PT_LOAD(.data \| PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) \| .bss) New: PT_LOAD(PT_GNU_RELRO(.data.rel.ro .bss.rel.ro) \| .data .bss) The placement of \| indicates page alignment caused by PT_GNU_RELRO. The new layout has simpler rules and saves space for many cases. Old size: roundup(.data) + roundup(.data.rel.ro) New size: roundup(.data.rel.ro + .bss.rel.ro) + .data Other advantages: * At runtime the 3 memory mappings decrease to 2. * start(PT_TLS) = start(PT_GNU_RELRO) = start(RW PT_LOAD). This simplifies binary manipulation tools. GNU strip before 2.31 discards PT_GNU_RELRO if its address is not equal to the start of its associated PT_LOAD. This has been fixed by https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=f2731e0c374e5323ce4cdae2bcc7b7fe22da1a6f But with this change, we will be compatible with GNU strip before 2.31 * Before, .got.plt (non-relro by default) was placed before .got (relro by default), which made it impossible to have _GLOBAL_OFFSET_TABLE_ (start of .got.plt on x86-64) equal to the end of .got (R_GOT*_FROM_END) (https://bugs.llvm.org/show_bug.cgi?id=36555). With the new ordering, we can improve on this regard if we'd like to. Reviewers: ruiu, espindola, pcc Subscribers: emaste, arichardson, llvm-commits, joerg, jdoerfert Differential Revision: https://reviews.llvm.org/D56828 llvm-svn: 356117	2019-03-14 03:47:45 +00:00
Dimitry Andric	c9de3b4d26	Align AArch64 and i386 image base to superpage Summary: As for x86_64, the default image base for AArch64 and i386 should be aligned to a superpage appropriate for the architecture. On AArch64, this is 2 MiB, on i386 it is 4 MiB. Reviewers: emaste, grimar, javed.absar, espindola, ruiu, peter.smith, srhines, rprichard Reviewed By: ruiu, peter.smith Subscribers: jfb, markj, arichardson, krytarowski, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50297 llvm-svn: 342746	2018-09-21 16:58:13 +00:00
Rui Ueyama	971f87a806	Fix retpoline PLT header size for i386. Differential Revision: https://reviews.llvm.org/D42397 llvm-svn: 323288	2018-01-24 00:26:57 +00:00
Chandler Carruth	c58f2166ab	Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took in the speculative execution, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` in addition to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile all libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we strongly recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155	2018-01-22 22:05:25 +00:00

10 Commits