Address @ruiu's post-commit review comment: a value which is intended
to be an unsigned 32-bit integer should use uint32_t rather than unsigned.
llvm-svn: 325713
This patch provides mitigation for CVE-2017-5715, Spectre variant two,
which affects the P5600 and P6600. It implements the LLD part of
-z hazardplt. Like the Clang part of this patch, I have opted for that
specific option name in case alternative mitigation methods are required
in the future.
The mitigation strategy suggested by MIPS for these processors is to use
hazard barrier instructions. 'jalr.hb' and 'jr.hb' are hazard
barrier variants of the 'jalr' and 'jr' instructions respectively.
These instructions impede execution of the instruction stream until
architecturally defined hazards (changes to the instruction stream,
privileged registers which may affect execution) are cleared. MIPS'
designs do not speculate past these instructions.
These instructions are defined by the MIPS32R2 ISA, so this mitigation
method is not compatible with processors which implement an earlier
revision of the MIPS ISA.
For LLD, this changes PLT stubs to use 'jalr.hb' and 'jr.hb'.
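To make that concrete, here is a minimal sketch (the helper name and the
little-endian shortcut are ours, not lld's code) of swapping the stub's
final jump for its hazard-barrier variant; setting hint bit 10 of
'jr $25' produces 'jr.hb $25':
```
#include <cstdint>
#include <cstring>

// Little-endian 32-bit write (helper ours; assumes a little-endian host
// for brevity -- lld has proper endian-aware utilities).
static void write32le(uint8_t *Buf, uint32_t V) { std::memcpy(Buf, &V, 4); }

// The final jump of a MIPS PLT stub, with and without -z hazardplt.
// 0x03200008 is 'jr $25'; setting hint bit 10 yields 'jr.hb $25'.
// The 'jalr' in the PLT header is treated analogously to get 'jalr.hb'.
void writePltJump(uint8_t *Buf, bool ZHazardplt) {
  uint32_t Jr = 0x03200008; // jr $25
  if (ZHazardplt)
    Jr |= 1u << 10;         // jr.hb $25: hazard-barrier hint bit
  write32le(Buf, Jr);
}
```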
Reviewers: atanasyan, ruiu
Differential Revision: https://reviews.llvm.org/D43488
llvm-svn: 325647
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processor's
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.
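As a concrete illustration of the gadget shape described above (every
name here is hypothetical), the pattern is a load of secret data, a
branch on the loaded value, and a dependent load of a predictable cache
line:
```
// Hypothetical gadget shape; all names are ours. Even when executed
// only speculatively, it loads probe[0] into the cache iff the low bit
// of *secret is set; the attacker times that line to read the bit.
void gadget(const unsigned char *secret, const unsigned char *probe,
            volatile unsigned char *sink) {
  if (*secret & 1)    // load of secret data, branch on the loaded value
    *sink = probe[0]; // load of a predictable cache line
}
```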
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
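For illustration (hand-written, not the pass's actual output), a switch
that would normally become a jump table can instead be lowered to a
small tree of compares and direct conditional branches, leaving no
indirect branch to poison:
```
// Source-level view: a switch the compiler would usually lower to an
// indirect jump through a table.
int dispatch(int K) {
  switch (K) {
  case 0: return 10;
  case 1: return 11;
  case 2: return 12;
  case 3: return 13;
  default: return -1;
  }
}

// Roughly the shape of the search-tree lowering: only compares and
// direct conditional branches, nothing for a predictor to poison.
int dispatchTree(int K) {
  if (K < 2)
    return K == 0 ? 10 : (K == 1 ? 11 : -1);
  return K == 2 ? 12 : (K == 3 ? 13 : -1);
}
```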
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.
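A minimal sketch of this device, written as GNU file-scope assembly in a
C++ file (the symbol and label names are ours; the canonical sequence is
in the blog post linked below):
```
// x86-64 retpoline sketch; the indirect call target is passed in %r11.
asm(".text\n"
    ".globl retpoline_r11_sketch\n"
    "retpoline_r11_sketch:\n"
    "  call 1f\n"          // direct call; predicted return goes to 0:
    "0:\n"
    "  pause\n"            // speculative execution is trapped here...
    "  lfence\n"           // ...in a fenced infinite loop
    "  jmp 0b\n"
    "1:\n"
    "  mov %r11, (%rsp)\n" // smash the return address with the target
    "  ret\n");            // the actual (non-speculated) indirect branch
```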
On 64-bit x86 ABIs, this is especially easy to do in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 the thunk name must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
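For example, a custom external thunk could be supplied by simply
defining the documented symbol; a hypothetical x86-64 definition follows
(the body just repeats the standard retpoline sequence, and is exactly
the kind of thing a kernel might later hot-patch):
```
// Hypothetical user-supplied thunk for -mretpoline-external-thunk on
// x86-64; the symbol name is the documented one, the body is ours and
// could be hot-patched to anything that branches to the address in %r11.
asm(".text\n"
    ".globl __llvm_external_retpoline_r11\n"
    "__llvm_external_retpoline_r11:\n"
    "  call 1f\n"
    "0:\n"
    "  pause\n"
    "  lfence\n"
    "  jmp 0b\n"
    "1:\n"
    "  mov %r11, (%rsp)\n" // target was passed in %r11, per the contract
    "  ret\n");
```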
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.
When manually applying transformations similar to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.
When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch-, indirect-call-,
or virtual-call-heavy we have seen overheads ranging from 10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.
This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.
Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723
llvm-svn: 323155
We need to decompose the relocation type for the N32 / N64 ABIs. Let's
do it before any other manipulation of the relocation type in the
`relocateOne` routine.
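For context, a hedged sketch of what decomposing means here (the helper
name is ours): the N32 / N64 ABIs pack up to three relocation types into
the type value, one per byte:
```
#include <cstdint>
// Helper name ours: split the packed 32-bit value into its three
// one-byte relocation types (primary, second, third).
struct MipsRelTypes { uint8_t Primary, Second, Third; };
MipsRelTypes decomposeMipsRelType(uint32_t Type) {
  return {uint8_t(Type), uint8_t(Type >> 8), uint8_t(Type >> 16)};
}
```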
llvm-svn: 322860
This is an aesthetic change to represent a placeholder for later
binary patching as "0, 0, 0, 0" instead of "0x00, 0x00, 0x00, 0x00".
The former is how we represent it in COFF, and I found it easier to
read than the latter.
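For illustration, a hypothetical byte template in the preferred style
(0xff 0x25 really is the x86-64 'jmpq *disp32(%rip)' opcode; the array
name is ours):
```
#include <cstdint>
// A 6-byte 'jmpq *<got entry>(%rip)' template: the 0xff 0x25 opcode is
// meaningful, the four zeros are a placeholder patched at link time.
const uint8_t PltJmp[] = {
    0xff, 0x25, 0, 0, 0, 0, // jmpq *got(%rip); disp32 filled in later
};
```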
llvm-svn: 321471
A more efficient PLT sequence can be used when the distance between the
.plt and the end of the .plt.got is less than 128 Megabytes, which is
frequently true. We fall back to the old sequence when the offset is
larger than 128 Megabytes. This gives us an alternative to forcing the
longer entries with --long-plt, as we gracefully fall back to them as
needed.
See "ELF for the ARM Architecture", Appendix A, for details of the PLT
sequence.
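A sketch of the selection logic described above, with names of our own
choosing:
```
#include <cstdint>
// Use the short PLT sequence only when the .plt to .plt.got distance
// fits in 128 MiB; otherwise fall back to the long entries.
bool useShortPlt(uint64_t PltVA, uint64_t GotPltEndVA) {
  uint64_t D = GotPltEndVA > PltVA ? GotPltEndVA - PltVA : PltVA - GotPltEndVA;
  return D < (UINT64_C(128) << 20);
}
```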
Differential Revision: https://reviews.llvm.org/D41246
llvm-svn: 320987
The PPC port doesn't support the PLT yet, but the architecture-independent
code optimizes PLT access for non-preemptible symbols, which is
exactly what returning R_PC was trying to implement.
llvm-svn: 320430
Summary:
I also changed the message to print both the ISA and the architecture
name for incompatible files. Previously it would be quite hard to find the
actual path of the incompatible object files in projects that have many
object files with the same name in different directories.
Reviewers: atanasyan, ruiu
Reviewed By: atanasyan
Subscribers: emaste, sdardis, llvm-commits
Differential Revision: https://reviews.llvm.org/D40958
llvm-svn: 320056
The AArch64 unconditional branch and branch and link instructions have a
maximum range of 128 MiB. This is usually enough for most programs but
there are cases when it isn't enough. This change adds support for range
extension thunks to AArch64. For pc-relative thunks we follow the small
code model and use ADRP, ADD, BR. This has a limit of 4 gigabytes.
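The thunk shape described above, sketched as GNU file-scope assembly in
a C++ file (symbol names ours; this assembles only for AArch64 targets):
```
// Range-extension thunk shape: ADRP/ADD materialize the destination
// within +/-4 GiB of the thunk, then BR jumps via IP0 (x16).
asm(".text\n"
    ".globl aarch64_thunk_sketch\n"
    "aarch64_thunk_sketch:\n"
    "  adrp x16, far_target\n"            // 4 KiB page of the target
    "  add  x16, x16, :lo12:far_target\n" // plus the low 12 bits
    "  br   x16\n"                        // branch to the target
    "far_target:\n"                       // dummy destination for the sketch
    "  ret\n");
```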
Differential Revision: https://reviews.llvm.org/D39744
llvm-svn: 319307
When an undefined weak reference has a PLT entry we must generate a range
extension thunk for any B or BL that can't reach the PLT entry.
This change explicitly looks for whether a PLT entry exists rather than
assuming that weak references never need PLT entries unless Config->Shared
is in operation. This covers the case where we are linking an executable
with dynamic linking, hence a PLT entry will be needed for undefined weak
references. This case comes up in real programs over 32 MiB in size, as
there is a B to the weak reference __gmon_start__ in the Arm crti.o for glibc.
Differential Revision: https://reviews.llvm.org/D40248
llvm-svn: 319020
microMIPS symbols, including microMIPS PLT records created for regular
symbols, need to be marked with the STO_MIPS_MICROMIPS flag in the
symbol table. Additionally, microMIPS entries in the dynamic symbol
table should have the least-significant bit set. That avoids having to
teach the dynamic linker about microMIPS symbols.
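A hedged sketch of the two adjustments (the function name is ours;
STO_MIPS_MICROMIPS is the real ELF constant, 0x80):
```
#include <cstdint>
constexpr uint8_t STO_MIPS_MICROMIPS = 0x80; // real ELF constant
// Mark a microMIPS symbol as described: flag in st_other, and the
// ISA-mode bit (LSB) set in the dynamic symbol's value.
void markMicroMips(uint8_t &StOther, uint64_t &StValue) {
  StOther |= STO_MIPS_MICROMIPS; // symbol table flag
  StValue |= 1;                  // LSB tells the dynamic linker the mode
}
```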
llvm-svn: 318097
We have a lot of "if (MIPS)" conditions in lld because the MIPS ABI
differs from other architectures' ABIs in various places where it
doesn't have to be different, but we at least want to reduce the
MIPS-ness of the regular classes.
llvm-svn: 317525
Now that DefinedRegular is the only remaining derived class of
Defined, we can merge the two classes.
Differential Revision: https://reviews.llvm.org/D39667
llvm-svn: 317448
Now that SymbolBody is the only remaining symbol class, "SymbolBody" is
a bit of a strange name. This is a mechanical change generated by
perl -i -pe s/SymbolBody/Symbol/g $(git grep -l SymbolBody lld/ELF lld/COFF)
and clang-format-diff.
Differential Revision: https://reviews.llvm.org/D39459
llvm-svn: 317370
This change adds initial support for range extension thunks. All thunks must
be created within the first pass so some corner cases are not supported. A
follow up patch will add support for multiple passes.
With this change, the existing tests arm-branch-error.s and
arm-thumb-branch-error.s no longer fail with an out-of-range branch.
These have been renamed and tests added for the range extension thunk.
Differential Revision: https://reviews.llvm.org/D34691
llvm-svn: 316752
When an OutputSection is larger than the branch range for a Target we
need to place thunks such that they are always in range of their caller,
and sufficiently spaced to maximise the number of callers that can use
the thunk. We use the simple heuristic of placing the
ThunkSection at intervals corresponding to a target-specific branch range.
If the OutputSection is small we put the thunks at the end of the executable
sections.
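A sketch of the placement heuristic (names ours, not the patch's
interface):
```
#include <cstdint>
#include <vector>
// Offsets at which to place ThunkSections within an output section:
// one per branch-range interval for large sections, at the end otherwise.
std::vector<uint64_t> thunkSectionOffsets(uint64_t SectionSize,
                                          uint64_t BranchRange) {
  if (SectionSize <= BranchRange)
    return {SectionSize}; // small section: thunks after the code
  std::vector<uint64_t> Offsets;
  for (uint64_t Off = BranchRange; Off < SectionSize; Off += BranchRange)
    Offsets.push_back(Off); // a reachable thunk island every BranchRange
  return Offsets;
}
```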
Differential Revision: https://reviews.llvm.org/D34689
llvm-svn: 316751
Summary:
The COFF linker and the ELF linker have long had similar but separate
Error.h and Error.cpp files to implement error handling. This change
introduces new error handling code in Common/ErrorHandler.h, changes the
COFF and ELF linkers to use it, and removes the old, separate
implementations.
Reviewers: ruiu
Reviewed By: ruiu
Subscribers: smeenai, jyknight, emaste, sdardis, nemanjai, nhaehnle, mgorny, javed.absar, kbarton, fedor.sergeev, llvm-commits
Differential Revision: https://reviews.llvm.org/D39259
llvm-svn: 316624
The support of R_PPC_ADDR16_HI improves ld compatibility and makes
things on par with RuntimeDyldELF that already implements this
relocation.
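For reference, the relocation's computation sketched in code (the
function name is ours): R_PPC_ADDR16_HI stores #hi(S + A), the plain
high half with no carry from the low half, unlike R_PPC_ADDR16_HA's #ha:
```
#include <cstdint>
// #hi(x): the raw high 16 bits, written into the 16-bit field.
uint16_t ppcAddr16Hi(uint32_t SPlusA) {
  return uint16_t((SPlusA >> 16) & 0xffff);
}
```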
Patch by vit9696.
llvm-svn: 316306
```
static __global int Var = 0;
__global int* Ptr[] = {&Var};
...
```
In this case Var is a non-preemptible symbol, so its address can be used
as the value of Ptr, with a base-relative relocation that will add the
delta between the ELF address and the actual load address. Such
relocations do not require a symbol.
This also fixes LLD, which was incorrectly generating a PCREL64 for this case.
Differential Revision: https://reviews.llvm.org/D38910
llvm-svn: 315936
These are generated by the linker itself and it shouldn't treat
them as unrecognized. This was introduced in r315552 and is triggering
an error when building the UBSan shared library for i386.
Differential Revision: https://reviews.llvm.org/D38899
llvm-svn: 315737
A section was passed to getRelExpr just to create an error message.
But if there's an invalid relocation, we would eventually report it
in relocateOne. So we don't have to pass a section to getRelExpr.
llvm-svn: 315552
We were using uint32_t as the type of relocation kind. It has a
readability issue because what Type really means in `uint32_t Type`
is not obvious. It could be a section type, a symbol type or a
relocation type.
Since we do not do any arithmetic operations on relocation types
(e.g. adding one to R_X86_64_PC32 doesn't make sense), it would be
more natural if they are represented as enums. Unfortunately, that
is not doable because relocation type definitions are spread into
multiple header files.
So I decided to use typedef. This still should be better than the
plain uint32_t because the intended type is now obvious.
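Sketched, the shape of the change (the alias name is illustrative; the
patch's exact spelling may differ):
```
#include <cstdint>
// A named alias documents intent at each use site; an enum is not
// workable because relocation type definitions span multiple headers.
typedef uint32_t RelType;
void scanReloc(RelType Type); // clearer than 'uint32_t Type'
```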
llvm-svn: 315525
Summary:
These are 16-bit relocations and are not part of a HI/LO pair, so we
need to check that they don't overflow.
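A sketch of the check being added (the name is ours):
```
#include <cstdint>
// A bare 16-bit relocation must fit its field; the truncated half of a
// HI/LO pair, by contrast, is allowed to overflow by design.
bool fitsInt16(int64_t V) { return V >= INT16_MIN && V <= INT16_MAX; }
```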
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: ruiu, llvm-commits, emaste, sdardis
Tags: #lld
Differential Revision: https://reviews.llvm.org/D38614
llvm-svn: 315073
If a symbol has the STO_MIPS_MICROMIPS flag and requires a thunk to
perform a call to PIC code from non-PIC functions, we need to generate a
thunk with microMIPS code.
llvm-svn: 314797
Currently LLD calls the `isMicroMips` routine to determine which type of
PLT entries needs to be generated: regular or microMIPS. This routine
checks ELF header flags in the `FirstObj` to retrieve the type of linked
object files.
So if the first file does not contain microMIPS code, LLD will generate
PLT entries with regular (non-microMIPS) code only.
Ideally, if a PLT entry is referenced only by microMIPS code, that entry
should contain microMIPS code; if it is referenced by regular code, it
should contain regular code. In a "mixed" case the PLT
entry can be either microMIPS or regular, but each "cross-mode-call" has
additional cost.
It's rather difficult to implement this ideal solution, but we can
assume that if there is an input object file with microMIPS code, most
of the code is microMIPS too. So we need to deduce the type of PLT
entries from the finally calculated ELF header flags instead of checking
only the first input object file.
This change implements exactly that:
- The `getMipsEFlags` routine is renamed to `calcMipsEFlags`. The
function is called from `LinkerDriver::link` and the result is stored in
the `Configuration::MipsEFlags` field.
- The `isMicroMips` and `isMipsR6` routines access the `MipsEFlags`
field to get and check the calculated ELF flags.
- New types of PLT records are created when necessary.
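A simplified sketch of the resulting flow (types and names reduced for
illustration; EF_MIPS_MICROMIPS is the real ELF constant):
```
#include <cstdint>
constexpr uint32_t EF_MIPS_MICROMIPS = 0x02000000; // real ELF constant
struct Configuration { uint32_t MipsEFlags = 0; };
Configuration Config;

// Stand-in for calcMipsEFlags: flags merged across *all* input objects,
// stored once from LinkerDriver::link.
void storeMergedEFlags(uint32_t Merged) { Config.MipsEFlags = Merged; }

// Now answers for the whole link, not just the first object file.
bool isMicroMips() { return Config.MipsEFlags & EF_MIPS_MICROMIPS; }
```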
Differential revision: https://reviews.llvm.org/D37747
llvm-svn: 314675
This patch removes a lot of static Instances arrays from different input
file classes and introduces global arrays for access instead, similar to
the arrays we have for InputSections/OutputSectionCommands.
It allows iterating over input files in non-templated code.
Differential revision: https://reviews.llvm.org/D35987
llvm-svn: 313619
The patch implements initial support for microMIPS code linking:
- Handle microMIPS specific relocations.
- Emit both R1-R5 and R6 microMIPS PLT records.
For now, linking a mixed set of regular and microMIPS object files is
not supported. Also the patch does not handle (set up and clear) the
least-significant bit of an address, which is utilized as the ISA mode
bit and allows jumping between regular and microMIPS code without any
thunks.
Differential revision: https://reviews.llvm.org/D37335
llvm-svn: 313028
To support errata patching on AArch64 we need to be able to overwrite
an arbitrary instruction with a branch. For AArch64 it is sufficient to
always write all the bits of the branch instruction and not just the
immediate field. This is safe as the non-immediate bits of the branch
instruction are always the same.
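For reference, a sketch of writing a complete B instruction (the
function name is ours): all non-immediate bits of B are the fixed
pattern 0x14000000, so writing the whole word is always safe:
```
#include <cstdint>
// Encode a complete AArch64 'B <offset>': fixed bits 0x14000000 plus a
// 26-bit word offset. Writing all 32 bits over any instruction is safe.
uint32_t aarch64BranchInsn(int64_t ByteOffset) {
  return 0x14000000u | (uint32_t(ByteOffset >> 2) & 0x03ffffffu);
}
```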
Differential Revision: https://reviews.llvm.org/D36745
llvm-svn: 312727
The R_AARCH64_LDST<N>_ABS_LO12_NC relocations, where N is 8, 16, 32, 64
or 128, have a scaled immediate. For example, R_AARCH64_LDST32_ABS_LO12_NC
shifts the calculated value right by 2. If the target symbol + relocation
addend is not aligned properly then bits of the answer will be lost.
This change adds an alignment check to the relocations to make sure the
target of the relocation is aligned properly. This matches the behavior of
GNU ld. The motivation is to catch ODR violations such as a declaration
of extern int foo but a definition of bool foo, as the compiler may use
R_AARCH64_LDST32_ABS_LO12_NC for the former, but not align the destination.
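A sketch of the added check (names ours):
```
#include <cstdint>
// For R_AARCH64_LDST<N>_ABS_LO12_NC the target (S + A) must be N/8-byte
// aligned, or the scaling by log2(N/8) silently drops low bits.
bool ldstTargetAligned(uint64_t SPlusA, unsigned N) {
  uint64_t Align = N / 8; // e.g. 4 bytes for LDST32
  return (SPlusA & (Align - 1)) == 0;
}
```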
Differential Revision: https://reviews.llvm.org/D37444
llvm-svn: 312637
Pass BSIZE and SHIFT as function arguments to the `writeRelocation`
routine. It does not make sense to have so many instances of
`writeRelocation`.
llvm-svn: 312495