forked from OSchip/llvm-project
194 lines
6.6 KiB
ReStructuredText
194 lines
6.6 KiB
ReStructuredText
===============
|
|
ShadowCallStack
|
|
===============
|
|
|
|
.. contents::
|
|
:local:
|
|
|
|
Introduction
|
|
============
|
|
|
|
ShadowCallStack is an **experimental** instrumentation pass, currently only
|
|
implemented for x86_64 and aarch64, that protects programs against return
|
|
address overwrites (e.g. stack buffer overflows.) It works by saving a
|
|
function's return address to a separately allocated 'shadow call stack'
|
|
in the function prolog and checking the return address on the stack against
|
|
the shadow call stack in the function epilog.
|
|
|
|
Comparison
|
|
----------
|
|
|
|
To optimize for memory consumption and cache locality, the shadow call stack
|
|
stores an index followed by an array of return addresses. This is in contrast
|
|
to other schemes, like :doc:`SafeStack`, that mirror the entire stack and
|
|
trade-off consuming more memory for shorter function prologs and epilogs with
|
|
fewer memory accesses. Similarly, `Return Flow Guard`_ consumes more memory with
|
|
shorter function prologs and epilogs than ShadowCallStack but suffers from the
|
|
same race conditions (see `Security`_). Intel `Control-flow Enforcement Technology`_
|
|
(CET) is a proposed hardware extension that would add native support to
|
|
use a shadow stack to store/check return addresses at call/return time. It
|
|
would not suffer from race conditions at calls and returns and not incur the
|
|
overhead of function instrumentation, but it does require operating system
|
|
support.
|
|
|
|
.. _`Return Flow Guard`: https://xlab.tencent.com/en/2016/11/02/return-flow-guard/
|
|
.. _`Control-flow Enforcement Technology`: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
|
|
|
|
Compatibility
|
|
-------------
|
|
|
|
ShadowCallStack currently only supports x86_64 and aarch64. A runtime is not
|
|
currently provided in compiler-rt so one must be provided by the compiled
|
|
application.
|
|
|
|
On aarch64, the instrumentation makes use of the platform register ``x18``.
|
|
On some platforms, ``x18`` is reserved, and on others, it is designated as
|
|
a scratch register. This generally means that any code that may run on the
|
|
same thread as code compiled with ShadowCallStack must either target one
|
|
of the platforms whose ABI reserves ``x18`` (currently Darwin, Fuchsia and
|
|
Windows) or be compiled with the flag ``-ffixed-x18``.
|
|
|
|
Security
|
|
========
|
|
|
|
ShadowCallStack is intended to be a stronger alternative to
|
|
``-fstack-protector``. It protects from non-linear overflows and arbitrary
|
|
memory writes to the return address slot; however, similarly to
|
|
``-fstack-protector`` this protection suffers from race conditions because of
|
|
the call-return semantics on x86_64. There is a short race between the call
|
|
instruction and the first instruction in the function that reads the return
|
|
address where an attacker could overwrite the return address and bypass
|
|
ShadowCallStack. Similarly, there is a time-of-check-to-time-of-use race in the
|
|
function epilog where an attacker could overwrite the return address after it
|
|
has been checked and before it has been returned to. Modifying the call-return
|
|
semantics to fix this on x86_64 would incur an unacceptable performance overhead
|
|
due to return branch prediction.
|
|
|
|
The instrumentation makes use of the ``gs`` segment register on x86_64,
|
|
or the ``x18`` register on aarch64, to reference the shadow call stack
|
|
meaning that references to the shadow call stack do not have to be stored in
|
|
memory. This makes it possible to implement a runtime that avoids exposing
|
|
the address of the shadow call stack to attackers that can read arbitrary
|
|
memory. However, attackers could still try to exploit side channels exposed
|
|
by the operating system `[1]`_ `[2]`_ or processor `[3]`_ to discover the
|
|
address of the shadow call stack.
|
|
|
|
.. _`[1]`: https://eyalitkin.wordpress.com/2017/09/01/cartography-lighting-up-the-shadows/
|
|
.. _`[2]`: https://www.blackhat.com/docs/eu-16/materials/eu-16-Goktas-Bypassing-Clangs-SafeStack.pdf
|
|
.. _`[3]`: https://www.vusec.net/projects/anc/
|
|
|
|
On x86_64, leaf functions are optimized to store the return address in a
|
|
free register and avoid writing to the shadow call stack if a register is
|
|
available. Very short leaf functions are uninstrumented if their execution
|
|
is judged to be shorter than the race condition window intrinsic to the
|
|
instrumentation.
|
|
|
|
On aarch64, the architecture's call and return instructions (``bl`` and
|
|
``ret``) operate on a register rather than the stack, which means that
|
|
leaf functions are generally protected from return address overwrites even
|
|
without ShadowCallStack. It also means that ShadowCallStack on aarch64 is not
|
|
vulnerable to the same types of time-of-check-to-time-of-use races as x86_64.
|
|
|
|
Usage
|
|
=====
|
|
|
|
To enable ShadowCallStack, just pass the ``-fsanitize=shadow-call-stack``
|
|
flag to both compile and link command lines. On aarch64, you also need to pass
|
|
``-ffixed-x18`` unless your target already reserves ``x18``.
|
|
|
|
Low-level API
|
|
-------------
|
|
|
|
``__has_feature(shadow_call_stack)``
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
In some cases one may need to execute different code depending on whether
|
|
ShadowCallStack is enabled. The macro ``__has_feature(shadow_call_stack)`` can
|
|
be used for this purpose.
|
|
|
|
.. code-block:: c
|
|
|
|
#if defined(__has_feature)
|
|
# if __has_feature(shadow_call_stack)
|
|
// code that builds only under ShadowCallStack
|
|
# endif
|
|
#endif
|
|
|
|
``__attribute__((no_sanitize("shadow-call-stack")))``
|
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Use ``__attribute__((no_sanitize("shadow-call-stack")))`` on a function
|
|
declaration to specify that the shadow call stack instrumentation should not be
|
|
applied to that function, even if enabled globally.
|
|
|
|
Example
|
|
=======
|
|
|
|
The following example code:
|
|
|
|
.. code-block:: c++
|
|
|
|
int foo() {
|
|
return bar() + 1;
|
|
}
|
|
|
|
Generates the following x86_64 assembly when compiled with ``-O2``:
|
|
|
|
.. code-block:: gas
|
|
|
|
push %rax
|
|
callq foo
|
|
add $0x1,%eax
|
|
pop %rcx
|
|
retq
|
|
|
|
or the following aarch64 assembly:
|
|
|
|
.. code-block:: none
|
|
|
|
stp x29, x30, [sp, #-16]!
|
|
mov x29, sp
|
|
bl bar
|
|
add w0, w0, #1
|
|
ldp x29, x30, [sp], #16
|
|
ret
|
|
|
|
|
|
Adding ``-fsanitize=shadow-call-stack`` would output the following x86_64
|
|
assembly:
|
|
|
|
.. code-block:: gas
|
|
|
|
mov (%rsp),%r10
|
|
xor %r11,%r11
|
|
addq $0x8,%gs:(%r11)
|
|
mov %gs:(%r11),%r11
|
|
mov %r10,%gs:(%r11)
|
|
push %rax
|
|
callq foo
|
|
add $0x1,%eax
|
|
pop %rcx
|
|
xor %r11,%r11
|
|
mov %gs:(%r11),%r10
|
|
mov %gs:(%r10),%r10
|
|
subq $0x8,%gs:(%r11)
|
|
cmp %r10,(%rsp)
|
|
jne trap
|
|
retq
|
|
|
|
trap:
|
|
ud2
|
|
|
|
or the following aarch64 assembly:
|
|
|
|
.. code-block:: none
|
|
|
|
str x30, [x18], #8
|
|
stp x29, x30, [sp, #-16]!
|
|
mov x29, sp
|
|
bl bar
|
|
add w0, w0, #1
|
|
ldp x29, x30, [sp], #16
|
|
ldr x30, [x18, #-8]!
|
|
ret
|