
=======================================================
Hardware-assisted AddressSanitizer Design Documentation
=======================================================
This page is a design document for
**hardware-assisted AddressSanitizer** (or **HWASAN**),
a tool similar to :doc:`AddressSanitizer`
but based on partial hardware assistance.
Introduction
============
:doc:`AddressSanitizer`
tags every 8 bytes of the application memory with a 1-byte tag (using *shadow memory*),
uses *redzones* to find buffer-overflows and
*quarantine* to find use-after-free.
The redzones, the quarantine, and, to a lesser extent, the shadow, are the
sources of AddressSanitizer's memory overhead.
See the `AddressSanitizer paper`_ for details.
AArch64 has `Address Tagging`_ (also known as top-byte-ignore, or TBI), a hardware
feature that allows software to use the 8 most significant bits of a 64-bit pointer
as a tag. HWASAN uses `Address Tagging`_
to implement a memory safety tool, similar to :doc:`AddressSanitizer`,
but with smaller memory overhead and slightly different (mostly better)
accuracy guarantees.
Algorithm
=========
* Every heap/stack/global memory object is forcibly aligned by `TG` bytes
(`TG` is e.g. 16 or 64). We call `TG` the **tagging granularity**.
* For every such object a random `TS`-bit tag `T` is chosen (`TS`, or tag size, is e.g. 4 or 8).
* The pointer to the object is tagged with `T`.
* The memory for the object is also tagged with `T` (using a `TG=>1` shadow memory).
* Every load and store is instrumented to read the memory tag and compare it
with the pointer tag; an exception is raised on tag mismatch.
For a more detailed discussion of this approach, see https://arxiv.org/pdf/1802.09517.pdf
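The core check can be illustrated with a short C sketch (illustration only,
assuming `TG=16` and the tag stored in the pointer's top byte; the actual
instrumentation and shadow mapping are described under Instrumentation below):

.. code-block:: c

  // Sketch only: TG = 16, tag kept in the top byte of the pointer.
  // __hwasan_shadow stands in for the shadow base set up by the runtime.
  #include <stdint.h>

  extern uint8_t __hwasan_shadow[];

  static inline uint8_t pointer_tag(const void *p) {
    return (uint8_t)((uintptr_t)p >> 56);            // top byte of the pointer
  }

  static inline uint8_t memory_tag(const void *p) {
    uintptr_t untagged = (uintptr_t)p & ((1ULL << 56) - 1);
    return __hwasan_shadow[untagged >> 4];           // one shadow byte per 16-byte granule
  }

  // Conceptual check inserted before every load and store.
  static inline void check_access(const void *p) {
    if (pointer_tag(p) != memory_tag(p))
      __builtin_trap();                              // the real tool reports a detailed error
  }
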
Short granules
--------------
A short granule is a granule of size between 1 and `TG-1` bytes. The size
of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:
* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the value being loaded
is in bounds of the granule and the pointer tag is equal to the last byte of
the granule.
Pointer tags between 1 and `TG-1` are possible and are as likely as any other
tag. This means that these tags in memory have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the pointer tag is stored in the granule).
When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.
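Expressed as a C sketch (same illustrative `TG=16` assumption as above; the
outlined check that implements this logic on AArch64 is shown in the next
section), the combined rule is roughly:

.. code-block:: c

  // Sketch of the short-granule-aware tag comparison (TG = 16); not the runtime code.
  #include <stdbool.h>
  #include <stdint.h>

  bool tags_match(uint8_t ptr_tag, uint8_t mem_tag,
                  uintptr_t untagged_addr, unsigned access_size) {
    if (ptr_tag == mem_tag)
      return true;                              // ordinary full-granule match
    if (mem_tag > 15)
      return false;                             // not a short granule size: real mismatch
    unsigned offset = untagged_addr & 15;       // position of the access within the granule
    if (offset + access_size > mem_tag)
      return false;                             // access runs past the short granule's size
    // For a short granule the actual tag is stored in the granule's last byte.
    uint8_t last_byte = *(const uint8_t *)(untagged_addr | 15);
    return ptr_tag == last_byte;
  }
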
Instrumentation
===============
Memory Accesses
---------------
In the majority of cases, memory accesses are prefixed with a call to
an outlined instruction sequence that verifies the tags. The code size
and performance overhead of the call is reduced by using a custom calling
convention that
* preserves most registers, and
* is specialized to the register containing the address, and the type and
size of the memory access.
Currently, the following sequence is used:
.. code-block:: none
// int foo(int *a) { return *a; }
// clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - load.c
[...]
foo:
stp x30, x20, [sp, #-16]!
adrp x20, :got:__hwasan_shadow // load shadow address from GOT into x20
ldr x20, [x20, :got_lo12:__hwasan_shadow]
bl __hwasan_check_x0_2_short_v2 // call outlined tag check
// (arguments: x0 = address, x20 = shadow base;
// "2" encodes the access type and size)
ldr w0, [x0] // inline load
ldp x30, x20, [sp], #16
ret
[...]
__hwasan_check_x0_2_short_v2:
sbfx x16, x0, #4, #52 // shadow offset
ldrb w16, [x20, x16] // load shadow tag
cmp x16, x0, lsr #56 // extract address tag, compare with shadow tag
b.ne .Ltmp0 // jump to short tag handler on mismatch
.Ltmp1:
ret
.Ltmp0:
cmp w16, #15 // is this a short tag?
b.hi .Ltmp2 // if not, error
and x17, x0, #0xf // find the address's position in the short granule
add x17, x17, #3 // adjust to the position of the last byte loaded
cmp w16, w17 // check that position is in bounds
b.ls .Ltmp2 // if not, error
orr x16, x0, #0xf // compute address of last byte of granule
ldrb w16, [x16] // load tag from it
cmp x16, x0, lsr #56 // compare with pointer tag
b.eq .Ltmp1 // if matches, continue
.Ltmp2:
stp x0, x1, [sp, #-256]! // save original x0, x1 on stack (they will be overwritten)
stp x29, x30, [sp, #232] // create frame record
mov x1, #2 // set x1 to a constant indicating the type of failure
adrp x16, :got:__hwasan_tag_mismatch_v2 // call runtime function to save remaining registers and report error
ldr x16, [x16, :got_lo12:__hwasan_tag_mismatch_v2] // (load address from GOT to avoid potential register clobbers in delay load handler)
br x16
Heap
----
Tagging the heap memory/pointers is done by `malloc`.
This can be based on any `malloc` that forces all objects to be `TG`-aligned.
`free` tags the memory with a different tag.
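A conceptual sketch of such an allocator is shown below; `random_tag` and
`tag_memory` are hypothetical helpers standing in for the runtime's internals,
and the real allocator lives in compiler-rt:

.. code-block:: c

  // Illustrative sketch of heap tagging with TG = 16; not the real allocator.
  #include <stdint.h>
  #include <stdlib.h>

  extern uint8_t random_tag(void);                           // hypothetical helper
  extern void tag_memory(void *p, size_t size, uint8_t tag); // hypothetical: writes the shadow

  void *hwasan_malloc_sketch(size_t size) {
    size_t aligned = (size + 15) & ~(size_t)15;       // round up to the tagging granularity
    void *p = aligned_alloc(16, aligned);             // every object is TG-aligned
    if (!p) return NULL;
    uint8_t tag = random_tag();
    tag_memory(p, aligned, tag);                      // tag the object's memory
    return (void *)((uintptr_t)p | ((uintptr_t)tag << 56)); // tag the returned pointer
  }

  void hwasan_free_sketch(void *tagged, size_t aligned_size) {
    void *p = (void *)((uintptr_t)tagged & ((1ULL << 56) - 1)); // strip the tag
    tag_memory(p, aligned_size, random_tag());        // retag so stale pointers mismatch
    free(p);
  }
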
Stack
-----
Stack frames are instrumented by aligning all non-promotable allocas
by `TG` and tagging stack memory in function prologue and epilogue.
Tags for different allocas in one function are **not** generated
independently; doing that in a function with `M` allocas would require
maintaining `M` live stack pointers, significantly increasing register
pressure. Instead we generate a single base tag value in the prologue,
and build the tag for alloca number `M` as `ReTag(BaseTag, M)`, where
ReTag can be as simple as exclusive-or with constant `M`.
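Conceptually, for a frame with two allocas the generated code behaves like the
following sketch (`generate_base_tag` and `tag_memory` are illustrative
placeholders; the real pass operates on LLVM IR rather than C source):

.. code-block:: c

  // Sketch of per-alloca tag generation in a function prologue; not real codegen.
  #include <stddef.h>
  #include <stdint.h>

  extern uint8_t generate_base_tag(void);                    // hypothetical: one random tag per frame
  extern void tag_memory(void *p, size_t size, uint8_t tag); // hypothetical: writes the shadow

  void frame_sketch(void) {
    uint8_t base_tag = generate_base_tag();

    // Two allocas, both forcibly TG-aligned by the compiler.
    _Alignas(16) char buf0[16];
    _Alignas(16) char buf1[32];

    tag_memory(buf0, sizeof buf0, base_tag ^ 0);   // ReTag(BaseTag, 0)
    tag_memory(buf1, sizeof buf1, base_tag ^ 1);   // ReTag(BaseTag, 1)
    // Pointers to buf0/buf1 are formed by ORing the respective tag into the top byte.
    // The epilogue retags the frame so that dangling pointers are caught later.
  }
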
Stack instrumentation is expected to be a major source of overhead,
but could be optional.
Globals
-------
Most globals in HWASAN-instrumented code are tagged. This is accomplished
using the following mechanisms:
* The address of each global has a static tag associated with it. The first
defined global in a translation unit has a pseudorandom tag associated
with it, based on the hash of the file path. Subsequent global tags are
incremental from the previously-assigned tag.
* The global's tag is added to its symbol address in the object file's symbol
table. This causes the global's address to be tagged when its address is
taken.
* When the address of a global is taken directly (i.e. not via the GOT), a special
instruction sequence needs to be used to add the tag to the address,
because the tag would otherwise take the address outside of the small code
model (4GB on AArch64). No changes are required when the address is taken
via the GOT because the address stored in the GOT will contain the tag.
* An associated ``hwasan_globals`` section is emitted for each tagged global,
which indicates the address of the global, its size and its tag. These
sections are concatenated by the linker into a single ``hwasan_globals``
section that is enumerated by the runtime (via an ELF note) when a binary
is loaded and the memory is tagged accordingly.
A complete example is given below:
.. code-block:: none
// int x = 1; int *f() { return &x; }
// clang -O2 --target=aarch64-linux-android30 -fsanitize=hwaddress -S -o - global.c
[...]
f:
adrp x0, :pg_hi21_nc:x // set bits 12-63 to upper bits of untagged address
movk x0, #:prel_g3:x+0x100000000 // set bits 48-63 to tag
add x0, x0, :lo12:x // set bits 0-11 to lower bits of address
ret
[...]
.data
.Lx.hwasan:
.word 1
.globl x
.set x, .Lx.hwasan+0x2d00000000000000
[...]
.section .note.hwasan.globals,"aG",@note,hwasan.module_ctor,comdat
.Lhwasan.note:
.word 8 // namesz
.word 8 // descsz
.word 3 // NT_LLVM_HWASAN_GLOBALS
.asciz "LLVM\000\000\000"
.word __start_hwasan_globals-.Lhwasan.note
.word __stop_hwasan_globals-.Lhwasan.note
[...]
.section hwasan_globals,"ao",@progbits,.Lx.hwasan,unique,2
.Lx.hwasan.descriptor:
.word .Lx.hwasan-.Lx.hwasan.descriptor
.word 0x2d000004 // tag = 0x2d, size = 4
Error reporting
---------------
Errors are generated by the `HLT` instruction and are handled by a signal handler.
Attribute
---------
HWASAN uses its own LLVM IR Attribute `sanitize_hwaddress` and a matching
C function attribute. An alternative would be to re-use ASAN's attribute
`sanitize_address`. The reasons to use a separate attribute are:
* Users may need to disable ASAN but not HWASAN, or vice versa,
because the tools have different trade-offs and compatibility issues.
* LLVM (ideally) does not use compiler flags to decide which of the two passes,
ASAN or HWASAN, is being applied; the decision is based on the function attributes.
This does mean that users of HWASAN may need to add the new attribute
to the code that already uses the old attribute.
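For example, with clang's ``no_sanitize`` function attribute a single function
can opt out of HWASAN while the rest of the translation unit stays
instrumented:

.. code-block:: c

  // This function does not get the sanitize_hwaddress IR attribute and is
  // therefore left uninstrumented; the rest of the file is still checked.
  __attribute__((no_sanitize("hwaddress")))
  int read_unchecked(int *p) {
    return *p;
  }
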
Comparison with AddressSanitizer
================================
HWASAN:
* Is less portable than :doc:`AddressSanitizer`
as it relies on hardware `Address Tagging`_ (AArch64).
Address Tagging can be emulated with compiler instrumentation,
but it will require the instrumentation to remove the tags before
any load or store, which is infeasible in any realistic environment
that contains non-instrumented code.
* May have compatibility problems if the target code uses higher
pointer bits for other purposes.
* May require changes in the OS kernels (e.g. Linux seems to dislike
tagged pointers passed from user space:
https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt).
* **Does not require redzones to detect buffer overflows**,
but the buffer overflow detection is probabilistic, with roughly
`1/(2**TS)` chance of missing a bug (6.25% or 0.39% with 4 and 8-bit TS
respectively).
* **Does not require quarantine to detect heap-use-after-free,
or stack-use-after-return**.
The detection is similarly probabilistic.
The memory overhead of HWASAN is expected to be much smaller
than that of AddressSanitizer:
`1/TG` extra memory for the shadow
and some overhead due to `TG`-aligning all objects.
Supported architectures
=======================
HWASAN relies on `Address Tagging`_ which is only available on AArch64.
For other 64-bit architectures it is possible to remove the address tags
before every load and store by compiler instrumentation, but this variant
will have limited deployability since not all of the code is
typically instrumented.
HWASAN's approach is not applicable to 32-bit architectures.
Related Work
============
* `SPARC ADI`_ implements a similar tool mostly in hardware.
* `Effective and Efficient Memory Protection Using Dynamic Tainting`_ discusses
similar approaches ("lock & key").
* `Watchdog`_ discusses a heavier, but still somewhat similar
"lock & key" approach.
* *TODO: add more "related work" links. Suggestions are welcome.*
.. _Watchdog: https://www.cis.upenn.edu/acg/papers/isca12_watchdog.pdf
.. _Effective and Efficient Memory Protection Using Dynamic Tainting: https://www.cc.gatech.edu/~orso/papers/clause.doudalis.orso.prvulovic.pdf
.. _SPARC ADI: https://lazytyped.blogspot.com/2017/09/getting-started-with-adi.html
.. _AddressSanitizer paper: https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf
.. _Address Tagging: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/ch12s05s01.html