llvm-project/compiler-rt/test/dfsan/pair.cpp

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

170 lines
4.5 KiB
C++
Raw Normal View History

[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
// RUN: %clangxx_dfsan %s -mllvm -dfsan-fast-16-labels -mllvm -dfsan-track-select-control-flow=false -mllvm -dfsan-combine-pointer-labels-on-load=false -O0 -DO0 -o %t && %run %t
// RUN: %clangxx_dfsan %s -mllvm -dfsan-fast-16-labels -mllvm -dfsan-track-select-control-flow=false -mllvm -dfsan-combine-pointer-labels-on-load=false -O1 -o %t && %run %t
#include <algorithm>
#include <assert.h>
#include <sanitizer/dfsan_interface.h>
#include <utility>
__attribute__((noinline))
std::pair<int *, int>
make_pair(int *p, int i) { return {p, i}; }
__attribute__((noinline))
std::pair<int *, int>
copy_pair1(const std::pair<int *, int> &pair) {
return pair;
}
__attribute__((noinline))
std::pair<int *, int>
copy_pair2(std::pair<int *, int> *pair) {
return *pair;
}
__attribute__((noinline))
std::pair<int *, int>
copy_pair3(std::pair<int *, int> &&pair) {
return std::move(pair);
}
__attribute__((noinline))
std::pair<const char *, uint32_t>
return_ptr_and_i32(const char *p, uint32_t res) {
for (uint32_t i = 2; i < 5; i++) {
uint32_t byte = static_cast<uint8_t>(p[i]);
res += (byte - 1) << (7 * i);
if (byte < 128) {
return {p + i + 1, res};
}
}
return {nullptr, 0};
}
__attribute__((noinline))
std::pair<const char *, uint64_t>
return_ptr_and_i64(const char *p, uint32_t res32) {
uint64_t res = res32;
for (uint32_t i = 2; i < 10; i++) {
uint64_t byte = static_cast<uint8_t>(p[i]);
res += (byte - 1) << (7 * i);
if (byte < 128) {
return {p + i + 1, res};
}
}
return {nullptr, 0};
}
void test_simple_constructors() {
int i = 1;
int *ptr = NULL;
dfsan_set_label(8, &i, sizeof(i));
dfsan_set_label(2, &ptr, sizeof(ptr));
std::pair<int *, int> pair1 = make_pair(ptr, i);
int i1 = pair1.second;
int *ptr1 = pair1.first;
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#ifdef O0
assert(dfsan_read_label(&i1, sizeof(i1)) == 10);
assert(dfsan_read_label(&ptr1, sizeof(ptr1)) == 10);
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#else
assert(dfsan_read_label(&i1, sizeof(i1)) == 8);
assert(dfsan_read_label(&ptr1, sizeof(ptr1)) == 2);
#endif
std::pair<int *, int> pair2 = copy_pair1(pair1);
int i2 = pair2.second;
int *ptr2 = pair2.first;
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#ifdef O0
assert(dfsan_read_label(&i2, sizeof(i2)) == 10);
assert(dfsan_read_label(&ptr2, sizeof(ptr2)) == 10);
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#else
assert(dfsan_read_label(&i2, sizeof(i2)) == 8);
assert(dfsan_read_label(&ptr2, sizeof(ptr2)) == 2);
#endif
std::pair<int *, int> pair3 = copy_pair2(&pair1);
int i3 = pair3.second;
int *ptr3 = pair3.first;
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#ifdef O0
assert(dfsan_read_label(&i3, sizeof(i3)) == 10);
assert(dfsan_read_label(&ptr3, sizeof(ptr3)) == 10);
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#else
assert(dfsan_read_label(&i3, sizeof(i3)) == 8);
assert(dfsan_read_label(&ptr3, sizeof(ptr3)) == 2);
#endif
std::pair<int *, int> pair4 = copy_pair3(std::move(pair1));
int i4 = pair4.second;
int *ptr4 = pair4.first;
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#ifdef O0
assert(dfsan_read_label(&i4, sizeof(i4)) == 10);
assert(dfsan_read_label(&ptr4, sizeof(ptr4)) == 10);
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#else
assert(dfsan_read_label(&i4, sizeof(i4)) == 8);
assert(dfsan_read_label(&ptr4, sizeof(ptr4)) == 2);
#endif
}
void test_branches() {
uint32_t res = 4;
dfsan_set_label(8, &res, sizeof(res));
char p[100];
const char *q = p;
dfsan_set_label(2, &q, sizeof(q));
{
std::fill_n(p, 100, static_cast<char>(128));
{
std::pair<const char *, uint32_t> r = return_ptr_and_i32(q, res);
assert(dfsan_read_label(&r.first, sizeof(r.first)) == 0);
assert(dfsan_read_label(&r.second, sizeof(r.second)) == 0);
}
{
std::pair<const char *, uint64_t> r = return_ptr_and_i64(q, res);
assert(dfsan_read_label(&r.first, sizeof(r.first)) == 0);
assert(dfsan_read_label(&r.second, sizeof(r.second)) == 0);
}
}
{
std::fill_n(p, 100, 0);
{
std::pair<const char *, uint32_t> r = return_ptr_and_i32(q, res);
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#ifdef O0
assert(dfsan_read_label(&r.first, sizeof(r.first)) == 10);
assert(dfsan_read_label(&r.second, sizeof(r.second)) == 10);
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#else
assert(dfsan_read_label(&r.first, sizeof(r.first)) == 2);
assert(dfsan_read_label(&r.second, sizeof(r.second)) == 8);
#endif
}
{
std::pair<const char *, uint64_t> r = return_ptr_and_i64(q, res);
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#ifdef O0
assert(dfsan_read_label(&r.first, sizeof(r.first)) == 10);
assert(dfsan_read_label(&r.second, sizeof(r.second)) == 10);
[dfsan] Track field/index-level shadow values in variables ************* * The problem ************* See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current DFSan always uses a 16bit shadow value for a variable with any type by combining all shadow values of all bytes of the variable. So it cannot distinguish two fields of a struct: each field's shadow value equals the combined shadow value of all fields. This introduces an overtaint issue. Consider a parsing function std::pair<char*, int> get_token(char* p); where p points to a buffer to parse, the returned pair includes the next token and the pointer to the position in the buffer after the token. If the token is tainted, then both the returned pointer and int ar tainted. If the parser keeps on using get_token for the rest parsing, all the following outputs are tainted because of the tainted pointer. The CL is the first change to address the issue. ************************** * The proposed improvement ************************** Eventually all fields and indices have their own shadow values in variables and memory. For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8}, [2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16], {[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with primary type still have shadow values i16. *************************** * An potential implementation plan *************************** The idea is to adopt the change incrementially. 1) This CL Support field-level accuracy at variables/args/ret in TLS mode, load/store/alloca still use combined shadow values. After the alloca promotion and SSA construction phases (>=-O1), we assume alloca and memory operations are reduced. So if struct variables do not relate to memory, their tracking is accurate at field level. 2) Support field-level accuracy at alloca 3) Support field-level accuracy at load/store These two should make O0 and real memory access work. 4) Support vector if necessary. 5) Support Args mode if necessary. 6) Support passing more accurate shadow values via custom functions if necessary. *************** * About this CL. *************** The CL did the following 1) extended TLS arg/ret to work with aggregate types. This is similar to what MSan does. 2) implemented how to map between an original type/value/zero-const to its shadow type/value/zero-const. 3) extended (insert|extract)value to use field/index-level progagation. 4) for other instructions, propagation rules are combining inputs by or. The CL converts between aggragate and primary shadow values at the cases. 5) Custom function interfaces also need such a conversion because all existing custom functions use i16. It is unclear whether custome functions need more accurate shadow propagation yet. 6) Added test cases for aggregate type related cases. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D92261
2020-11-23 16:33:59 +08:00
#else
assert(dfsan_read_label(&r.first, sizeof(r.first)) == 2);
assert(dfsan_read_label(&r.second, sizeof(r.second)) == 8);
#endif
}
}
}
int main(void) {
test_simple_constructors();
test_branches();
return 0;
}