llvm-project/llvm/lib/Transforms/Instrumentation
Jianzhou Zhao ea981165a4 [dfsan] Track field/index-level shadow values in variables
*************
* The problem
*************
See the motivating examples in compiler-rt/test/dfsan/pair.cpp. The
current DFSan always uses a single 16-bit shadow value for a variable of
any type, obtained by combining the shadow values of all bytes of the
variable. So it cannot distinguish two fields of a struct: each field's
shadow value equals the combined shadow value of all fields. This causes
overtainting.

Consider a parsing function

   std::pair<char*, int> get_token(char* p);

where p points to a buffer to parse, and the returned pair contains the
next token and a pointer to the position in the buffer after the token.

If the token is tainted, then both the returned pointer and int are
tainted. If the parser keeps using get_token for the rest of the parsing,
all subsequent outputs are tainted because of the tainted pointer.

This CL is the first change to address the issue.

**************************
* The proposed improvement
**************************
Eventually, all fields and indices will have their own shadow values,
both in variables and in memory.

For example, variables of type {i1, i3}, [2 x i1], {[2 x i4], i8}, and
[2 x {i1, i1}] have shadow values of type {i16, i16}, [2 x i16],
{[2 x i16], i16}, and [2 x {i16, i16}], respectively; variables of
primitive type still have i16 shadow values.
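
The type mapping can be sketched with a toy C++ type model instead of the real llvm::Type API. The Ty struct and the intTy/structTy/arrayTy/shadowTy/str helpers are hypothetical and exist only for illustration. The rule is structural: every primitive leaf becomes i16, and struct and array shapes are preserved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy model of LLVM first-class types (hypothetical, NOT llvm::Type).
struct Ty {
  enum class Kind { Int, Struct, Array } kind;
  unsigned bits = 0;                       // Int: bit width
  std::vector<std::shared_ptr<Ty>> elems;  // Struct: fields; Array: one element type
  unsigned count = 0;                      // Array: number of elements
};
using TyPtr = std::shared_ptr<Ty>;

TyPtr intTy(unsigned bits) {
  return std::make_shared<Ty>(Ty{Ty::Kind::Int, bits, {}, 0});
}
TyPtr structTy(std::vector<TyPtr> fields) {
  return std::make_shared<Ty>(Ty{Ty::Kind::Struct, 0, std::move(fields), 0});
}
TyPtr arrayTy(unsigned n, TyPtr elem) {
  return std::make_shared<Ty>(Ty{Ty::Kind::Array, 0, {std::move(elem)}, n});
}

// Map an original type to its shadow type: each primitive leaf becomes i16,
// while the struct/array nesting is kept unchanged.
TyPtr shadowTy(const TyPtr &t) {
  switch (t->kind) {
  case Ty::Kind::Int:
    return intTy(16);
  case Ty::Kind::Struct: {
    std::vector<TyPtr> fields;
    for (const auto &f : t->elems)
      fields.push_back(shadowTy(f));
    return structTy(std::move(fields));
  }
  case Ty::Kind::Array:
    assert(t->elems.size() == 1);
    return arrayTy(t->count, shadowTy(t->elems[0]));
  }
  return nullptr;
}

// Print a type in LLVM IR syntax, e.g. "{[2 x i16], i16}".
std::string str(const TyPtr &t) {
  switch (t->kind) {
  case Ty::Kind::Int:
    return "i" + std::to_string(t->bits);
  case Ty::Kind::Struct: {
    std::string s = "{";
    for (std::size_t i = 0; i < t->elems.size(); ++i)
      s += (i ? ", " : "") + str(t->elems[i]);
    return s + "}";
  }
  case Ty::Kind::Array:
    return "[" + std::to_string(t->count) + " x " + str(t->elems[0]) + "]";
  }
  return "";
}
```

For example, str(shadowTy(structTy({arrayTy(2, intTy(4)), intTy(8)}))) yields "{[2 x i16], i16}", which matches the third mapping listed above. Vector types are left out, in line with step 4) of the plan below.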

*********************************
* A potential implementation plan
*********************************

The idea is to adopt the change incrementally.

1) This CL
Support field-level accuracy for variables, arguments, and return values
in TLS mode; load/store/alloca still use combined shadow values.

After the alloca promotion and SSA construction phases (-O1 and above),
we assume that most allocas and memory operations have been eliminated.
So struct variables that do not go through memory are tracked accurately
at field level.

2) Support field-level accuracy at alloca
3) Support field-level accuracy at load/store

These two steps should make -O0 builds and real memory accesses work.

4) Support vector types if necessary.
5) Support Args mode if necessary.
6) Support passing more accurate shadow values via custom functions if
necessary.

***************
* About this CL
***************
This CL did the following:

1) extended the TLS argument/return-value shadows to work with aggregate
types, similar to what MSan does.

2) implemented the mapping from an original type/value/zero constant to
its shadow type/value/zero constant.

3) extended (insert|extract)value to use field/index-level propagation.

4) for other instructions, the propagation rule combines the input
shadows with OR. In these cases, the CL converts between aggregate and
primitive shadow values.

5) custom function interfaces also need such a conversion because all
existing custom functions use i16 shadows. It is not yet clear whether
custom functions need more accurate shadow propagation.

6) added test cases for aggregate types.
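
Items 3) and 4) can be illustrated with a toy C++ model. This is only a sketch and not the pass's implementation. The Shadow tree and the helpers extractShadow, insertShadow, collapse, and expand are hypothetical. Labels are assumed to be bit flags that combine with bitwise OR, following the description in item 4).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy shadow value (hypothetical, NOT the LLVM API). A leaf carries a 16-bit
// label; an aggregate node has one child per struct field or array element.
struct Shadow {
  uint16_t label = 0;            // used only when children is empty
  std::vector<Shadow> children;  // empty => primitive (leaf) shadow
};

// Item 3: the shadow of `extractvalue %agg, i0, i1, ...` is the sub-shadow
// reached by following the same indices.
Shadow extractShadow(const Shadow &agg, const std::vector<unsigned> &idxs) {
  const Shadow *cur = &agg;
  for (unsigned i : idxs) {
    assert(i < cur->children.size());
    cur = &cur->children[i];
  }
  return *cur;
}

// Item 3: the shadow of `insertvalue %agg, %v, i0, i1, ...` is agg's shadow
// with that sub-shadow replaced by v's shadow; other fields keep their labels.
Shadow insertShadow(Shadow agg, const Shadow &v, const std::vector<unsigned> &idxs) {
  Shadow *cur = &agg;
  for (unsigned i : idxs) {
    assert(i < cur->children.size());
    cur = &cur->children[i];
  }
  *cur = v;
  return agg;
}

// Item 4: aggregate -> primitive, OR-ing every leaf label together.
uint16_t collapse(const Shadow &s) {
  if (s.children.empty())
    return s.label;
  uint16_t acc = 0;
  for (const Shadow &c : s.children)
    acc |= collapse(c);
  return acc;
}

// Item 4: primitive -> aggregate, giving every leaf of `shape` the same label.
Shadow expand(uint16_t label, const Shadow &shape) {
  Shadow out;
  if (shape.children.empty()) {
    out.label = label;
    return out;
  }
  for (const Shadow &c : shape.children)
    out.children.push_back(expand(label, c));
  return out;
}
```

If get_token's std::pair<char*, int> is modeled as a two-field struct, inserting a tainted token into field 1 leaves field 0 (the pointer) clean. A collapse followed by an expand spreads the token's label back to the pointer, reproducing the overtaint described in "The problem". That round trip happens whenever the value passes through a path that only has a combined rule, such as load/store in this CL.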

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D92261
2020-12-09 19:38:35 +00:00
AddressSanitizer.cpp static const char *const foo => const char foo[] 2020-12-01 10:33:18 -08:00
BoundsChecking.cpp [local-bounds] Ignore volatile operations 2020-05-05 23:08:08 -07:00
CFGMST.h [PGO] Supporting code for always instrumenting entry block 2020-07-22 15:01:53 -07:00
CGProfile.cpp [CGProfile] don't emit cgprofile entry if called function is dllimport 2020-09-23 16:56:54 -07:00
CMakeLists.txt llvmbuildectomy - replace llvm-build by plain cmake 2020-11-13 10:35:24 +01:00
ControlHeightReduction.cpp [CHR] Use pred_size (NFC) 2020-11-24 22:52:30 -08:00
DataFlowSanitizer.cpp [dfsan] Track field/index-level shadow values in variables 2020-12-09 19:38:35 +00:00
GCOVProfiling.cpp [NFC][GCOV] Fix build: there's `llvm::stable_partition()` wrapper 2020-10-05 22:52:32 +03:00
HWAddressSanitizer.cpp static const char *const foo => const char foo[] 2020-12-01 10:33:18 -08:00
IndirectCallPromotion.cpp [ICP] Don't promote when target not defined in module 2020-12-08 07:45:36 -08:00
InstrOrderFile.cpp [CallSite removal] Remove unneeded includes of CallSite.h. NFC 2020-04-22 00:07:13 -07:00
InstrProfiling.cpp [PGO] Remove the old memop value profiling buckets. 2020-10-15 10:09:49 -07:00
Instrumentation.cpp [MemProf] Rename HeapProfiler to MemProfiler for consistency 2020-09-14 13:14:57 -07:00
MaximumSpanningTree.h
MemProfiler.cpp [MemProf] Make __memprof_shadow_memory_dynamic_address dso_local in static relocation model 2020-12-05 21:36:31 -08:00
MemorySanitizer.cpp [msan] Replace 8 by kShadowTLSAlignment 2020-12-02 01:09:49 +00:00
PGOInstrumentation.cpp Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM" 2020-11-14 13:12:38 +03:00
PGOMemOPSizeOpt.cpp [NFC] Reduce include files dependency. 2020-12-03 18:25:05 +03:00
PoisonChecking.cpp [ValueTracking] Add UndefOrPoison/Poison-only version of relevant functions 2020-09-09 20:00:26 +09:00
SanitizerCoverage.cpp static const char *const foo => const char foo[] 2020-12-01 10:33:18 -08:00
ThreadSanitizer.cpp static const char *const foo => const char foo[] 2020-12-01 10:33:18 -08:00
ValueProfileCollector.cpp ValueProfileCollector.h - remove unnecessary includes. NFC. 2020-07-23 12:33:13 +01:00
ValueProfileCollector.h ValueProfileCollector.h - remove unnecessary includes. NFC. 2020-07-23 12:33:13 +01:00
ValueProfilePlugins.inc [PGO] Guard the memcmp/bcmp size value profiling instrumentation behind flag. 2020-05-28 10:07:04 -07:00