llvm-project

Commit Graph

Author	SHA1	Message	Date
JF Bastien	14daa20be1	Automatic variable initialization Summary: Add an option to initialize automatic variables with either a pattern or with zeroes. The default is still that automatic variables are uninitialized. Also add attributes to request uninitialized on a per-variable basis, mainly to disable initialization of large stack arrays when deemed too expensive. This isn't meant to change the semantics of C and C++. Rather, it's meant to be a last-resort when programmers inadvertently have some undefined behavior in their code. This patch aims to make undefined behavior hurt less, which security-minded people will be very happy about. Notably, this means that there's no inadvertent information leak when: - The compiler re-uses stack slots, and a value is used uninitialized. - The compiler re-uses a register, and a value is used uninitialized. - Stack structs / arrays / unions with padding are copied. This patch only addresses stack and register information leaks. There are many more infoleaks that we could address, and much more undefined behavior that could be tamed. Let's keep this patch focused, and I'm happy to address related issues elsewhere. To keep the patch simple, only some `undef` is removed for now, see `replaceUndef`. The padding-related infoleaks are therefore not all gone yet. This will be addressed in a follow-up, mainly because addressing padding-related leaks should be a stand-alone option which is implied by variable initialization. There are three options when it comes to automatic variable initialization: 0. Uninitialized This is C and C++'s default. It's not changing. Depending on code generation, a programmer who runs into undefined behavior by using an uninitialized automatic variable may observe any previous value (including program secrets), or any value which the compiler saw fit to materialize on the stack or in a register (this could be to synthesize an immediate, to refer to code or data locations, to generate cookies, etc). 1. 
Pattern initialization This is the recommended initialization approach. Pattern initialization's goal is to initialize automatic variables with values which will likely transform logic bugs into crashes down the line, are easily recognizable in a crash dump, without being values which programmers can rely on for useful program semantics. At the same time, pattern initialization tries to generate code which will optimize well. You'll find the following details in `patternFor`: - Integers are initialized with repeated 0xAA bytes (infinite scream). - Vectors of integers are also initialized with infinite scream. - Pointers are initialized with infinite scream on 64-bit platforms because it's an unmappable pointer value on architectures I'm aware of. Pointers are initialized to 0x000000AA (small scream) on 32-bit platforms because 32-bit platforms don't consistently offer unmappable pages. When they do it's usually the zero page. As people try this out, I expect that we'll want to allow different platforms to customize this, let's do so later. - Vectors of pointers are initialized the same way pointers are. - Floating point values and vectors are initialized with a negative quiet NaN with repeated 0xFF payload (e.g. 0xffffffff and 0xffffffffffffffff). NaNs are nice (here, anyways) because they propagate on arithmetic, making it more likely that entire computations become NaN when a single uninitialized value sneaks in. - Arrays are initialized to their homogeneous elements' initialization value, repeated. Stack-based Variable-Length Arrays (VLAs) are runtime-initialized to the allocated size (no effort is made for negative size, but zero-sized VLAs are untouched even if technically undefined). - Structs are initialized to their heterogeneous elements' initialization values. Zero-size structs are initialized as 0xAA since they're allocated a single byte. - Unions are initialized using the initialization for the largest member of the union. 
Expect the values used for pattern initialization to change over time, as we refine heuristics (both for performance and security). The goal is truly to avoid injecting semantics into undefined behavior, and we should be comfortable changing these values when there's a worthwhile point in doing so. Why so much infinite scream? Repeated byte patterns tend to be easy to synthesize on most architectures, and otherwise memset is usually very efficient. For values which aren't entirely repeated byte patterns, LLVM will often generate code which does memset + a few stores. 2. Zero initialization Zero initialize all values. This has the unfortunate side-effect of providing semantics to otherwise undefined behavior, programs therefore might start to rely on this behavior, and that's sad. However, some programmers believe that pattern initialization is too expensive for them, and data might show that they're right. The only way to make these programmers wrong is to offer zero-initialization as an option, figure out where they are right, and optimize the compiler into submission. Until the compiler provides acceptable performance for all security-minded code, zero initialization is a useful (if blunt) tool. I've been asked for a fourth initialization option: user-provided byte value. This might be useful, and can easily be added later. Why is an out-of-band initialization mechanism desired? We could instead use -Wuninitialized! Indeed we could, but then we're forcing the programmer to provide semantics for something which doesn't actually have any (it's uninitialized!). It's then unclear whether `int derp = 0;` lends meaning to `0`, or whether it's just there to shut that warning up. It's also way easier to use a compiler flag than it is to manually and intelligently initialize all values in a program. Why not just rely on static analysis? Because it cannot reason about all dynamic code paths effectively, and it has false positives. 
It's a great tool, could get even better, but it's simply incapable of catching all uses of uninitialized values. Why not just rely on memory sanitizer? Because it's not universally available, has a 3x performance cost, and shouldn't be deployed in production. Again, it's a great tool, it'll find the dynamic uses of uninitialized variables that your test coverage hits, but it won't find the ones that you encounter in production. What's the performance like? Not too bad! Previous publications [0] have cited 2.7 to 4.5% averages. We've committed a few patches over the last few months to address specific regressions, both in code size and performance. In all cases, the optimizations are generally useful, but variable initialization benefits from them a lot more than regular code does. We've got a handful of other optimizations in mind, but the code is in good enough shape and has found enough latent issues that it's a good time to get the change reviewed, checked in, and have others kick the tires. We'll continue reducing overheads as we try this out on diverse codebases. Is it a good idea? Security-minded folks think so, and apparently so does the Microsoft Visual Studio team [1] who say "Between 2017 and mid 2018, this feature would have killed 49 MSRC cases that involved uninitialized struct data leaking across a trust boundary. It would have also mitigated a number of bugs involving uninitialized struct data being used directly.". They seem to use pure zero initialization, and claim to have taken the overheads down to within noise. Don't just trust Microsoft though, here's another relevant person asking for this [2]. It's been proposed for GCC [3] and LLVM [4] before. What are the caveats? A few! - Variables declared in unreachable code, and used later, aren't initialized. This includes goto, Duff's device, and other objectionable uses of switch. This should instead be a hard-error in any serious codebase. - Volatile stack variables are still weird. 
That's pre-existing, it's really the language's fault and this patch keeps it weird. We should deprecate volatile [5]. - As noted above, padding isn't fully handled yet. I don't think these caveats make the patch untenable because they can be addressed separately. Should this be on by default? Maybe, in some circumstances. It's a conversation we can have when we've tried it out sufficiently, and we're confident that we've eliminated enough of the overheads that most codebases would want to opt-in. Let's keep our precious undefined behavior until that point in time. How do I use it: 1. On the command-line: -ftrivial-auto-var-init=uninitialized (the default) -ftrivial-auto-var-init=pattern -ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang 2. Using an attribute: int dont_initialize_me __attribute((uninitialized)); [0]: https://users.elis.ugent.be/~jsartor/researchDocs/OOPSLA2011Zero-submit.pdf [1]: https://twitter.com/JosephBialek/status/1062774315098112001 [2]: https://outflux.net/slides/2018/lss/danger.pdf [3]: https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00615.html [4]: `776a0955ef` [5]: http://wg21.link/p1152 I've also posted an RFC to cfe-dev: http://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html <rdar://problem/39131435> Reviewers: pcc, kcc, rsmith Subscribers: JDevlieghere, jkorous, dexonsmith, cfe-commits Differential Revision: https://reviews.llvm.org/D54604 llvm-svn: 349442	2018-12-18 05:12:21 +00:00
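The byte patterns described in the commit message above can be reproduced by hand as a rough illustration. The sketch below is not Clang's `patternFor`; every helper name in it is hypothetical, and it assumes `float` and `double` are IEEE 754 binary32 and binary64. It only shows what a repeated 0xAA integer looks like and why an all-ones bit pattern reads back as a NaN that survives arithmetic.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of Clang): fill an object with the
   repeated 0xAA byte that the commit message calls "infinite scream". */
static inline void fill_with_scream(void *object, size_t size) {
    memset(object, 0xAA, size);
}

/* Hypothetical helper: a 32-bit integer made of repeated 0xAA bytes. */
static inline uint32_t scream32(void) {
    uint32_t value;
    fill_with_scream(&value, sizeof value);
    return value;
}

/* Hypothetical helper: a 64-bit integer made of repeated 0xAA bytes. */
static inline uint64_t scream64(void) {
    uint64_t value;
    fill_with_scream(&value, sizeof value);
    return value;
}

/* Hypothetical helper: reinterpret 32 one-bits as a float.
   Assumption: float is IEEE 754 binary32. */
static inline float all_ones_float(void) {
    uint32_t bits = UINT32_C(0xFFFFFFFF);
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Hypothetical helper: reinterpret 64 one-bits as a double.
   Assumption: double is IEEE 754 binary64. */
static inline double all_ones_double(void) {
    uint64_t bits = UINT64_C(0xFFFFFFFFFFFFFFFF);
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* A NaN is the only floating-point value that compares unequal to itself. */
static inline int is_nan_float(float value) { return value != value; }
static inline int is_nan_double(double value) { return value != value; }
```

Under those assumptions, `scream32()` yields 0xAAAAAAAA, and `is_nan_float(all_ones_float() + 1.0f)` is true, which is the propagation property the message relies on.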
Alexander Kornienko	2a8c18d991	Fix typos in clang Found via codespell -q 3 -I ../clang-whitelist.txt Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.) Differential revision: https://reviews.llvm.org/D44188 llvm-svn: 329399	2018-04-06 15:14:32 +00:00
Ted Kremenek	7979ccf35a	Teach -Wuninitialized to recognize __attribute__((analyzer_noreturn)) for halting the propagation of uninitialized value tracking along a path. Unlike __attribute__((noreturn)), this attribute (which is used by clients of the static analyzer) can be used to annotate functions that essentially never return, but in rare cases may be allowed to return for (special) debugging purposes. This attribute has been shown to reduce false positives in the static analyzer, and is equally applicable here. Handling this attribute in the CFG itself is another option, but this is not something all clients (e.g., possibly -Wunreachable-code) would want to see. Addresses <rdar://problem/12281583>. llvm-svn: 163681	2012-09-12 05:53:43 +00:00
Richard Smith	f676e45e5f	When a && or || appears as the condition of a ?:, perform appropriate short-circuiting when building the CFG. Also be sure to skip parens before checking for the && / || special cases. Finally, fix some crashes in CFG printing in the presence of calls to destructors for array of array of class type. llvm-svn: 160691	2012-07-24 21:02:14 +00:00
Richard Smith	b21dd02e61	Uninitialized variables: two little changes: * Treat compound assignment as a use, at Jordy's request. * Always add compound assignments into the CFG, so we can correctly diagnose the use in 'return x += 1;' llvm-svn: 160334	2012-07-17 01:27:33 +00:00
Richard Smith	6376d1fd9c	-Wuninitialized: Split the classification of DeclRefExprs as initialization or use out of TransferFunctions, and compute it in advance rather than on-the-fly. This allows us to handle compound assignments with DeclRefExprs on the RHS correctly, and also makes it trivial to treat const& function parameters as not initializing the argument. The patch also makes both of those changes. llvm-svn: 160330	2012-07-17 00:06:14 +00:00
Ted Kremenek	b50e716bac	Refine CFG so that '&&' and '||' don't lead to extra confluence points when used in a branch, but instead push the terminator for the branch down into the basic blocks of the subexpressions of '&&' and '||' respectively. This eliminates some artificial control-flow from the CFG and results in a more compact CFG. Note that this patch only alters the branches 'while', 'if' and 'for'. This was complex enough for one patch. The remaining branches (e.g., do...while) can be handled in a separate patch, but they weren't immediately tackled because they were less important. It is possible that this patch introduces some subtle bugs, particularly w.r.t. destructor placement. I've tried to audit these changes, but it is also known that the destructor logic needs some refinement in the area of '||' and '&&' regardless (i.e., there are known bugs). llvm-svn: 160218	2012-07-14 05:04:10 +00:00
Richard Smith	b721e301df	-Wuninitialized: assume that an __attribute__((returns_twice)) function might initialize any variable. This is extremely conservative, but is sufficient for now. llvm-svn: 159620	2012-07-02 23:23:04 +00:00
Joerg Sonnenberger	5c98e1fb24	Don't warn about address-to-member used as part of initialisation, if the member expression is in parentheses. llvm-svn: 158651	2012-06-17 23:10:39 +00:00
Richard Smith	a8d4f229a6	-Wuninitialized bugfix: when entering the scope of a variable with no initializer, it is uninitialized, even if we may be coming from somewhere where it was initialized. llvm-svn: 158611	2012-06-16 23:34:14 +00:00
Richard Smith	1bb8edb8ac	In response to some discussions on IRC, tweak the wording of the new -Wsometimes-uninitialized diagnostics to make it clearer that the cause of the issue may be a condition which must always evaluate to true or false, rather than an uninitialized variable. To emphasize this, add a new note with a fixit which removes the impossible condition or replaces it with a constant. Also, downgrade the diagnostic from -Wsometimes-uninitialized to -Wconditional-uninitialized when it applies to a range-based for loop, since the condition is not written explicitly in the code in that case. llvm-svn: 157511	2012-05-26 06:20:46 +00:00
Richard Smith	4323bf8e2e	Split a chunk of -Wconditional-uninitialized warnings out into a separate flag, -Wsometimes-uninitialized. This detects cases where an explicitly-written branch inevitably leads to an uninitialized variable use (so either the branch is dead code or there is an uninitialized use bug). This chunk of warnings tentatively lives within -Wuninitialized, in order to give it more visibility to existing Clang users. llvm-svn: 157458	2012-05-25 02:17:09 +00:00
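As a hedged illustration of the kind of code the -Wsometimes-uninitialized commit above describes, the sketch below keeps the problematic shape in a comment and compiles only a repaired version. Both function names are hypothetical, and the exact diagnostic wording Clang emits is not reproduced here.

```c
/* Shape the warning is aimed at, kept as a comment because reading
   `result` on the false branch would be undefined behavior:

       int scale_or_garbage(int enabled, int input) {
           int result;
           if (enabled)
               result = input * 2;
           return result;    // uninitialized whenever `enabled` is false
       }

   The explicitly written `if` is what makes the false branch lead
   inevitably to an uninitialized use. */

/* One possible repair (an assumption about intent, not the only fix):
   give the variable a defined value on every path. */
static inline int scale_or_default(int enabled, int input) {
    int result = 0;
    if (enabled)
        result = input * 2;
    return result;
}
```

Whether 0 is the right fallback depends on the caller. An alternative is to delete the condition if it can never be false, which is the other possibility the commit message mentions (the branch being dead code).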
Richard Trieu	2cdcf82396	Fix a note without a SourceLocation. #define TEST int y; int x = y; void foo() { TEST } -Wuninitialized gives this warning: invalid-loc.cc:4:3: warning: variable 'y' is uninitialized when used here [-Wuninitialized] TEST ^~~~ invalid-loc.cc:2:29: note: expanded from macro 'TEST' #define TEST int y; int x = y; ^ note: initialize the variable 'y' to silence this warning 1 warning generated. The second note lacks filename, line number, and code snippet. This change will remove the fixit and only point to variable declaration. invalid-loc.cc:4:3: warning: variable 'y' is uninitialized when used here [-Wuninitialized] TEST ^~~~ invalid-loc.cc:2:29: note: expanded from macro 'TEST' #define TEST int y; int x = y; ^ invalid-loc.cc:4:3: note: variable 'y' is declared here TEST ^ invalid-loc.cc:2:14: note: expanded from macro 'TEST' #define TEST int y; int x = y; ^ 1 warning generated. llvm-svn: 156045	2012-05-03 01:09:59 +00:00
Matt Beaumont-Gay	4b489fa629	Only warn at self-initialization if some later use is always uninitialized. llvm-svn: 142538	2011-10-19 18:53:03 +00:00
Ted Kremenek	596fa16dd3	Tweak -Wuninitialized's handling of 'int x = x' to report that as the root cause of an uninitialized variable IFF there are other uses of that uninitialized variable. Fixes <rdar://problem/9259237>. llvm-svn: 141881	2011-10-13 18:50:06 +00:00
Ted Kremenek	171969c8c2	r141345 also fixed a -Wuninitialized bug where loop conditions were not always flagged as being uninitialized. Addresses <rdar://problem/9432305>. llvm-svn: 141346	2011-10-07 00:52:56 +00:00
Ted Kremenek	f8fd4d4962	Fix infinite loop in -Wuninitialized reported in PR 11069. llvm-svn: 141345	2011-10-07 00:42:48 +00:00
David Blaikie	e5f9a9e603	Show either a location or a fixit note, not both, for uninitialized variable warnings. llvm-svn: 139463	2011-09-10 05:35:08 +00:00
Ted Kremenek	aed4677a1c	-Wuninitialized: fix insidious bug resulting from interplay of blocks and dead code. Fixes <rdar://problem/10060250>. llvm-svn: 139027	2011-09-02 19:39:26 +00:00
Ted Kremenek	ee9848e20d	Fix regression in -Wuninitialized involving VLAs. It turns out that we were modeling sizeof(VLAs) incorrectly in the CFG, and also the static analyzer. This patch regresses the analyzer a bit, but that needs to be followed up with a better solution. Fixes <rdar://problem/10008112>. llvm-svn: 138372	2011-08-23 20:30:50 +00:00
Chandler Carruth	4dd6c043ae	Move duplicate uninitialized warning suppression into the AnalysisBasedWarnings Sema layer and out of the Analysis library itself. This returns the uninitialized values analysis to a more pure form, allowing its original logic to correctly detect some categories of definitely uninitialized values. Fixes PR10358 (again). Thanks to Ted for reviewing and updating this patch after his rewrite of several portions of this analysis. llvm-svn: 135748	2011-07-22 05:27:52 +00:00
Ted Kremenek	65b3e0649c	Fix false negative in -Wuninitialized involving a () wrapping an lvalue-to-rvalue conversion in a DeclStmt. llvm-svn: 135525	2011-07-19 21:41:51 +00:00
Ted Kremenek	5d855bf7f2	Fix assertion failure in UninitializedValues.cpp where an lvalue-to-rvalue conversion is wrapped in parentheses. llvm-svn: 135519	2011-07-19 20:33:49 +00:00
Chandler Carruth	7cf5a37605	Revert r135217, which wasn't the correct fix for PR10358. With this patch, we actually move the state-machine for the value set backwards one step. This can pretty easily lead to infinite loops where we continually try to propagate a bit, succeed for one iteration, but then back up because we find an uninitialized use. A reduced test case from PR10379 is included. llvm-svn: 135359	2011-07-16 22:27:02 +00:00
Ted Kremenek	f0b28d7fe5	Fix false negative reported in PR 10358 by using 'Unknown' in -Wuninitialized to avoid cascading warnings. Patch by Kaelyn Uhrain. llvm-svn: 135217	2011-07-14 23:43:06 +00:00
Ted Kremenek	efdb7fe53b	Fix crash in -Wuninitialized when using switch statements whose condition is a logical operation. llvm-svn: 131158	2011-05-10 22:10:35 +00:00
Chandler Carruth	42983aef34	Switch 'is possibly uninitialized' to 'may be uninitialized' based on Chris's feedback. llvm-svn: 129127	2011-04-08 06:47:15 +00:00
Chandler Carruth	278f89732f	Now that the analyzer is distinguishing between uninitialized uses that definitely have a path leading to them, and possibly have a path leading to them; reflect that distinction in the warning text emitted. llvm-svn: 129126	2011-04-08 06:33:38 +00:00
Chandler Carruth	78c7e34485	Commit a bit of a hack to fully handle the situation where variables are marked explicitly as uninitialized through direct self initialization: int x = x; With r128894 we prevented warnings about this code, and this patch teaches the analysis engine to continue analyzing subsequent uses of 'x'. This should wrap up PR9624. There is still an open question of whether we should suppress the maybe-uninitialized warnings resulting from variables initialized in this fashion. The definitely-uninitialized uses should always be warned. llvm-svn: 128932	2011-04-05 21:36:30 +00:00
Chandler Carruth	b5d4831f83	Fix PR9624 by explicitly disabling uninitialized warnings for direct self-init: int x = x; GCC disables its warnings on this construct as a way of indicating that the programmer intentionally wants the variable to be uninitialized. Only the warning on the initializer is turned off in this iteration. This makes the code a lot more ugly, but starts commenting the surprising behavior here. This is a WIP, I want to refactor it substantially for clarity, and to determine whether subsequent warnings should be suppressed or not. llvm-svn: 128894	2011-04-05 17:41:31 +00:00
Ted Kremenek	378819342e	Fix PR 9626 (duplicated self-init warnings under -Wuninitialized) with numerous CFG and UninitializedValues analysis changes: 1) Change the CFG to include the DeclStmt for conditional variables, instead of using the condition itself as a faux DeclStmt. 2) Update ExprEngine (the static analyzer) to understand (1), so as not to regress. 3) Update UninitializedValues.cpp to initialize all tracked variables to Uninitialized at the start of the function/method. 4) Only use the SelfReferenceChecker (SemaDecl.cpp) on global variables, leaving the dataflow analysis to handle other cases. The combination of (1) and (3) allows the dataflow-based -Wuninitialized to find self-init problems when the initializer contains control-flow. llvm-svn: 128858	2011-04-04 23:29:12 +00:00
Ted Kremenek	b8d8c4ec56	-Wuninitialized: use the "self-init" warning when issuing uninitialized-value warnings from the dataflow analysis for uses that occur within the initializer of a variable. llvm-svn: 128843	2011-04-04 20:56:00 +00:00
Ted Kremenek	35d800c39f	-Wuninitialized: don't issue fixit for initializer if a variable declaration already has an initializer. llvm-svn: 128838	2011-04-04 19:43:57 +00:00
Ted Kremenek	77361761fb	-Wuninitialized should not warn about variables captured by blocks as byref. Note this can potentially be enhanced to detect if the __block variable is actually written by the block, or only when the block "escapes" or is actually used, but that requires more analysis than it is probably worth for this simple check. llvm-svn: 128681	2011-03-31 22:32:41 +00:00
Ted Kremenek	61c74a1423	Rename -Wuninitialized-maybe to -Wconditional-uninitialized. llvm-svn: 127793	2011-03-17 03:06:07 +00:00
Ted Kremenek	ea6c20adaf	Take 2: merge -Wuninitialized-experimental into -Wuninitialized. Only must-be-uninitialized warnings are reported, with maybe-uninitialized under a separate flag. I await any fallout/comments/feedback, although hopefully this will produce no noise for users. llvm-svn: 127670	2011-03-15 05:22:33 +00:00
Ted Kremenek	c8c4e5f371	Split warnings from -Wuninitialized-experimental into "must-be-uninitialized" and "may-be-uninitialized" warnings, each controlled by different flags. llvm-svn: 127666	2011-03-15 04:57:38 +00:00
Ted Kremenek	792798549f	Remove old UninitializedValues analysis. llvm-svn: 127656	2011-03-15 03:17:01 +00:00
Ted Kremenek	e6a12a97d4	Move uninitialized variable checking back under -Wuninitialized-experimental. It is clear from user feedback that this warning is not quite ready. llvm-svn: 125007	2011-02-07 17:38:38 +00:00
Ted Kremenek	436cc8ffe7	Reenable -Wuninitialized warning for captured block variables. llvm-svn: 124782	2011-02-03 06:51:50 +00:00
Ted Kremenek	b3dbe28e31	Based on user feedback, swap -Wuninitialized diagnostics to have the warning refer to the bad use, and the note to the variable declaration. llvm-svn: 124758	2011-02-02 23:35:53 +00:00
Ted Kremenek	ba357296e7	Enhance -Wuninitialized to better reason about || and &&, tracking dual dataflow facts and properly merging them. Fixes PR 9076. llvm-svn: 124666	2011-02-01 17:43:18 +00:00
Ted Kremenek	1be4a59a11	Teach -Wuninitialized about indirect goto. Fixes PR 9071. llvm-svn: 124394	2011-01-27 18:51:39 +00:00
Ted Kremenek	93a313869f	Teach -Wuninitialized not to assert when analyzing blocks that reference captured variables. llvm-svn: 124348	2011-01-27 02:29:34 +00:00
Ted Kremenek	e543be3531	Merge -Wuninitialized-experimental into -Wuninitialized. llvm-svn: 124279	2011-01-26 04:49:48 +00:00
Ted Kremenek	33ddd9692d	Tweak -Wuninitialized-experimental to not emit a warning for uses of an uninitialized variable when the use is a void cast, e.g. (void) x. llvm-svn: 124278	2011-01-26 04:49:43 +00:00
Ted Kremenek	bcf848f70a	Teach -Wuninitialized-experimental to also warn about uninitialized variables captured by blocks. llvm-svn: 124213	2011-01-25 19:13:48 +00:00
Ted Kremenek	8f01420d9d	Teach -Wuninitialized-experimental about sizeof(). llvm-svn: 124076	2011-01-23 17:53:04 +00:00
Ted Kremenek	33d4b5eb66	Provide -Wuninitialized-experimental fixits for floats, and also check if 'nil' is declared when suggesting it for initializing ObjC pointers. llvm-svn: 124004	2011-01-21 22:49:49 +00:00
Ted Kremenek	2959fdd087	Add basic fixits for -Wuninitialized-experimental to suggest initializations for pointer and ObjC pointer types. llvm-svn: 123995	2011-01-21 19:41:46 +00:00


56 Commits