Summary: Produce SymbolCast for integral types in `evalCast` function. Apply several simplification techniques while producing the symbols. Added a boolean option `handle-integral-cast-for-ranges` under `-analyzer-config` flag. Disabled the feature by default.
Differential Revision: https://reviews.llvm.org/D105340
This patch emits an error on AIX and z/OS because XCOFF and GOFF does not currently implement builtin function `CFStringMakeConstantString`. Tests that use this builtin were also disabled.
Reviewed By: SeanP
Differential Revision: https://reviews.llvm.org/D117315
This error was found when analyzing MySQL with CTU enabled.
When there are space characters in the lookup name, the current
delimiter searching strategy will make the file path wrongly parsed.
And when two lookup names have the same prefix before their first space
characters, a 'multiple definitions' error will be wrongly reported.
e.g. The lookup names for the two lambda exprs in the test case are
`c:@S@G@F@G#@Sa@F@operator int (*)(char)#1` and
`c:@S@G@F@G#@Sa@F@operator bool (*)(char)#1` respectively. And their
prefixes are both `c:@S@G@F@G#@Sa@F@operator` when using the first space
character as the delimiter.
Solving the problem by adding a length for the lookup name, making the
index items in the format of `USR-Length:USR File-Path`.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D102669
This expands checking for more expressions. This will check underflow
and loss of precision when using call expressions like:
void foo(unsigned);
int i = -1;
foo(i);
This also includes other expressions as well, so it can catch negative
indices to std::vector since it uses unsigned integers for [] and .at()
function.
Patch by: @pfultz2
Differential Revision: https://reviews.llvm.org/D46081
A series of unary operators and casts may obscure the variable we're
trying to analyze. Ignore them for the uninitialized value analysis.
Other checks determine if the unary operators result in a valid l-value.
Link: https://github.com/ClangBuiltLinux/linux/issues/1521
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D114848
Previously, the `SValBuilder` could not encounter expressions of the
following kind:
NonLoc OP Loc
Loc OP NonLoc
Where the `Op` is other than `BO_Add`.
As of now, due to the smarter simplification and the fixedpoint
iteration, it turns out we can.
It can happen if the `Loc` was perfectly constrained to a concrete
value (`nonloc::ConcreteInt`), thus the simplifier can do
constant-folding in these cases as well.
Unfortunately, this could cause assertion failures, since we assumed
that the operator must be `BO_Add`, causing a crash.
---
In the patch, I decided to preserve the original behavior (aka. swap the
operands (if the operator is commutative), but if the `RHS` was a
`loc::ConcreteInt` call `evalBinOpNN()`.
I think this interpretation of the arithmetic expression is closer to
reality.
I also tried naively introducing a separate handler for
`loc::ConcreteInt` RHS, before doing handling the more generic `Loc` RHS
case. However, it broke the `zoo1backwards()` test in the `nullptr.cpp`
file. This highlighted for me the importance to preserve the original
behavior for the `BO_Add` at least.
PS: Sorry for introducing yet another branch into this `evalBinOpXX`
madness. I've got a couple of ideas about refactoring these.
We'll see if I can get to it.
The test file demonstrates the issue and makes sure nothing similar
happens. The `no-crash` annotated lines show, where we crashed before
applying this patch.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D115149
Some projects [1,2,3] have flex-generated files besides bison-generated
ones.
Unfortunately, the comment `"/* A lexical scanner generated by flex */"`
generated by the tools is not necessarily at the beginning of the file,
thus we need to quickly skim through the file for this needle string.
Luckily, StringRef can do this operation in an efficient way.
That being said, now the bison comment is not required to be at the very
beginning of the file. This allows us to detect a couple more cases
[4,5,6].
Alternatively, we could say that we only allow whitespace characters
before matching the bison/flex header comment. That would prevent the
(probably) unnecessary string search in the buffer. However, I could not
verify that these tools would actually respect this assumption.
Additionally to this, e.g. the Twin project [1] has other non-whitespace
characters (some preprocessor directives) before the flex-generated
header comment. So the heuristic in the previous paragraph won't work
with that.
Thus, I would advocate the current implementation.
According to my measurement, this patch won't introduce measurable
performance degradation, even though we will do 2 linear scans.
I introduce the ignore-bison-generated-files and
ignore-flex-generated-files to disable skipping these files.
Both of these options are true by default.
[1]: https://github.com/cosmos72/twin/blob/master/server/rcparse_lex.cpp#L7
[2]: 22362cdcf9/sandbox/count-words/lexer.c (L6)
[3]: 11abdf6462/lab1/lex.yy.c (L6)
[4]: 47f5b2cfe2/B_yacc/1/y1.tab.h (L2)
[5]: 71d1bf9b1e/src/VBox/Additions/x11/x11include/xorg-server-1.8.0/parser.h (L2)
[6]: 3f773ceb13/Framework/OpenEars.framework/Versions/A/Headers/jsgf_parser.h (L2)
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D114510
Need to do the analysis of the captured expressions in the clauses.
Previously the compiler ignored them and it may lead to a compiler crash
trying to get the address of the mapped variables.
Differential Revision: https://reviews.llvm.org/D114546
This reverts commit f02c5f3478 and
addresses the issue mentioned in D114619 differently.
Repeating the issue here:
Currently, during symbol simplification we remove the original member
symbol from the equivalence class (`ClassMembers` trait). However, we
keep the reverse link (`ClassMap` trait), in order to be able the query
the related constraints even for the old member. This asymmetry can lead
to a problem when we merge equivalence classes:
```
ClassA: [a, b] // ClassMembers trait,
a->a, b->a // ClassMap trait, a is the representative symbol
```
Now let,s delete `a`:
```
ClassA: [b]
a->a, b->a
```
Let's merge ClassA into the trivial class `c`:
```
ClassA: [c, b]
c->c, b->c, a->a
```
Now, after the merge operation, `c` and `a` are actually in different
equivalence classes, which is inconsistent.
This issue manifests in a test case (added in D103317):
```
void recurring_symbol(int b) {
if (b * b != b)
if ((b * b) * b * b != (b * b) * b)
if (b * b == 1)
}
```
Before the simplification we have these equivalence classes:
```
trivial EQ1: [b * b != b]
trivial EQ2: [(b * b) * b * b != (b * b) * b]
```
During the simplification with `b * b == 1`, EQ1 is merged with `1 != b`
`EQ1: [b * b != b, 1 != b]` and we remove the complex symbol, so
`EQ1: [1 != b]`
Then we start to simplify the only symbol in EQ2:
`(b * b) * b * b != (b * b) * b --> 1 * b * b != 1 * b --> b * b != b`
But `b * b != b` is such a symbol that had been removed previously from
EQ1, thus we reach the above mentioned inconsistency.
This patch addresses the issue by making it impossible to synthesise a
symbol that had been simplified before. We achieve this by simplifying
the given symbol to the absolute simplest form.
Differential Revision: https://reviews.llvm.org/D114887
Add the capability to simplify more complex constraints where there are 3
symbols in the tree. In this change I extend simplifySVal to query constraints
of children sub-symbols in a symbol tree. (The constraint for the parent is
asked in getKnownValue.)
Differential Revision: https://reviews.llvm.org/D103317
Currently, during symbol simplification we remove the original member symbol
from the equivalence class (`ClassMembers` trait). However, we keep the
reverse link (`ClassMap` trait), in order to be able the query the
related constraints even for the old member. This asymmetry can lead to
a problem when we merge equivalence classes:
```
ClassA: [a, b] // ClassMembers trait,
a->a, b->a // ClassMap trait, a is the representative symbol
```
Now lets delete `a`:
```
ClassA: [b]
a->a, b->a
```
Let's merge the trivial class `c` into ClassA:
```
ClassA: [c, b]
c->c, b->c, a->a
```
Now after the merge operation, `c` and `a` are actually in different
equivalence classes, which is inconsistent.
One solution to this problem is to simply avoid removing the original
member and this is what this patch does.
Other options I have considered:
1) Always merge the trivial class into the non-trivial class. This might
work most of the time, however, will fail if we have to merge two
non-trivial classes (in that case we no longer can track equivalences
precisely).
2) In `removeMember`, update the reverse link as well. This would cease
the inconsistency, but we'd loose precision since we could not query
the constraints for the removed member.
Differential Revision: https://reviews.llvm.org/D114619
Make the SValBuilder capable to simplify existing
SVals based on a newly added constraints when evaluating a BinOp.
Before this patch, we called `simplify` only in some edge cases.
However, we can and should investigate the constraints in all cases.
Differential Revision: https://reviews.llvm.org/D113753
Make the SimpleSValBuilder capable to simplify existing IntSym
expressions based on a newly added constraint on the sub-expression.
Differential Revision: https://reviews.llvm.org/D113754
[NFC] This patch replaces `masterPort` with `mainPort` in these
testcases.
Reviewed By: ZarkoCA
Differential Revision: https://reviews.llvm.org/D113505
Summary: Specifically, this fixes the case when we get an access to array element through the pointer to element. This covers several FIXME's. in https://reviews.llvm.org/D111654.
Example:
const int arr[4][2];
const int *ptr = arr[1]; // Fixes this.
The issue is that `arr[1]` is `int*` (&Element{Element{glob_arr5,1 S64b,int[2]},0 S64b,int}), and `ptr` is `const int*`. We don't take qualifiers into account. Consequently, we doesn't match the types as the same ones.
Differential Revision: https://reviews.llvm.org/D113480
D103314 introduced symbol simplification when a new constant constraint is
added. Currently, we simplify existing equivalence classes by iterating over
all existing members of them and trying to simplify each member symbol with
simplifySVal.
At the end of such a simplification round we may end up introducing a
new constant constraint. Example:
```
if (a + b + c != d)
return;
if (c + b != 0)
return;
// Simplification starts here.
if (b != 0)
return;
```
The `c == 0` constraint is the result of the first simplification iteration.
However, we could do another round of simplification to reach the conclusion
that `a == d`. Generally, we could do as many new iterations until we reach a
fixpoint.
We can reach to a fixpoint by recursively calling `State->assume` on the
newly simplified symbol. By calling `State->assume` we re-ignite the
whole assume machinery (along e.g with adjustment handling).
Why should we do this? By reaching a fixpoint in simplification we are capable
of discovering infeasible states at the moment of the introduction of the
**first** constant constraint.
Let's modify the previous example just a bit, and consider what happens without
the fixpoint iteration.
```
if (a + b + c != d)
return;
if (c + b != 0)
return;
// Adding a new constraint.
if (a == d)
return;
// This brings in a contradiction.
if (b != 0)
return;
clang_analyzer_warnIfReached(); // This produces a warning.
// The path is already infeasible...
if (c == 0) // ...but we realize that only when we evaluate `c == 0`.
return;
```
What happens currently, without the fixpoint iteration? As the inline comments
suggest, without the fixpoint iteration we are doomed to realize that we are on
an infeasible path only after we are already walking on that. With fixpoint
iteration we can detect that before stepping on that. With fixpoint iteration,
the `clang_analyzer_warnIfReached` does not warn in the above example b/c
during the evaluation of `b == 0` we realize the contradiction. The engine and
the checkers do rely on that either `assume(Cond)` or `assume(!Cond)` should be
feasible. This is in fact assured by the so called expensive checks
(LLVM_ENABLE_EXPENSIVE_CHECKS). The StdLibraryFuncionsChecker is notably one of
the checkers that has a very similar assertion.
Before this patch, we simply added the simplified symbol to the equivalence
class. In this patch, after we have added the simplified symbol, we remove the
old (more complex) symbol from the members of the equivalence class
(`ClassMembers`). Removing the old symbol is beneficial because during the next
iteration of the simplification we don't have to consider again the old symbol.
Contrary to how we handle `ClassMembers`, we don't remove the old Sym->Class
relation from the `ClassMap`. This is important for two reasons: The
constraints of the old symbol can still be found via it's equivalence class
that it used to be the member of (1). We can spare one removal and thus one
additional tree in the forest of `ClassMap` (2).
Performance and complexity: Let us assume that in a State we have N non-trivial
equivalence classes and that all constraints and disequality info is related to
non-trivial classes. In the worst case, we can simplify only one symbol of one
class in each iteration. The number of symbols in one class cannot grow b/c we
replace the old symbol with the simplified one. Also, the number of the
equivalence classes can decrease only, b/c the algorithm does a merge operation
optionally. We need N iterations in this case to reach the fixpoint. Thus, the
steps needed to be done in the worst case is proportional to `N*N`. Empirical
results (attached) show that there is some hardly noticeable run-time and peak
memory discrepancy compared to the baseline. In my opinion, these differences
could be the result of measurement error.
This worst case scenario can be extended to that cases when we have trivial
classes in the constraints and in the disequality map are transforming to such
a State where there are only non-trivial classes, b/c the algorithm does merge
operations. A merge operation on two trivial classes results in one non-trivial
class.
Differential Revision: https://reviews.llvm.org/D106823
Summary: Add support of multi-dimensional arrays in `RegionStoreManager::getBindingForElement`. Handle nested ElementRegion's getting offsets and checking for being in bounds. Get values from the nested initialization lists using obtained offsets.
Differential Revision: https://reviews.llvm.org/D111654
Now in libcxx and clang, all the coroutine components are defined in
std::experimental namespace.
And now the coroutine TS is merged into C++20. So in the working draft
like N4892, we could find the coroutine components is defined in std
namespace instead of std::experimental namespace.
And the coroutine support in clang seems to be relatively stable. So I
think it may be suitable to move the coroutine component into the
experiment namespace now.
This patch would make clang lookup coroutine_traits in std namespace
first. For the compatibility consideration, clang would lookup in
std::experimental namespace if it can't find definitions in std
namespace. So the existing codes wouldn't be break after update
compiler.
And in case the compiler found std::coroutine_traits and
std::experimental::coroutine_traits at the same time, it would emit an
error for it.
The support for looking up std::experimental::coroutine_traits would be
removed in Clang16.
Reviewed By: lxfind, Quuxplusone
Differential Revision: https://reviews.llvm.org/D108696
Replace variable and functions names, as well as comments that contain whitelist with
more inclusive terms.
Reviewed By: aaron.ballman, martong
Differential Revision: https://reviews.llvm.org/D112642
Summary: Assuming that values of constant arrays never change, we can retrieve values for specific position(index) right from the initializer, if presented. Retrieve a character code by index from StringLiteral which is an initializer of constant arrays in global scope.
This patch has a known issue of getting access to characters past the end of the literal. The declaration, in which the literal is used, is an implicit cast of kind `array-to-pointer`. The offset should be in literal length's bounds. This should be distinguished from the states in the Standard C++20 [dcl.init.string] 9.4.2.3. Example:
const char arr[42] = "123";
char c = arr[41]; // OK
const char * const str = "123";
char c = str[41]; // NOK
Differential Revision: https://reviews.llvm.org/D107339
Due to a typo, `sprintf()` was recognized as a taint source instead of a
taint propagator. It was because an empty taint source list - which is
the first parameter of the `TaintPropagationRule` - encoded the
unconditional taint sources.
This typo effectively turned the `sprintf()` into an unconditional taint
source.
This patch fixes that typo and demonstrated the correct behavior with
tests.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D112558
We can reuse the "adjustment" handling logic in the higher level
of the solver by calling `State->assume`.
Differential Revision: https://reviews.llvm.org/D112296
Initiate the reorganization of the equality information during symbol
simplification. E.g., if we bump into `c + 1 == 0` during simplification
then we'd like to express that `c == -1`. It makes sense to do this only
with `SymIntExpr`s.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D111642
It seems like protobuf crashed the `std::string` checker.
Somehow it acquired `UnknownVal` as the sole `std::string` constructor
parameter, causing a crash in the `castAs<Loc>()`.
This patch addresses this.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D112551
Summary: Fix a case when the extent can not be retrieved correctly from incomplete array declaration. Use redeclaration to get the array extent.
Differential Revision: https://reviews.llvm.org/D111542
Summary:
1. Improve readability by moving deeply nested block of code from RegionStoreManager::getBindingForElement to new separate functions:
- getConstantValFromConstArrayInitializer;
- getSValFromInitListExpr.
2. Handle the case when index is a symbolic value. Write specific test cases.
3. Add test cases when there is no initialization expression presented.
This patch implies to make next patches clearer and easier for review process.
Differential Revision: https://reviews.llvm.org/D106681
This patch adds a checker checking `std::string` operations.
At first, it only checks the `std::string` single `const char *`
constructor for nullness.
If It might be `null`, it will constrain it to non-null and place a note
tag there.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D111247
Prior to this, the solver was only able to verify whether two symbols
are equal/unequal, only when constants were involved. This patch allows
the solver to work over ranges as well.
Reviewed By: steakhal, martong
Differential Revision: https://reviews.llvm.org/D106102
Patch by: @manas (Manas Gupta)
Summary:
`a % b != 0` implies that `a != 0` for any `a` and `b`. This patch
extends the ConstraintAssignor to do just that. In fact, we could do
something similar with division and in case of multiplications we could
have some other inferences, but I'd like to keep these for future
patches.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51940
Reviewers: noq, vsavchenko, steakhal, szelethus, asdenyspetrov
Subscribers:
Differential Revision: https://reviews.llvm.org/D110357
Based on post-commit review discussion on
2bd8493847 with Richard Smith.
Other uses of forcing HasEmptyPlaceHolder to false seem OK to me -
they're all around pointer/reference types where the pointer/reference
token will appear at the rightmost side of the left side of the type
name, so they make nested types (eg: the "int" in "int *") behave as
though there is a non-empty placeholder (because the "*" is essentially
the placeholder as far as the "int" is concerned).
This was originally committed in 277623f4d5
Reverted in f9ad1d1c77 due to breakages
outside of clang - lldb seems to have some strange/strong dependence on
"char [N]" versus "char[N]" when printing strings (not due to that name
appearing in DWARF, but probably due to using clang to stringify type
names) that'll need to be addressed, plus a few other odds and ends in
other subprojects (clang-tools-extra, compiler-rt, etc).
This patch remove the override in AIX target,
so the int128 is enabled in 64 bit mode or with ForceEnableInt128.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D111078
'(self.prop)' produces a surprising AST where ParenExpr
resides inside `PseudoObjectExpr.
This breaks ObjCMethodCall::getMessageKind() which in turn causes us
to perform unnecessary dynamic dispatch bifurcation when evaluating
body-farmed property accessors, which in turn causes us
to explore infeasible paths.
Looks like lldb has some issues with this - somehow it causes lldb to
treat a "char[N]" type as an array of chars (prints them out
individually) but a "char [N]" is printed as a string. (even though the
DWARF doesn't have this string in it - it's something to do with the
string lldb generates for itself using clang)
This reverts commit 277623f4d5.
Based on post-commit review discussion on
2bd8493847 with Richard Smith.
Other uses of forcing HasEmptyPlaceHolder to false seem OK to me -
they're all around pointer/reference types where the pointer/reference
token will appear at the rightmost side of the left side of the type
name, so they make nested types (eg: the "int" in "int *") behave as
though there is a non-empty placeholder (because the "*" is essentially
the placeholder as far as the "int" is concerned).
The solver's symbol simplification mechanism was not able to handle cases
when a symbol is simplified to a concrete integer. This patch adds the
capability.
E.g., in the attached lit test case, the original symbol is `c + 1` and
it has a `[0, 0]` range associated with it. Then, a new condition `c == 0`
is assumed, so a new range constraint `[0, 0]` comes in for `c` and
simplification kicks in. `c + 1` becomes `0 + 1`, but the associated
range is `[0, 0]`, so now we are able to realize the contradiction.
Differential Revision: https://reviews.llvm.org/D110913
If the `assume-controlled-environment` is `true`, we should expect `getenv()`
to succeed, and the result should not be considered tainted.
By default, the option will be `false`.
Reviewed By: NoQ, martong
Differential Revision: https://reviews.llvm.org/D111296
The `getenv()` function might return `NULL` just like any other function.
However, in case of `getenv()` a state-split seems justified since the
programmer should expect the failure of this function.
`secure_getenv(const char *name)` behaves the same way but is not handled
right now.
Note that `std::getenv()` is also not handled.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D111245
Clarify the message provided when the analyzer catches the use of memory
that is allocated with size zero.
Differential Revision: https://reviews.llvm.org/D111655
There is an error in the implementation of the logic of reaching the `Unknonw` tristate in CmpOpTable.
```
void cmp_op_table_unknownX2(int x, int y, int z) {
if (x >= y) {
// x >= y [1, 1]
if (x + z < y)
return;
// x + z < y [0, 0]
if (z != 0)
return;
// x < y [0, 0]
clang_analyzer_eval(x > y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
}
}
```
We miss the `FALSE` warning because the false branch is infeasible.
We have to exploit simplification to discover the bug. If we had `x < y`
as the second condition then the analyzer would return the parent state
on the false path and the new constraint would not be part of the State.
But adding `z` to the condition makes both paths feasible.
The root cause of the bug is that we reach the `Unknown` tristate
twice, but in both occasions we reach the same `Op` that is `>=` in the
test case. So, we reached `>=` twice, but we never reached `!=`, thus
querying the `Unknonw2x` column with `getCmpOpStateForUnknownX2` is
wrong.
The solution is to ensure that we reached both **different** `Op`s once.
Differential Revision: https://reviews.llvm.org/D110910
This simple change addresses a special case of structure/pointer
aliasing that produced different symbolvals, leading to false positives
during analysis.
The reproducer is as simple as this.
```lang=C++
struct s {
int v;
};
void foo(struct s *ps) {
struct s ss = *ps;
clang_analyzer_dump(ss.v); // reg_$1<int Element{SymRegion{reg_$0<struct s *ps>},0 S64b,struct s}.v>
clang_analyzer_dump(ps->v); //reg_$3<int SymRegion{reg_$0<struct s *ps>}.v>
clang_analyzer_eval(ss.v == ps->v); // UNKNOWN
}
```
Acks: Many thanks to @steakhal and @martong for the group debug session.
Reviewed By: steakhal, martong
Differential Revision: https://reviews.llvm.org/D110625
(This relands 59337263ab and makes sure comma operator
diagnostics are suppressed in a SFINAE context.)
While at it, add the diagnosis message "left operand of comma operator has no effect" (used by GCC) for comma operator.
This also makes Clang diagnose in the constant evaluation context which aligns with GCC/MSVC behavior. (https://godbolt.org/z/7zxb8Tx96)
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D103938
While at it, add the diagnosis message "left operand of comma operator has no effect" (used by GCC) for comma operator.
This also makes Clang diagnose in the constant evaluation context which aligns with GCC/MSVC behavior. (https://godbolt.org/z/7zxb8Tx96)
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D103938
This improves diagnostic (& important to me, DWARF) accuracy - otherwise
there could be ambiguities between "std::nullptr_t" and some user-defined
type that's /actually/ "nullptr_t" defined in the global namespace.
Differential Revision: https://reviews.llvm.org/D110044