Summary:
This diff adds some semantic models for C++ string length. It introduces
an virtual field for string length and use the value in the models.
* basic_string constructor given a constant string
* string.empty
* string.length
Reviewed By: jvillard
Differential Revision: D29390815
fbshipit-source-id: 99d67e48e
Summary:
Some funky C++ way of calling the parent's constructor triggers a
function in the frontend that used to reverse the order of parameters.
Reviewed By: skcho
Differential Revision: D28832500
fbshipit-source-id: 1032de2ca
Summary:
Unknown functions may create false positives as well as false negatives
for Pulse. Let's consider that unknown functions behave "functionally",
or at least that a functional behaviour is a possible behaviour for
them: when called with the same parameter values, they should return the
same value.
This is implemented purely in the arithmetic domain by recording
`v_return = f_unknown(v1, v2, ..., vN)` for each call to unknown
functions `f_unknown` with values `v1`, `v2`, ..., `vN` (and return
`v_return`). The hope is that this will create more false negatives than
false positives, as several FPs have been observed on real code that
would be suppressed with this heuristic.
The other effect this has on reports is to record hypotheses made on the
return values of unknown functions into the "pruned" part of formulas,
which inhibits reporting on paths whose feasibility depends on the
return value of unknown functions (by making these issues latent
instead). This should allow us to control the amount of FPs until we
model more functions.
Reviewed By: skcho
Differential Revision: D27798275
fbshipit-source-id: d31cfb8b6
Summary:
It's better to remember the first reason why an address must be valid,
etc.
Reviewed By: skcho
Differential Revision: D28674729
fbshipit-source-id: 3b69de7ef
Summary:
Most/all of the time we expect the history of the value to faithfully
trace how it got allocated. That history was then added as a prefix of
the trace leading to the same place, leading to duplicate information in
the report trace.
We may need to do the same for other bug types.
Reviewed By: ezgicicek
Differential Revision: D28536891
fbshipit-source-id: a83a2d038
Summary:
Make it more obvious why we don't add an Allocated attribute in these
models.
Reviewed By: ezgicicek
Differential Revision: D28536892
fbshipit-source-id: 643539ae6
Summary:
As explained in the code comment, these reports are generally
non-actionable at best and false positives at worst:
skip reporting for constant dereference (eg null dereference) if the source of the null value is
not on the path of the access, otherwise the report will probably be too confusing: the actual
source of the null value can be obscured as any value equal to 0 (or the constant) can be
selected as the candidate for the trace, even if it has nothing to do with the error besides
being equal to the value being dereferenced
Reviewed By: da319
Differential Revision: D28350193
fbshipit-source-id: 0cd76d252
Summary:
Turns out the mistake was pretty simple: we just forgot to keep the
history of the return value in the callee and add it to the caller's.
Reviewed By: skcho
Differential Revision: D28385941
fbshipit-source-id: 40fe09c99
Summary:
- Changed "passed as argument to f" to "in call to f", as these do not
always correspond to passing an argument (eg could be a value returned
from f)
- Changed "assigned" to "returned" when appropriate
- Changed the model of malloc() to not say "allocated" in the null case
- Don't print "returned from f" when there was no event inside f: just
print "in call to f".
Reviewed By: da319
Differential Revision: D28413900
fbshipit-source-id: bc85625e3
Summary:
This diff addresses `GenericArrayBackedCollection.field` and others as pointers. The modeled fields are used as non-pointer struct fields, but their actual semantics are pointers that may have side effects.
For example, `GenericArrayBackedCollection.field` is used for keeping an information that the previous vector's address could be invalid.
```
void foo(vector v) {
v.push_back(0); // v's previous address may be invalid after push_back
// PRE: {v -> {backing_array -> v1}}
// POST: {v -> {backing_array -> v2}}
// ATTR: {v1 may be invalidated}
}
```
However, if we revert the modeled field values, it will return incorrect summary as follows, by reverting non-pointer parameter values.
```
// PRE: {v -> {backing_array -> v1}}
// POST: {v -> {backing_array -> v1}}
// ATTR: {v1 may be invalidated}
```
Reviewed By: jvillard
Differential Revision: D28324161
fbshipit-source-id: 96451d4b0
Summary:
There's been regressions in --pulse-isl. Without tests, everything is
temporary!
Note: the regressions are presumably still there, this just records the
current status of pulse.isl.
Also, no objective-C(++) at the moment. Should we add them too? (in
another diff)
Reviewed By: skcho
Differential Revision: D28256703
fbshipit-source-id: 700b2cc57
Summary:
A previous change made pulse look into value histories for causes of
invalidation in case the access trace of a value already contained the
reason why that value is invalid, in order to save printing the
invalidation trace in addition to the access trace. It also made
reporting more accurate for null dereference as the source of null was
often better identified (in cases where several values are null or
zero).
But, the history is also relevant to the bug type and the error message.
Make these take histories into account too.
Also fix a bug where we didn't look inside the sub-histories contained
within function calls when looking for an invalidation along the
history.
Reviewed By: da319
Differential Revision: D28254334
fbshipit-source-id: 5ca00ee54
Summary:
When garbage-collecting addresses we would also remove their attributes.
But even though the addresses are no longer allocated in the heap, they
might show up in the formula and so we need to remember facts about
them.
This forces us to detect leaks closer to the point where addresses are
deleted from the heap, in AbductiveDomain.ml. This is a nice refactoring
in itself: doing so fixes some other FNs where we sometimes missed leak
detection on dead addresses.
This also makes it unecessary to simplify InstanceOf eagerly when
variables get out of scope.
Some new {folly,std}::optionals false positives that either are similar to existing ones or involve unmodelled smart pointers.
Reviewed By: da319
Differential Revision: D28126103
fbshipit-source-id: e3a903282
Summary:
Building on the infra in the previous commits, "fix" all the call sites
that introduce invalidations to make sure they also update the
corresponding histories. This is only possible to do when the access
leading to the invalidation can be recorded. Right now the only place
that's untraceable is the model of `free`/`delete`, because it happens
to be the only place where we invalidate an address without knowing
where it comes from (`free(v)`: what was v's access path? we could track
this in the future).
Reviewed By: skcho
Differential Revision: D28118764
fbshipit-source-id: de67f449e
Summary: Not tracking values for global constants might cause nullptr_dereference false positives. In particular, if the code has multiple checks and uses a global constant by its name in one check and its value in another check (see added test case), we are not able prune infeasible paths. This diff addresses such false positives by inlining initializers of global constants when they are being used. An assumption is that most the time the initialization of global constants would not have side effects.
Reviewed By: jvillard
Differential Revision: D25994898
fbshipit-source-id: 26360c4de
Summary:
As explained in the previous diff: when the access trace goes through
the invalidation step there is no need to print the invalidation trace
at all.
Note: only a few sources of invalidation are handled at the moment. The
following diffs gradually fix the other sources of invalidation.
Reviewed By: skcho
Differential Revision: D28098335
fbshipit-source-id: 5a5e6481e
Summary:
The eventual goal is to stop having separate sections of the trace
("invalidation part" + "access part") when the "access part" already
goes through the invalidation step. For this, it needs to record when a
value is made invalid along the path.
This is also important for assignements to NULL/0/nullptr/nil: right now
the way we record that 0 is not a valid address is via an attribute
attached to the abstract value that corresponds to 0. This makes traces
inconsistent sometimes: 0 can appear in many places in the same function
and we won't necessarily pick the correct one. In other words, attaching
traces to *values* is fragile, as the same value can be produced in many
ways. On the other hand, histories are stored at the point of access, eg
x->f, so have a much better chance of being correct. See added test:
right now its traces is completely wrong and makes the 0 in `if
(utf16StringLen == 0)` the source of the NULL value instead of the
return of `malloc()`!
This diff makes the traces slightly more verbose for now but this is
fixed in a following diff as the traces that got longer are those that
don't actually need an "invalidation" trace.
Reviewed By: skcho
Differential Revision: D28098337
fbshipit-source-id: e17929259
Summary:
This diff avoids dereference of C struct, in its frontend and its semantics of Pulse. In SIL, C
struct is not first-class value, thus dereferencing on it does not make sense.
Reviewed By: ezgicicek
Differential Revision: D27953258
fbshipit-source-id: 348d56338
Summary:
Previously we were only taking constexpr into account on constructors.
Add this info to ProcAttributes.t instead by exporting it from the
plugin for all functions.
This allows SIOF to take constexpr into account in more cases as it's not
always good at capturing which functions *can* be constexpr-evaluated,
which caused false positives.
Delete now-useless is_constexpr in constructor types. This generated the
changes in frontend tests.
Some minor renamings of variants of is_const_expr -> is_constexpr.
Reviewed By: da319
Differential Revision: D27503433
fbshipit-source-id: 3d1972900
Summary:
This diff evaluates a cpp vector given as a parameter symbolically. Especially, it addresses it as an array, so the cost checker can use its symbolic length correctly.
**About handling `cpp.vector_elem` field:**
The field is a virtual field of vector object that points to the array of vector elements. It was introduced in Inferbo to model semantics of vector operations.
Since many semantics of Inferbo depends on type information, it had collected type information of vector elements, whenever `cpp.vector_elem` field was introduced, as a *side-effect*. A problem is that it has *side-effect*, which means it may introduce non-deterministic analysis results depending on the type information of the virtual field.
This diff changes it not to collect the type information on `cpp.vector_elem` as a side-effect. Instead, it tries to write the information to the abstract states (abstract location) when possible.
Reviewed By: ezgicicek
Differential Revision: D27674935
fbshipit-source-id: f3d52cae7
Summary:
Whenever an equality "t = v" (t an arbitrary term, v a variable) is
added (or "v = t"), remember the "t -> v" mapping after canonicalising t
and v. Use this to detect when two variables are equal to the same term:
`t = v` and `t = v'` now yields `v = v'` to be added to the equality
relation of variables. This increases the precision of the arithmetic
engine.
Interestingly, the impact on most code I've tried is:
1. mostly same perfs as before, if a bit slower (could be within noise)
2. slightly more (latent) bugs reported in absolute numbers
I would have expected it to be more expensive and yield fewer bugs (as
fewer false positives), but there could be second-order effects at play
here where we get more coverage. We definitely get more latent issues
due to dereferencing pointers after testing nullness, as can be seen in
the unit tests as well, which may alone explain (2).
There's some complexity when adding term equalities where the term
is linear, as we also need to add it to `linear_eqs` but `term_eqs` and
`linear_eqs` are interested in slightly different normal forms.
Reviewed By: skcho
Differential Revision: D27331336
fbshipit-source-id: 7314e127a
Summary:
When we don't know the value being shifted it may help to translate
bit-shifting into multiplication by a constant as it might surface
linear terms, eg `x<<1` is `2*x`.
Reviewed By: skcho
Differential Revision: D27464847
fbshipit-source-id: 9b3b5f0d0
Summary:
On some pathological examples of crypto primitives like libsodium, later
diffs make pulse grind to a halt due to an explosion in the size of
literals. This is at least partly due to the fact the arithmetic doesn't
operate modulo 2^64.
Due to the fact the arithmetic is confused in any case when we reach
such large numbers, cap them, currently at 2^128. This removes pathological
cases for now, even now on libsodium Pulse is ~5 times faster than before!
Take this opportunity to put the modified Q/Z modules in the own files.
Reviewed By: jberdine
Differential Revision: D27463933
fbshipit-source-id: 342d941e2
Summary:
This was added in C++14. Was investigating how SIOF dealt with this but
it turns out it already does the right thing as the translation unit of
global variable templates shows up as the place they are instantiated
(not the one where they are declared), which works well for SIOF
checking.
Reviewed By: da319
Differential Revision: D27500998
fbshipit-source-id: b8b9b9c48
Summary:
The title
Also notice that there is a duplication of an error.
Reviewed By: skcho
Differential Revision: D27426933
fbshipit-source-id: dbd2f861a
Summary:
See updated tests and code comments: this changes many arithmetic
operations to detect when a contradiction "p|->- * p=0" is about to be
detected, and generate a latent issue instead. It's hacky but it does
what we want. Many APIs change because of this so there's some code
churn but the overall end result is not much worse thanks to monadic
operators.
Reviewed By: skcho
Differential Revision: D26918553
fbshipit-source-id: da2abc652
Summary:
This diff handles live variables in catch blocks. To do that, this diff adds another metadata,
`CatchEntry`.
Domain change: The domain is changed to
```
(normal:variables) x (exn:try_id->variables)
```
`exn` is a map from try-catch-id to a set of live variables that are live at the corresponding entry
of catch blocks.
Semantics change: It is a backward analysis.
* on `CatchEntry`: It updates `exn` with `try_id` and current `normal`.
* on `Call`: As of now, we assume all function calls can raise an exception. Therefore, it copies
all live variables in `exn` to `normal`.
* on `TryEntry`: It removes corresponding `try_id` from `exn`.
Reviewed By: jvillard
Differential Revision: D26952755
fbshipit-source-id: 1da854a89
Summary: This diff adds TryEntry and TryExit statements to the entry and exit of C++ `try` block, in order to handle exceptional control flow better in analyses.
Reviewed By: da319, jvillard
Differential Revision: D26946188
fbshipit-source-id: 33f4ae9e7
Summary:
Update Infer to LLVM (clang) 11.1.0.
Infer/clang now uses the LLVM 'monorepo' release, simplifying the download script.
Some changes done to how/when ASTExporter mangles names, this to avoid the
plugin hitting asserts in the clang code when mangling names.
Reviewed By: jvillard
Differential Revision: D27006986
fbshipit-source-id: 4d4b6ba05
Summary:
Adapting error messages in Pulse so that they become more intuitive for
developers.
Reviewed By: jvillard
Differential Revision: D26887140
fbshipit-source-id: 896970ba2
Summary:
See added tests. Passing a variable by reference to a function `foo` can
cause the variable to be added to the global state so any stores after
that might be live as long as there is another function call after the
store (since the global state shouldn't outlive the scope of the
function). Currently we don't check that the latter is true; to report
these we would need to extend the abstract domain to remember which
stores have been made without a subsequent call.
Also change Blacklisted -> Dangerous everywhere since the corresponding
option is called "liveness_dangerous_classes".
Reviewed By: skcho
Differential Revision: D26606151
fbshipit-source-id: e869e5df1
Summary:
This resolves a few instances of false negatives; typically:
```
if (x == y) {
// HERE
*x = 10;
*y = 44;
// THERE
}
```
We used to get
```
HERE: &x->v * &y ->v' * v == v'
THERE: &x->v * &y ->v' * v == v' * v |-> 10 * v' |-> 44
```
The state at THERE was thus inconsistent and detected as such (v` and
`v'` are allocated separately in the heap hence cannot be equal).
Now we normalize the state more eagerly and so we get:
```
HERE: &x->v * &y->v
THERE: &x->v * &y->v * v |-> 44
```
Reviewed By: skcho
Differential Revision: D26488377
fbshipit-source-id: 568e685f0
Summary:
These were present for `std::optional` but not `folly::Optional` for
some reason.
Reviewed By: da319
Differential Revision: D26450400
fbshipit-source-id: 45051e828
Summary:
The `--pulse-model-return-nonnull` config option currently works for C++. Now we
will be using it also for Java. Changing type from string list to regexp to
make it more general.
Reviewed By: ezgicicek
Differential Revision: D26367888
fbshipit-source-id: 9a06b9b32
Summary: D25952894 (1bce54aaf3) changes translation of struct assignments. This diff adopts to this change for loads from global struct arrays.
Reviewed By: skcho
Differential Revision: D26398627
fbshipit-source-id: cc1fb47ab
Summary:
Before this diff:
```
// Summary of const global
// { global -> v }
n$0 =* global
// n$0 -> {global}
x *= n$0
// x -> {global}
```
However, this is incorrect because we expect `x` have `v` instead of the abstract location of `global`.
To fix the issue, this diff lookups the initializer summary when `global` is evaluated as RHS of load statement.
After this diff:
```
// Summary of const global
// { global -> v }
n$0 =* global
// n$0 -> v
x *= n$0
// x -> v
```
Reviewed By: ezgicicek
Differential Revision: D26369645
fbshipit-source-id: 98b1ed085
Summary:
Having different behaviours inter-procedurally and intra-procedurally
sounds like a bad design in retrospect. The model of free() should not
depend on whether we currently know the value is not null as that means
some specs are missing from the summary.
Reviewed By: skcho
Differential Revision: D26019712
fbshipit-source-id: 1ac4316a5
Summary:
When a single field struct is initialized with "type x{v}" form, the translated result is not straightforward. For example,
```
struct t {
int val_;
};
void foo(t x) {
t y{x};
}
```
calls the copy constructor with `x`. This is good.
```
void foo(int n) {
t y{n};
}
```
assigns the integer `n` to `y.val_`. This is good.
```
t get_v();
void foo() {
t y{get_v()};
}
```
assigns return value of `get_v` to `y.val_`, rather than calling the copy constructor. This is not
good, but doesn't matter for actual running; `&y.val_` is the same to `&y` and `t` value is the same
to `int` value.
Reviewed By: jvillard
Differential Revision: D26146578
fbshipit-source-id: 8a81bb1db
Summary: This diff fixes incorrect order of statements on `*p = !b;`.
Reviewed By: jvillard
Differential Revision: D26125069
fbshipit-source-id: 9dcefbd34
Summary:
This diff fixes incorrect order of statements on assignments.
In the translation of `LHS=RHS;`, if `RHS` is a complicated expression that introduced new nodes, eg a conditional expression, some load statements for `LHS` came after its usage. To avoid the issue, this diff forces it to introduce new nodes for `LHS`.
Reviewed By: jvillard
Differential Revision: D26099782
fbshipit-source-id: 27417cd99
Summary:
D20769039 (cec8cbeff2) added a preanalysis step that creates edges from throw nodes to all reachable catch nodes. It intended to fix some deadstore FPs however it caused more damage than the fix itself. In particular, throws were connected irrespective of
- the type of the exception
- whether the try was surrounded by a catch
This in turn caused weird CFGs with dangling and impossible to understand nodes:(
This diff reverts this change for now.
Instead, the fix should probably be done in the frontend where we have more information about try/catch blocks.
Reviewed By: da319
Differential Revision: D25997475
fbshipit-source-id: bbeabfbef
Summary:
Lambdas are essentially private (but are not marked as such in Infer),
so we should only report on their non-private callers.
Meanwhile, add a test to document that access propagation to those
callers is currently broken.
Reviewed By: da319
Differential Revision: D25944811
fbshipit-source-id: ef8ca6d9c