Summary:
This gets rid of false positives when something invalid (eg null) is
passed by reference to an initialisation function. Havoc'ing what the
contents of the pointer to results in being optimistic about said
contents in the future.
Also surprisingly gets rid of some FNs (which means it can also
introduce FPs) in the `std::atomic` tests because a path condition
becomes feasible with havoc'ing.
There's a slight refinement possible where we don't havoc pointers to
const but that's more involved and left as future work.
Reviewed By: skcho
Differential Revision: D18726203
fbshipit-source-id: 264b5daeb
Summary:
It's a well-known fact that pulse should know too. To avoid splitting
the abstract state systematically, only act if we know the pointer is
exactly 0 to avoid reporting a nullptr dereference on `free(x)`.
Reviewed By: ezgicicek
Differential Revision: D18708575
fbshipit-source-id: 1cc3f6908
Summary:
Turns out code uses atomics in important places, modelling it removes
FPs.
The tests are copied from biabduction and adapted and extended a bit. I
didn't implement compare_exchange primitives for now (plus, giving them
a sequential semantics like in biabduction is probably a bit cheeky).
Reviewed By: skcho
Differential Revision: D18708576
fbshipit-source-id: a3581b8a4
Summary:
This diff adds inferbo's interval values to pulse's attributes. The added values will be used to
filter out infeasible passes in the following diffs.
Reviewed By: jvillard
Differential Revision: D18726667
fbshipit-source-id: c1125ac6e
Summary:
Note: Disabled by default.
Having some support for values, we can report when a null or constant
value is being dereferenced. The particularity here is that we don't
report when 0 is a possible value for the address, or even if we know
that the value of the address can only be 0 in that branch! Instead, we
allow ourselves to report only when we the address has been *set* to
NULL (or any constant).
This is in line with how pulse deals with other issues: only report when
1. we see an address become invalid, and
2. we see the same address be used later on
Reviewed By: skcho
Differential Revision: D17665468
fbshipit-source-id: f1ccf94cf
Summary:
This adds a more interesting value domain to pulse: concrete intervals.
There are still two main limitations:
1. arithmetic operations are all over-approximated: any assignment involving arithmetic operations is replaced by non-determinism
2. abstract values that are discovered to be equal are not merged into one
Reviewed By: skcho
Differential Revision: D18058972
fbshipit-source-id: 0492a590f
Summary:
This does several things because it was hard to split it more:
1. Split most of the arithmetic reasoning to PulseArithmetic.ml. This
doesn't need to be reviewed thoroughly because an upcoming diff
changes the domain from just `EqualTo of Const.t` to an interval domain!
2. When going through a prune node intra-procedurally, abduce arithmetic
facts to the pre (instead of just propagating them). This is the "assume
as assert" trick used by biabduction 1.0 too and allows to propagate
arithmetic constraints to callers.
3. Use 2 when applying summaries by pruning specs whose preconditions
have un-satisfiable arithmetic constraints.
This changes one of the tests! Pulse now does a bit more work to find
the false positive, as can be seen in the longer trace.
Reviewed By: skcho
Differential Revision: D18117160
fbshipit-source-id: af3b2c8c0
Summary: In preparation for improvements to the arithmetic reasoning.
Reviewed By: dulmarod
Differential Revision: D17977207
fbshipit-source-id: ee98e0772
Summary:
bigmacro_bender
There are 3 ways pulse tracks history. This is at least one too many. So
far, we have:
1. "histories": a humble list of "events" like "assigned here", "returned from call", ...
2. "interproc actions": a structured nesting of calls with a final "action", eg "f calls g calls h which does blah"
3. "traces", which combine one history with one interproc action
This diff gets rid of interproc actions and makes histories include
"nested" callee histories too. This allows pulse to track and display
how a value got assigned across function calls.
Traces are now more powerful and interleave histories and interproc
actions. This allows pulse to track how a value is fed into an action,
for instance performed in callee, which itself creates some more
(potentially now interprocedural) history before going to the next step
of the action (either another call or the action itself).
This gives much better traces, and some examples are added to showcase
this.
There are a lot of changes when applying summaries to keep track of
histories more accurately than was done before, but also a few
simplifications that give additional evidence that this is the right
concept.
Reviewed By: skcho
Differential Revision: D17908942
fbshipit-source-id: 3b62eaf78
Summary:
- add the variable being declared so we can report it back in the trace in addition to its location
- distinguish between local vars and formals
Reviewed By: skcho
Differential Revision: D17930348
fbshipit-source-id: a5b863e64
Summary:
When we make the decision to go into a branch "v = N" where some
abstract value is compared to a constant, remember the corresponding
equality. This allows to prune simple infeasible paths
intra-procedurally.
Further work is needed to make this useful interprocedurally, for
instance either or both of these ideas could be explored:
- abduce v=N in the precondition and do not apply summaries when the
equalities in the pre are not satisfied
- prune post-conditions that lead to unsat states where a value has to
be equal to several different constants
Reviewed By: skcho
Differential Revision: D17906166
fbshipit-source-id: 5cc84abc2
Summary:
When we know "x = 3" and we have a condition "x != 3" we know we can
prune the corresponding path.
Reviewed By: skcho
Differential Revision: D17665472
fbshipit-source-id: 988958ea6
Summary:
Unfortunately it is very hard to predict when
`Typ.Procname.describe` will add `()` after the function name, so we
cannot make sure it is always there.
Right now we report clowny stuff like "error while calling `foo()()`",
which this change fixes.
Reviewed By: ezgicicek
Differential Revision: D17665470
fbshipit-source-id: ef290d9c0
Summary:
Pulse didn't treat local variables going out of scope as invalidating the corresponding address in memory. This diff fixes that by
- marking all local variables that exits the scope with the attribute `AddressOfStackVariable`
- before we write the summary for the proc, we make sure to invalidate all such addresses local to the procedure as `Invalid.` If such an address is read, then we would raise a use-after-lifetime issue.
Reviewed By: jvillard
Differential Revision: D16458355
fbshipit-source-id: 3686524cb
Summary:
A common gotcha is the new test. Model the minimum amount of
`std::basic_string` to catch it.
Reviewed By: mbouaziz, ngorogiannis
Differential Revision: D16121090
fbshipit-source-id: 66f06cb43
Summary:
Be more flexible in what type of function calls are allowed in `ViaCall ...` actions to be able to include models.
Also get rid of `here here` in traces /o\
As a side-effect, get more precise (=qualified) procedure names in
traces (but not in messages so as not to be too verbose).
Reviewed By: mbouaziz, ngorogiannis
Differential Revision: D16121092
fbshipit-source-id: fb51b02f8
Summary:
So it turns out we need to translate even more cases. Pulse had a FP
before that this fixes.
Reviewed By: ezgicicek
Differential Revision: D16073629
fbshipit-source-id: c03460b5a
Summary:
This is needed to test some functionality in the next diff. Only one
test changes (no longer a FN), which is now documented. Also, stop
including the "header models" meant for biabduction!
Maybe one day we'll need to have several test modes for different C++
versions. Seems overkill for now, so let's wait until we see some actual
issues (eg FPs) that manifest in one version but not the other.
Reviewed By: mbouaziz
Differential Revision: D16073630
fbshipit-source-id: 1cfdfc933
Summary:
Sometimes the post of a function call has attributes on addresses that
were mentioned in the pre but are no longer reachable in the post. We
don't want to forget these, see added test.
Reviewed By: mbouaziz
Differential Revision: D16050050
fbshipit-source-id: 1ce522b97
Summary:
The previous code would call the destructor for the C++ temporary
*before* the prune nodes, which then try to dereference it. Wrong.
Quick fix: don't destroy temporaries in conditionals.
Reviewed By: mbouaziz
Differential Revision: D16030735
fbshipit-source-id: e11abad58
Summary:
We were skipping some instructions before and that was a problem for
pulse. See added pulse test.
Reviewed By: mbouaziz
Differential Revision: D16030150
fbshipit-source-id: 9c62e6213
Summary: Not sure if anyone uses this but there, now it's modelled.
Reviewed By: mbouaziz
Differential Revision: D16008162
fbshipit-source-id: f4795dcba
Summary:
Prevent false positives about variables captured by value gone out of
scope.
Reviewed By: ezgicicek
Differential Revision: D16008165
fbshipit-source-id: d70e47db4
Summary: We know how to do interprocedural calls so let's use that!
Reviewed By: mbouaziz
Differential Revision: D16008164
fbshipit-source-id: 4c34bf704
Summary:
`function::operator=` is called whenever we assign a literal lambda to a
variable, so it's pretty useful to be able to report anything on
lambdas.
Reviewed By: mbouaziz
Differential Revision: D16008163
fbshipit-source-id: a9d07668d
Summary:
Printing `Exp.Const (Cfun proc_name)` adds `_fun_` in front of the
procedure name, eg `_fun_foo` instead of `foo`. This showed up in pulse
traces.
Reviewed By: mbouaziz
Differential Revision: D16004606
fbshipit-source-id: 72ac6866f
Summary:
Fixes a false positive where the address of a C++ temporary is bound to
a static const reference variable then returned. The fix doesn't try to
establish that the variable is a const reference so could lead to false
negatives but that can be addressed later.
Reviewed By: ezgicicek
Differential Revision: D16004538
fbshipit-source-id: e403dbefe
Summary:
[apologies for the unreviewable diff...]
Get rid of HIL expressions in pulse. This finishes the HIL -> SIL
migration. The first step made pulse start from SIL instructions but
would translate most accesses to HIL to re-use most of the existing
pulse code. This diff gets rid of the intermediate translation of SIL
expressions to HIL expressions.
Big changes:
1. `PulseOperations` mostly rewritten, driven by using `Exp.t` instead of `HilExp.AccessExpression.t` for everything.
2. Stop trying to reverse-engineer what addresses mean in terms of
access paths from program variables. Rely on the trace pointing at
the right places in the code to be enough. This is because it wasn't
that useful (and could even be misleading when wrong) but could be
prohibitively expensive in degenerate cases (eg nodes with tens of
thousands of successive array accesses...)
3. `PulseAbductiveDomain.apply_post` now returns the computed return
value instead of recording it itself.
4. Change of vocabulary: `materialize` -> `eval`, `crumb` -> `event`
5. Function calls arguments are now evaluated prior to doing anything
else, which saves everything else from having to (remember to) do
that. In particular, this changes how models look quite a bit.
Reviewed By: mbouaziz
Differential Revision: D15986373
fbshipit-source-id: 1d79935de
Summary: Inject destructor calls to destroy a temporary when its lifetime ends.
Reviewed By: mbouaziz
Differential Revision: D15674209
fbshipit-source-id: 0f783a906
Summary:
Now that HIL doesn't help us anymore we need to reconstruct its mapping
"SIL logical var -> program access path". We already have everything we
need in pulse: it suffices to walk the current memory graph starting
from program variables until we find the value of the temporary we are
interested in.
This diff also builds some type machinery to make sure all accesses are
explained.
Reviewed By: mbouaziz
Differential Revision: D15824959
fbshipit-source-id: 722c81b39
Summary:
It turns out HIL gets in the way of a precise heap analysis. For
instance, instead of:
```
n$0 = *&x.f
_ = delete(&x)
*&y = n$0
```
HIL tries hard to forget about intermediate variables and shows instead
```
_ = delete(&x)
*&y = *&x.f
```
Oops, that's a use-after-delete, whereas the original code was safe.
While it's easy to write SIL programs that are completely unsound for
HIL, they are not generated very often from the frontends. In fact, the
problem became apparent only when making the clang frontend translate
C++ temporaries destructors, which produces the situation above
routinely.
This diff makes the minimal amount of change to make Pulse build and
produce equivalent results (minus HIL bugs) starting from SIL instead of
HIL. The reporting sucks for now because we need to translate SIL
temporaries back into program access paths. This is done in the next
diff.
Reviewed By: mbouaziz
Differential Revision: D15824961
fbshipit-source-id: 8e4e2a3ed
Summary:
This one isn't caught because we don't destruct temporaries that are
bound to a const reference. According to the C++ standard these should
get destroyed when the const reference gets destroyed but instead we
just don't destroy them for now.
Reviewed By: mbouaziz
Differential Revision: D15760209
fbshipit-source-id: 32c935ec0
Summary:
In a next diff temporaries will get destructed at the end of their
lifetimes and that naive model would be causing false positives.
The flipside is that we lose all reports on closures for now, will need
to model them separately later.
Reviewed By: mbouaziz
Differential Revision: D15695943
fbshipit-source-id: c2c482c02
Summary:
This started as an attempt to understand how to modify the frontend to
inject destructors for C++ temporaries (see next diffs).
This diff rewrites the existing logic for computing the list of
variables that should be destroyed at the end of each statement, either
because it's the end of their syntactic scope or because control flow
branches outside of their syntactic scope.
The frontend translates a function from the last instructions to the
first, but scope computation needs to be done in the other direction, so
it's done in a separate pass *before* the main translation happens. That
first pass creates a map from statements in the AST to the list of
variables that should be destroyed at the end of these statements. This
is still the case now.
Before, that map would be computed in a bit of a weird way: scopes are
naturally a stack but instead of that the structure maintained was a
flat list + a counter to know where the current scope ended in that
list.
In this diff, redo the computation maintaining a stack of scopes
instead, which is a bit cleaner. Also treat more instructions as
introducing a new scope, eg if, for, ...
Reviewed By: mbouaziz
Differential Revision: D15674208
fbshipit-source-id: c92429e82
Summary:
I rewrote the test so it doesn't need any C++ headers so that:
- it's easier to see what's going on
- it's easier to debug: the whole AST is now somewhat readable vs before
the headers made it impossibly long
Reviewed By: ezgicicek
Differential Revision: D15674213
fbshipit-source-id: d98941983
Summary:
This messes with the deduplication heuristic when templated function
names show up in the error messages, since the heuristic demands that
the error messages are the same.
Reviewed By: mbouaziz
Differential Revision: D15374333
fbshipit-source-id: 70232d254
Summary:
Improve the error messages, change is more or less documented in the
code.
Reviewed By: mbouaziz
Differential Revision: D15374334
fbshipit-source-id: f1dd54180
Summary:
Before: the trace would explain how a value was invalidated and
accessed, but not how the value that was invalidated had been
constructed.
Now: `PulseTrace.t` records breadcrumbs of how the value was constructed
in addition to the interproc "action" trace leading to the invalidation
or access action.
Concretely:
```
void bad(X &x) {
X *y = x;
X *z = x;
delete y;
access(z);
}
```
will produce the trace:
Invalidation part:
y = x
delete y
Access part:
z = x
access(z)
access to z->f inside of access(z)
Before this diff the "Access part" would be missing the "z = x" part of
the trace, so it might be confusing why `z` has anything to do with `y`.
However, such "breadcrumbs" are not recorded in the inter-procedural
part, only the sequence of calls is. This is a trade-off for simplicity,
maybe it's enough for developers maybe it isn't, we'll find out later.
Reviewed By: jberdine
Differential Revision: D15354438
fbshipit-source-id: 8d0aed717
Summary:
Feedback from peterogithub:
- mention which access path is being invalidated and accessed in the message
- mention the line at which it was invalidated (the line at which it's accessed is already the line at which we report)
- traces for stack variable/C++ temporary address escapes
- delete double implementation of the same functionality in
`PulseTrace`: `location_of_action_start` is the same as
`outer_location_of_action`...
Reviewed By: jberdine
Differential Revision: D14800294
fbshipit-source-id: 3d9ab9b3d
Summary:
Similarly to function parameters (and the return value), we need to
apply the pre/post of a function call to the globals mentioned in its
summary.
- tigthen summaries further to remember only abducible variables in the
post (as well as in the pre)
- take globals into account when applying pre/post pairs
Reviewed By: jberdine
Differential Revision: D14780800
fbshipit-source-id: fc0d180bb
Summary:
The heuristic to detect variables going out of scope was to detect any
access expression passed as argument to an injected destructor call.
However destructor calls are also injected in destructor bodies to
destruct each field of an object, so the heuristic would detect fields
going out of scope, which, erm, doesn't make sense. Limit the heuristic
to local program variables.
Reviewed By: jberdine
Differential Revision: D14771454
fbshipit-source-id: ffa3c9fe3