Summary:
This diff adds support in symbolic execution for calls to intrinsic
functions, to be used in lieu of adding a separate Llair instruction
for each intrinsic. This involves:
- adding skeleton support in Exec for symbolically execution an
intrinsic function call;
- exposing this in Domain;
- allowing symbolic execution of block terminators (e.g. function
call) to possibly fail; and
- generalizing Report for failing terminators.
Reviewed By: ngorogiannis
Differential Revision: D14403652
fbshipit-source-id: d86d9d1b8
Summary:
This diff changes a LatestPrune to use a return variable instead of another local variable, when the function returns a conditional value. This is a preparation to propagate LatestPrune inter-procedurally in the following diffs.
context: If a function returns a conditional value, e.g. `return x == y`, the LatestPrune value includes a temporary local variable introduced by the SIL translation. This diff is to avoid propagating the temporary local variables to its caller.
Reviewed By: mbouaziz
Differential Revision: D14321534
fbshipit-source-id: d157bfdd0
Summary:
To meet the pure parts of formulas, the process was to (a) call Rename.extend
with variables occuring in similar places and (b) extract substitutions out of
those. Two matching primed vars would both be replaced by some fresh primed var.
However, equivalence classes of primed variables would *not* be replaced by
one fresh (primed) variable. Now, that should work.
Reviewed By: mbouaziz
Differential Revision: D14150192
fbshipit-source-id: 90ca9216c
Summary:
Some functions are translated directly to an expression, rather than a
function call. The llvm instructions for calling functions that may
and may not raise exceptions differ (Invoke and Call, resp.). This
diff adds support for the Invoke case.
Reviewed By: jvillard
Differential Revision: D14385596
fbshipit-source-id: 24458d960
Summary:
These tell llvm that contents of memory will not change during
execution of some code, which the analyzer does not need.
Reviewed By: jvillard
Differential Revision: D14385597
fbshipit-source-id: 0ef566a6f
Summary:
Make could get confused and use both the $(MODEL_DIR)/cxxabi.bc and
%.bc rules, leading to build failure.
Reviewed By: jvillard
Differential Revision: D14385600
fbshipit-source-id: 05f0ac6e1
Summary:
This will be used in the future to determine what to do with destructors
in pulse.
Reviewed By: mbouaziz
Differential Revision: D14324759
fbshipit-source-id: bc3c34471
Summary:
This seems generally useful. Force people to do it in the future even if
they want to avoid having to update the frontend tests.
Reviewed By: mbouaziz
Differential Revision: D14324758
fbshipit-source-id: cdef3f72a
Summary:
Before: the abstract state represents heap addresses as a single map
from addresses to edges + attributes.
After: the heap is made of 2 maps: one mapping addresses to edges, and
one mapping an address to its attributes.
It turns out that edges and attributes are often not updated at the same
time, so keeping them in the same map was causing pressure on the OCaml
gc.
Reviewed By: mbouaziz
Differential Revision: D14147991
fbshipit-source-id: 6713eeb3c
Summary:
This is basically unused except for debugging and is going to cause
issues later.
Reviewed By: mbouaziz
Differential Revision: D14258490
fbshipit-source-id: b2800990e
Summary:
This fixes (if in a hackish way) an inherently quadratic behaviour in
the disjunctive domain when analysing loops: If you start with some
disjuncts `D1 \/ ... \/ Dn` and go once around the loop, you will end up
with disjuncts `(D1 \/ ... \/ Dn) \/ (D1' \/ ... \/ Dn')` assuming that
for all `i`, `{ Di } body of loop { Di' }` (in practice there is the
added difficulty that the post of the body of the loop can be a
disjunction too instead of a single abstract state). Assuming this isn't
a fixpoint, we would then go around the loop again from `D1`, ..., `Dn`,
`D1'`, ..., `Dn'`. However we already know what the posts of `D1` to `Dn`
are!
This attempts to curb duplicate work by marking the disjuncts in `prev`
as "visited" and instructing symbolic execution to skip visited states.
Then, once convergence is detected (from within `widen` for now) we mark
again all states as unvisited so that whatever is after the loop gets
symbolically executed.
This is a hack because ideally the AI scheduler would know about
disjunctive domain and schedule individual disjuncts for analysis.
However that would be a much bigger change. Let's see if the hack is
enough for now.
Reviewed By: mbouaziz
Differential Revision: D14258491
fbshipit-source-id: 21454398c
Summary:
When joining two lists of disjuncts we try to ensure there isn't a state
that under-approximates another already in the list. This helps reduce
the number of disjuncts that are generated by conditionals and loops.
Before we would always just add more disjuncts unless they were
physically equal but now we do a subgraph computation to assess
under-approximation.
We only do this half-heartedly for now however, only taking into
consideration the "new" disjuncts vs the "old" ones. It probably makes
sense to do a full quadratic search to minimise the number of disjuncts
from time to time but this isn't done here.
Reviewed By: mbouaziz
Differential Revision: D14258482
fbshipit-source-id: c2dad4889
Summary:
This removes the "abstract addresses" that used to be stored in the `Closure` attribute of pulse abstract addresses. There used to be a list of values recorded for each closure, each one representing one captured value. Instead these values are now recorded as fake edges in the memory graph.
Having addresses appear in attributes causes issues when trying to establish graph isomorphism between two memory states. Avoid it by rewriting the closures mechanism to encode captured addresses as fake edges in memory. This way captured addresses are automatically treated right by the graph algorithms (in the next diffs).
Reviewed By: mbouaziz
Differential Revision: D14323044
fbshipit-source-id: 413b4d989
Summary:
The backend has an implicit assumption that the entry block's locals
are all the function locals, not just those that happen to appear in
the entry block.
An alternative would be to collect the inverse renaming substitutions
produced by symbolic execution of jumps, which rename variables to
avoid clashes with the destination block's locals. These inverse
substitutions could be applied upon function return, in inverse
order. The complication with this approach is that it would be
necessary to distinguish between a destination local that clashes
because it has the same name as a local of some function higher in the
call stack versus a clash due to being a previous incarnation of the
same local due to a control-flow cycle. Accumulating unrenamings for
the latter case is expensive and pointless, as well as incongrous with
true interprocedural extensions.
So it seems preferable to exploit the asymmetry between the entry
block and others a bit more.
Reviewed By: jvillard
Differential Revision: D14354530
fbshipit-source-id: 815cdb224
Summary:
Interpreted subexps were not handled correctly: they must not be in
the carrier, but their non-interpreted subexps should.
Reviewed By: jvillard
Differential Revision: D14344291
fbshipit-source-id: 995896640
Summary: Unknown locations in the alias domain resulted in unexpected unreachable code.
Reviewed By: mbouaziz
Differential Revision: D14339412
fbshipit-source-id: a5dca6489
Summary:
The disjunctive domain shouldn't really be a set in the first place as
comparing abstract states for equality is expensive to do naively
(walking the whole maps representing the abstract heap). Moreover in
practice these sets have a small max size (currently 50 for pulse, the
only client), so switching them to plain lists makes sense.
Reviewed By: mbouaziz
Differential Revision: D14258489
fbshipit-source-id: c512169eb
Summary:
It's useful to keep the size of states down, especially when humans are
trying to read it. It will also help keep the size of summaries down in
the inter-procedural pulse.
Reviewed By: mbouaziz
Differential Revision: D14258486
fbshipit-source-id: 45ebcac67
Summary:
You can only take the address of variables, field accesses, and array
accesses, the rest doesn't make sense.
Reviewed By: mbouaziz
Differential Revision: D14258484
fbshipit-source-id: 8ddcfe810