Summary:
Before: the abstract state represents heap addresses as a single map
from addresses to edges + attributes.
After: the heap is made of 2 maps: one mapping addresses to edges, and
one mapping an address to its attributes.
It turns out that edges and attributes are often not updated at the same
time, so keeping them in the same map was causing pressure on the OCaml
gc.
Reviewed By: mbouaziz
Differential Revision: D14147991
fbshipit-source-id: 6713eeb3c
Summary:
This is basically unused except for debugging and is going to cause
issues later.
Reviewed By: mbouaziz
Differential Revision: D14258490
fbshipit-source-id: b2800990e
Summary:
This fixes (if in a hackish way) an inherently quadratic behaviour in
the disjunctive domain when analysing loops: If you start with some
disjuncts `D1 \/ ... \/ Dn` and go once around the loop, you will end up
with disjuncts `(D1 \/ ... \/ Dn) \/ (D1' \/ ... \/ Dn')` assuming that
for all `i`, `{ Di } body of loop { Di' }` (in practice there is the
added difficulty that the post of the body of the loop can be a
disjunction too instead of a single abstract state). Assuming this isn't
a fixpoint, we would then go around the loop again from `D1`, ..., `Dn`,
`D1'`, ..., `Dn'`. However we already know what the posts of `D1` to `Dn`
are!
This attempts to curb duplicate work by marking the disjuncts in `prev`
as "visited" and instructing symbolic execution to skip visited states.
Then, once convergence is detected (from within `widen` for now) we mark
again all states as unvisited so that whatever is after the loop gets
symbolically executed.
This is a hack because ideally the AI scheduler would know about
disjunctive domain and schedule individual disjuncts for analysis.
However that would be a much bigger change. Let's see if the hack is
enough for now.
Reviewed By: mbouaziz
Differential Revision: D14258491
fbshipit-source-id: 21454398c
Summary:
When joining two lists of disjuncts we try to ensure there isn't a state
that under-approximates another already in the list. This helps reduce
the number of disjuncts that are generated by conditionals and loops.
Before we would always just add more disjuncts unless they were
physically equal but now we do a subgraph computation to assess
under-approximation.
We only do this half-heartedly for now however, only taking into
consideration the "new" disjuncts vs the "old" ones. It probably makes
sense to do a full quadratic search to minimise the number of disjuncts
from time to time but this isn't done here.
Reviewed By: mbouaziz
Differential Revision: D14258482
fbshipit-source-id: c2dad4889
Summary:
This removes the "abstract addresses" that used to be stored in the `Closure` attribute of pulse abstract addresses. There used to be a list of values recorded for each closure, each one representing one captured value. Instead these values are now recorded as fake edges in the memory graph.
Having addresses appear in attributes causes issues when trying to establish graph isomorphism between two memory states. Avoid it by rewriting the closures mechanism to encode captured addresses as fake edges in memory. This way captured addresses are automatically treated right by the graph algorithms (in the next diffs).
Reviewed By: mbouaziz
Differential Revision: D14323044
fbshipit-source-id: 413b4d989
Summary:
The backend has an implicit assumption that the entry block's locals
are all the function locals, not just those that happen to appear in
the entry block.
An alternative would be to collect the inverse renaming substitutions
produced by symbolic execution of jumps, which rename variables to
avoid clashes with the destination block's locals. These inverse
substitutions could be applied upon function return, in inverse
order. The complication with this approach is that it would be
necessary to distinguish between a destination local that clashes
because it has the same name as a local of some function higher in the
call stack versus a clash due to being a previous incarnation of the
same local due to a control-flow cycle. Accumulating unrenamings for
the latter case is expensive and pointless, as well as incongrous with
true interprocedural extensions.
So it seems preferable to exploit the asymmetry between the entry
block and others a bit more.
Reviewed By: jvillard
Differential Revision: D14354530
fbshipit-source-id: 815cdb224
Summary:
Interpreted subexps were not handled correctly: they must not be in
the carrier, but their non-interpreted subexps should.
Reviewed By: jvillard
Differential Revision: D14344291
fbshipit-source-id: 995896640
Summary: Unknown locations in the alias domain resulted in unexpected unreachable code.
Reviewed By: mbouaziz
Differential Revision: D14339412
fbshipit-source-id: a5dca6489
Summary:
The disjunctive domain shouldn't really be a set in the first place as
comparing abstract states for equality is expensive to do naively
(walking the whole maps representing the abstract heap). Moreover in
practice these sets have a small max size (currently 50 for pulse, the
only client), so switching them to plain lists makes sense.
Reviewed By: mbouaziz
Differential Revision: D14258489
fbshipit-source-id: c512169eb
Summary:
It's useful to keep the size of states down, especially when humans are
trying to read it. It will also help keep the size of summaries down in
the inter-procedural pulse.
Reviewed By: mbouaziz
Differential Revision: D14258486
fbshipit-source-id: 45ebcac67
Summary:
You can only take the address of variables, field accesses, and array
accesses, the rest doesn't make sense.
Reviewed By: mbouaziz
Differential Revision: D14258484
fbshipit-source-id: 8ddcfe810
Summary: Spent some time staring at empty HTML output instead of seeing `<Some ...>` because I'm dumb. Now it's dumb proof.
Reviewed By: mbouaziz
Differential Revision: D14258492
fbshipit-source-id: d1368d212
Summary:
In case the starting locations of two heap segments are
related (provably equal up to some offset), add equations between
their enclosing block to the goal. In these cases, the enclosing
blocks must be the same, so no completeness is lost. This has the
effect of instantiating existentials in the enclosing block prior to
others, which can avoid incomplete instantiation guesses.
Reviewed By: mbouaziz
Differential Revision: D14323550
fbshipit-source-id: 89a34a2c8
Summary: An initial set of basic sanity checks for frame inference.
Reviewed By: mbouaziz
Differential Revision: D14323549
fbshipit-source-id: d7cd4235f
Summary:
`('a, exn) result` is a more widely used type than with something
custom in place of `exn`, e.g. by the `Result` operations.
Reviewed By: mbouaziz
Differential Revision: D14322627
fbshipit-source-id: e2ed167ed
Summary:
- Change representation of Concat expressions from curried binary
operator to an nary one. This enables normalizing Concat expressions
with respect to associativity.
- Generalize Exp.solve to return a map rather than a pair of exps, to
enable expressing cases where solving an equation yields multiple
equations.
- Strengthen solver with implied equalities between sums of sizes of
concatenations of byte arrays.
Reviewed By: ngorogiannis
Differential Revision: D14297865
fbshipit-source-id: b40871559
Summary: After a redeclaration of a global constant, it is not parsed as ICE(integral constant expression), which results in FN.
Reviewed By: ezgicicek
Differential Revision: D14299288
fbshipit-source-id: 394afd595
Summary:
It assigns symbolic values for global variables in the load commands. However, it does not instantiate the symbols for the global variables yet, which will be addressed in another diff.
Depends on D14208643
Reviewed By: ezgicicek
Differential Revision: D14257619
fbshipit-source-id: f9113c8a3