Summary:
While adding a footprint frame during rearrangement, the footprint
variables should be fresh with respect to the current state too, not
only with respect to he footprint, because the frame is added to the
state.
Reviewed By: jberdine
Differential Revision: D14401026
fbshipit-source-id: 20ea4485a
Summary:
Context: "quandary" traces optimise for space by only storing a call site (plus analysis element) in a summary, as opposed to a list of call sites plus the element (i.e., a trace). When forming a report, the trace is expanded to a full one by reading the summary of the called function, and then matching up the current element with one from the summary, iterating until the trace cannot be expanded any more. In the best case, this can give a quadratic saving, as a real trace gets longer the higher one goes in the call stack, and therefore the total cost of saving that trace in each summary is quadratic in the length of the trace. Quandary traces give a linear cost.
HOWEVER, these have been a source of many subtle bugs.
1. The trace expansion strategy is very arbitrary and cannot distinguish between expanded traces that are invalid (i.e., end with a call and not an originating point, such as a field access in RacerD). Plus the strategy does not explore all expansions, just the left-most one, meaning the left most may be invalid in the above sense, but another (not left-most) isn't even though it's not discovered by the expansion. This is fixable with major surgery.
2. All real traces that lead to the same endpoint are conflated -- this is to save space because there may be exponentially many such traces. That's OK, but these traces may have different locking contexts -- one may take the lock along the way, and another may not. The expansion cannot make sure that if we are reporting a trace we have recorded as taking the lock, will actually do so. This has resulted in very confusing race reports that are superficially false positives (even though they point to the existence of a real race).
3. Expansion completely breaks down in the java/buck integration when the trace goes through f -> g -> h and f,g,h are all in distinct buck targets F,G,H and F does not depend directly on H. In that case, the summary of h is simply not available when reporting/expanding in f, so the expanded trace comes out as truncated and invalid. These are filtered out, but the filtering is buggy and kills real races too.
This diff completely replaces quandary traces in RacerD with plain explicit traces.
- This will incur the quadratic space/time cost previously saved. See test plan: there is indeed a 30% increase in summary size, but there is no slowdown. In fact, on openssl there is a 10-20% perf increase.
- For each endpoint, up to a single trace is used, as before, so no exponential explosion. However, because there is no such thing as expansion, we cannot get it wrong and change the locking context of a trace.
- This diff is emulating the previous reporting format as much as possible to allow good signal from the CI. Further diffs up this stack will remove quandary-trace specific things, and simplify further the code.
- 2 is not fully addressed -- it will require pushing the `AccessSnapshot` structure inside `TraceElem`. Further diffs.
Reviewed By: jberdine
Differential Revision: D14405600
fbshipit-source-id: d239117aa
Summary:
Also minor change to make strlen implementation syntactically closer
to spec.
Reviewed By: ngorogiannis
Differential Revision: D14403647
fbshipit-source-id: 48c771329
Summary:
This diff adds support in symbolic execution for calls to intrinsic
functions, to be used in lieu of adding a separate Llair instruction
for each intrinsic. This involves:
- adding skeleton support in Exec for symbolically execution an
intrinsic function call;
- exposing this in Domain;
- allowing symbolic execution of block terminators (e.g. function
call) to possibly fail; and
- generalizing Report for failing terminators.
Reviewed By: ngorogiannis
Differential Revision: D14403652
fbshipit-source-id: d86d9d1b8
Summary:
This diff changes a LatestPrune to use a return variable instead of another local variable, when the function returns a conditional value. This is a preparation to propagate LatestPrune inter-procedurally in the following diffs.
context: If a function returns a conditional value, e.g. `return x == y`, the LatestPrune value includes a temporary local variable introduced by the SIL translation. This diff is to avoid propagating the temporary local variables to its caller.
Reviewed By: mbouaziz
Differential Revision: D14321534
fbshipit-source-id: d157bfdd0
Summary:
To meet the pure parts of formulas, the process was to (a) call Rename.extend
with variables occuring in similar places and (b) extract substitutions out of
those. Two matching primed vars would both be replaced by some fresh primed var.
However, equivalence classes of primed variables would *not* be replaced by
one fresh (primed) variable. Now, that should work.
Reviewed By: mbouaziz
Differential Revision: D14150192
fbshipit-source-id: 90ca9216c
Summary:
Some functions are translated directly to an expression, rather than a
function call. The llvm instructions for calling functions that may
and may not raise exceptions differ (Invoke and Call, resp.). This
diff adds support for the Invoke case.
Reviewed By: jvillard
Differential Revision: D14385596
fbshipit-source-id: 24458d960
Summary:
These tell llvm that contents of memory will not change during
execution of some code, which the analyzer does not need.
Reviewed By: jvillard
Differential Revision: D14385597
fbshipit-source-id: 0ef566a6f
Summary:
Make could get confused and use both the $(MODEL_DIR)/cxxabi.bc and
%.bc rules, leading to build failure.
Reviewed By: jvillard
Differential Revision: D14385600
fbshipit-source-id: 05f0ac6e1
Summary:
This will be used in the future to determine what to do with destructors
in pulse.
Reviewed By: mbouaziz
Differential Revision: D14324759
fbshipit-source-id: bc3c34471
Summary:
This seems generally useful. Force people to do it in the future even if
they want to avoid having to update the frontend tests.
Reviewed By: mbouaziz
Differential Revision: D14324758
fbshipit-source-id: cdef3f72a
Summary:
Before: the abstract state represents heap addresses as a single map
from addresses to edges + attributes.
After: the heap is made of 2 maps: one mapping addresses to edges, and
one mapping an address to its attributes.
It turns out that edges and attributes are often not updated at the same
time, so keeping them in the same map was causing pressure on the OCaml
gc.
Reviewed By: mbouaziz
Differential Revision: D14147991
fbshipit-source-id: 6713eeb3c
Summary:
This is basically unused except for debugging and is going to cause
issues later.
Reviewed By: mbouaziz
Differential Revision: D14258490
fbshipit-source-id: b2800990e
Summary:
This fixes (if in a hackish way) an inherently quadratic behaviour in
the disjunctive domain when analysing loops: If you start with some
disjuncts `D1 \/ ... \/ Dn` and go once around the loop, you will end up
with disjuncts `(D1 \/ ... \/ Dn) \/ (D1' \/ ... \/ Dn')` assuming that
for all `i`, `{ Di } body of loop { Di' }` (in practice there is the
added difficulty that the post of the body of the loop can be a
disjunction too instead of a single abstract state). Assuming this isn't
a fixpoint, we would then go around the loop again from `D1`, ..., `Dn`,
`D1'`, ..., `Dn'`. However we already know what the posts of `D1` to `Dn`
are!
This attempts to curb duplicate work by marking the disjuncts in `prev`
as "visited" and instructing symbolic execution to skip visited states.
Then, once convergence is detected (from within `widen` for now) we mark
again all states as unvisited so that whatever is after the loop gets
symbolically executed.
This is a hack because ideally the AI scheduler would know about
disjunctive domain and schedule individual disjuncts for analysis.
However that would be a much bigger change. Let's see if the hack is
enough for now.
Reviewed By: mbouaziz
Differential Revision: D14258491
fbshipit-source-id: 21454398c
Summary:
When joining two lists of disjuncts we try to ensure there isn't a state
that under-approximates another already in the list. This helps reduce
the number of disjuncts that are generated by conditionals and loops.
Before we would always just add more disjuncts unless they were
physically equal but now we do a subgraph computation to assess
under-approximation.
We only do this half-heartedly for now however, only taking into
consideration the "new" disjuncts vs the "old" ones. It probably makes
sense to do a full quadratic search to minimise the number of disjuncts
from time to time but this isn't done here.
Reviewed By: mbouaziz
Differential Revision: D14258482
fbshipit-source-id: c2dad4889
Summary:
This removes the "abstract addresses" that used to be stored in the `Closure` attribute of pulse abstract addresses. There used to be a list of values recorded for each closure, each one representing one captured value. Instead these values are now recorded as fake edges in the memory graph.
Having addresses appear in attributes causes issues when trying to establish graph isomorphism between two memory states. Avoid it by rewriting the closures mechanism to encode captured addresses as fake edges in memory. This way captured addresses are automatically treated right by the graph algorithms (in the next diffs).
Reviewed By: mbouaziz
Differential Revision: D14323044
fbshipit-source-id: 413b4d989
Summary:
The backend has an implicit assumption that the entry block's locals
are all the function locals, not just those that happen to appear in
the entry block.
An alternative would be to collect the inverse renaming substitutions
produced by symbolic execution of jumps, which rename variables to
avoid clashes with the destination block's locals. These inverse
substitutions could be applied upon function return, in inverse
order. The complication with this approach is that it would be
necessary to distinguish between a destination local that clashes
because it has the same name as a local of some function higher in the
call stack versus a clash due to being a previous incarnation of the
same local due to a control-flow cycle. Accumulating unrenamings for
the latter case is expensive and pointless, as well as incongrous with
true interprocedural extensions.
So it seems preferable to exploit the asymmetry between the entry
block and others a bit more.
Reviewed By: jvillard
Differential Revision: D14354530
fbshipit-source-id: 815cdb224
Summary:
Interpreted subexps were not handled correctly: they must not be in
the carrier, but their non-interpreted subexps should.
Reviewed By: jvillard
Differential Revision: D14344291
fbshipit-source-id: 995896640