Summary:
This fixes a memory leak false positive. When collecting unreachable
values we should be careful to take the equality relation into account.
Equal values are normally canonicalised but only with respect to "known"
equalities. This makes sure variables that are live thanks to the
"pruned" equalities are not discarded from the state.
Reviewed By: skcho
Differential Revision: D28382642
fbshipit-source-id: 2b898d754
Summary: I should have listened to skcho on D28002725 (7207e05682). Needed for the next diff.
Reviewed By: skcho
Differential Revision: D28121203
fbshipit-source-id: 96c01b141
Summary:
See added test: pulse sometimes insisted that an issue was latent even
though the condition that made it latent could not be influenced (hence
could the issue could never become manifest) by callers because it was
unrelated to the pre, i.e. it came from a mutation inside the function.
In these cases, we want to report the issue straight away instead of
keeping it latent.
Reviewed By: skcho
Differential Revision: D28002725
fbshipit-source-id: ce9e6f190
Summary:
Whenever an equality "t = v" (t an arbitrary term, v a variable) is
added (or "v = t"), remember the "t -> v" mapping after canonicalising t
and v. Use this to detect when two variables are equal to the same term:
`t = v` and `t = v'` now yields `v = v'` to be added to the equality
relation of variables. This increases the precision of the arithmetic
engine.
Interestingly, the impact on most code I've tried is:
1. mostly same perfs as before, if a bit slower (could be within noise)
2. slightly more (latent) bugs reported in absolute numbers
I would have expected it to be more expensive and yield fewer bugs (as
fewer false positives), but there could be second-order effects at play
here where we get more coverage. We definitely get more latent issues
due to dereferencing pointers after testing nullness, as can be seen in
the unit tests as well, which may alone explain (2).
There's some complexity when adding term equalities where the term
is linear, as we also need to add it to `linear_eqs` but `term_eqs` and
`linear_eqs` are interested in slightly different normal forms.
Reviewed By: skcho
Differential Revision: D27331336
fbshipit-source-id: 7314e127a
Summary:
It's better (=possibly more efficient) to take the opportunity to
normalize linear terms when we can instead of possibly having to apply
the same normalization over and over on individual terms until the next
round of proper normalization.
Reviewed By: skcho
Differential Revision: D27464885
fbshipit-source-id: 0dc01a089
Summary:
When we don't know the value being shifted it may help to translate
bit-shifting into multiplication by a constant as it might surface
linear terms, eg `x<<1` is `2*x`.
Reviewed By: skcho
Differential Revision: D27464847
fbshipit-source-id: 9b3b5f0d0
Summary:
The simplifications done by `simplify_shallow` are all taken care of by
`eval_const_shallow` as well, they just also happen to help when not
*all* of the term is a constant. However, they might be less
precise/efficient than in the constant case, in particular in the next
diff that translates `x << c` into `x * 2^c` when `c` is constant.
Reviewed By: skcho
Differential Revision: D27464805
fbshipit-source-id: 452bc6ab1
Summary:
On some pathological examples of crypto primitives like libsodium, later
diffs make pulse grind to a halt due to an explosion in the size of
literals. This is at least partly due to the fact the arithmetic doesn't
operate modulo 2^64.
Due to the fact the arithmetic is confused in any case when we reach
such large numbers, cap them, currently at 2^128. This removes pathological
cases for now, even now on libsodium Pulse is ~5 times faster than before!
Take this opportunity to put the modified Q/Z modules in the own files.
Reviewed By: jberdine
Differential Revision: D27463933
fbshipit-source-id: 342d941e2
Summary: Just some scaffolding to save a bit of churn from the next diff.
Reviewed By: skcho
Differential Revision: D27328348
fbshipit-source-id: 4f5bfcc65
Summary:
Fixing `IsInstanceOf` term simplification for null case. Before, this
was only being done if value was known to be null at the moment of the
call to `instanceof`. Otherwise, the `IsInstanceOf` term would remain in
the formula unnecessarily.
Reviewed By: jvillard
Differential Revision: D27361025
fbshipit-source-id: 2d958a757
Summary:
Before this diff, TOPL had 3 implementations:
1. a post-processing of biabduction summaries
2. a post-processing of pulse summaries
3. a deep embedding in pulse
1 and 2 additionally require instrumenting SIL to generate monitors for
the TOPL properties. 3 is faster than both 1 and 2, by a good lot, and
doesn't require instrumenting the SIL code. Thus, delete 1 and 2!
Also harmonise the CLI so that TOPL is activated by --topl, which
actives it as a checker, like other analyses.
Reviewed By: rgrig
Differential Revision: D27270178
fbshipit-source-id: e86cf972b
Summary:
There could still be divisions by zero, eg in the "mod" case: consider
"x mod (1/2)" (doesn't matter what x is). Then we'd check "1/2 =? 0" and
since it's false conclude that it's safe to take the modulo... oops!
To make things safer, harden `Z` to not throw anymore.
Also add a layer of defense in depth by wrapping the functions that do
Z/Q operations in another layer of exception catching because we really
don't want to crash the entire analysis due to that.
Reviewed By: martintrojer
Differential Revision: D27262569
fbshipit-source-id: e22187ca0
Summary:
Previously we would only simplify when the term is exactly IsInstanceOf,
and skip sub-terms. Most of the time this is the case but in the future
this could change.
Reviewed By: skcho
Differential Revision: D27156519
fbshipit-source-id: bd10574e0
Summary:
10 seems better at no visible CPU cost. Not very scientific as this is
only one data point, but neither was choosing 5 in the first place.
Measurements on OpenSSL using Pulse.ISL:
```
$ time infer --pulse-only --scheduler callgraph -j 2 --pulse-report-latent-issues --pulse-isl
| fuel | user time (s) | under-normalisation | latent issues |
|------+---------------+---------------------+---------------|
| 5 | 163 | 3074 | 160 |
| 10 | 158 | 85 | 160 |
| 15 | 174 | 32 | 160 |
| 20 | 186 | 20 | 160 |
```
Reviewed By: skcho
Differential Revision: D27156497
fbshipit-source-id: 1114b8677
Summary:
This is a refactoring for a later change. This change alters behaviour
slightly to make it less chaotic: instead of normalization doing:
"""
do normalize(phi) until phi doesn't change anymore
normalize(phi):
do normalize_linear_part(phi) until this doesn't change phi anymore
do other normalizations
"""
we now do
"""
do normalize(phi) until phi doesn't change anymore
normalize(phi):
normalize_linear_part(phi)
do other normalizations if linear didn't change
"""
In particular we no longer spend potentially-quadratic amouns of fuel
during normalization.
Reviewed By: skcho
Differential Revision: D26450391
fbshipit-source-id: 9f63e1a04
Summary:
- add a pp_new_eq function to help people who want to printf-debug stuff
- fix one case where new_eqs were reset to `[]` instead of propagated
- do not add to `new_eqs` when nothing changes during normalisation.
This avoids duplicated new_eqs that arise from regenerating the linear
equality relation multiple times during normalisation.
Reviewed By: da319
Differential Revision: D27156042
fbshipit-source-id: 59b093ec8
Summary: Adding support for the Java instanceof operator in Pulse.
Reviewed By: jvillard
Differential Revision: D26275046
fbshipit-source-id: 8ba608cca
Summary:
Difficult to repro as most of the time other simplifactions catch this
before we actually get to dividing by zero. Nonetheless...
shamecube
Reviewed By: da319
Differential Revision: D26758187
fbshipit-source-id: b8718c515
Summary:
As a first step to support the Java `instanceof`
operator, this change allows the path condition to be appended with
`IsInstanceOf(var, typ)`.
Reviewed By: jvillard
Differential Revision: D26664009
fbshipit-source-id: cd19dce83
Summary:
Using more than the "known" part of the arithmetic could accidentally
leak "pruned" information into certain facts.
I noticed this when adding more term equality reasoning to pulse in
another diff. At the moment this has little effect but is still more
correct conceptually.
Reviewed By: ezgicicek
Differential Revision: D26450333
fbshipit-source-id: eb31da344
Summary:
This is all dead code but I had to do this to try something else and I
don't want to have to do that again :)
Reviewed By: skcho
Differential Revision: D26022111
fbshipit-source-id: 622ca10b9
Summary:
It is better for the derived comparison functions to start by comparing
the single offset `Q.t` instead of the map. The order of the pair
doesn't matter so the easiest way to achieve that is by putting the
offset first.
Reviewed By: skcho
Differential Revision: D26022080
fbshipit-source-id: 874ea5c66
Summary:
First stab at quantifier elimination done poorly but fast :)
Easy one: when we know "x = y", and we want to keep x but not y, then
replace y by x everywhere.
Reviewed By: skcho
Differential Revision: D25432207
fbshipit-source-id: 81b142b96
Summary:
When we know `v1 = v2`, canonicalize `v2 -> v1 * v3 -> v2` to
`v1 -> v1 * v3 -> v1`. Only do this when creating summaries
(and so also when reporting errors) for now.
This only takes into account the equality relation between variables
for now. It needs to be extended to take into account other ways
variables can be equal, eg when two variables are equal to the same
constant or the same term.
Reviewed By: skcho
Differential Revision: D25092158
fbshipit-source-id: 9e589b631
Summary:
Use the new module to represent both Sat/Unsat from Pulse formulas, and
FeasiblePath/InfeasiblePath from PulseReport.
Reviewed By: jberdine
Differential Revision: D25277566
fbshipit-source-id: 9f8412ca9
Summary:
A Topl "small step" is a call to a method that is of interest to the
automaton. When such a call of interest is made, the topl component of
PulseAbductiveDomain.t is updated. This means that intra-procedural
Topl should now work entirely inside Pulse, without instrumenting Sil.
Main TODOs:
- add error extraction
- implement inter-procedural (PulseTopl.large_step)
Reviewed By: jvillard
Differential Revision: D25028286
fbshipit-source-id: e31a96d13
Summary:
Communicate new facts from the arithmetic domain to the memory domain to
detect contradictions between the two.
Reviewed By: jberdine
Differential Revision: D24832079
fbshipit-source-id: 2caf8e9af
Summary:
Take another page from the Incorrectness Logic book and refrain from reporting issues on paths unless we know for sure that this path will be taken.
Previously, we would report on paths that are merely *not impossible*. This goes very far in the other direction, so it's possible we'll want to go back to some sort of middle ground. Or maybe not. See the changes in the tests to get a sense of what we're missing.
Reviewed By: ezgicicek
Differential Revision: D24014719
fbshipit-source-id: d451faf02
Summary:
This would previously print that we ran out of fuel even if we didn't
and we simply reached a normal form.
Reviewed By: ezgicicek
Differential Revision: D23575571
fbshipit-source-id: 37d02ca8d
Summary:
Report errors found by running Topl on top of Pulse, when using
--topl-pulse. Topl tests now run on top of Pulse.
Reviewed By: jvillard
Differential Revision: D23030771
fbshipit-source-id: 8770c2902
Summary:
When normalizing discovers new linear arithmetic facts in
`normalize_linear_eqs` we go around once more. Do the same when atoms
become linear equalities.
Reviewed By: skcho
Differential Revision: D23264425
fbshipit-source-id: b355875f3
Summary:
Mostly cosmetic except for a change in [solve_eq] to try harder at
normalization (improves unit tests!). Add more comments and do minor
renamings.
Reviewed By: skcho
Differential Revision: D23243629
fbshipit-source-id: 55bdaf8a8
Summary:
This function had become a bit hard to read and the part about embedded
atoms was not very clear and also a bit incomplete (need to handle "= 1"
and "≠ 1" too).
Reviewed By: skcho
Differential Revision: D23242216
fbshipit-source-id: 239fade97
Summary:
This does a bunch of things at once (sorry):
- Refactor atom/term normalisation so that terms that are really just
atoms become atoms.
- Use this to not bother adding special cases in the functions exported
in the .mli: `and_less_than`, `and_equal_binop`, `prune_binop`, etc.
all had special cases to avoid introducing terms that could be atoms.
That's not great because the same smarts wasn't applied to terms that
would only become atom-like after some normalisation, and led to weird
and duplicated code. Now it's much cleaner: just add the most
straighforward fact and normalise!
- Fix a bug: adding a new equality `x = linear` should *not* be done
using `Normalizer.merge_var_linarith` as this is an internal function
that assumes that `x` is the right representative in `x - linear`.
Instead, for abitrary equations of that form, `solve_eq` should be used.
- When `normalize_linear_eqs` discovers new linear equalities, normalize
again. Add fuel there too to avoid spending too much time doing that.
It could be that we don't need/want fuel there but then we'd need to
think very hard about why there's no infinite recursion possible and
that seems harder.
Reviewed By: skcho
Differential Revision: D23241282
fbshipit-source-id: e5b8c4759
Summary:
This is used for variable substitution and will often be a no-op when
normalising terms over and over again (after the first normalisation,
the expression should stay the same). The equivalent function for terms
was already being careful about not re-allocating identical terms so
extend that care to linear expression.
Reviewed By: skcho
Differential Revision: D23241601
fbshipit-source-id: b365eb87a
Summary:
This allows further normalisation now that terms contain linear
expressions in normal form.
Reviewed By: skcho
Differential Revision: D23241499
fbshipit-source-id: f8e4e759c
Summary:
Linear arithmetic is able to simplify more atoms, eg `x+y <= x+y`
becomes `True` by normalising to "lhs - rhs <= 0". This does the first
step of normalisation, but to get True in this example we also need to
substitute inside atoms according to the linear equalities, which is the
next diff (for now we only substitute variables inside atoms for other
variables or for constants).
Reviewed By: skcho
Differential Revision: D23241457
fbshipit-source-id: 0da0b545c
Summary:
More scaffolding, nothing creates `Linear _` terms yet. Some changes to
variables substitution to allow substituting variables for linear terms
(as well as constants and other variables).
Reviewed By: skcho
Differential Revision: D23241461
fbshipit-source-id: fc870255e
Summary:
This is needed for the rest of the stack that introduces a `Linear of
LinArith.t` variant in `Term.t` to enable more normalisation inside of
terms.
Reviewed By: skcho
Differential Revision: D23241353
fbshipit-source-id: ad765cd13
Summary:
Make term simplification a bit more structured and separate the
"simplification" phase from the "evaluating constant expressions" phase.
Also implement the latter for all possible terms.
Reviewed By: skcho
Differential Revision: D23241334
fbshipit-source-id: 2964aa477
Summary: Not much to see here, extracted to make further changes more readable.
Reviewed By: da319
Differential Revision: D23241335
fbshipit-source-id: 81181f23a
Summary:
Since this is where almost all of the reasoning is concentrated, let's
make sure we use it at every opportunity!
Reviewed By: skcho
Differential Revision: D23194224
fbshipit-source-id: fedb2811e
Summary:
Reset the state before each test so that adding tests doesn't affect
other tests by shifting the ids of their anonymous variables.
Reviewed By: skcho
Differential Revision: D23194171
fbshipit-source-id: 7b717f160
Summary:
These are the only ones we need, it turns out the other types (string,
proc names, ...) were dead code. The changes the integer constants to
rational constants, to match the domain of the linear arithmetic engine.
Reviewed By: skcho
Differential Revision: D23164136
fbshipit-source-id: 755c3f526
Summary:
Instead of alternating between a normal form and a tree structure,
always keep a normal form. Except the normal form is not always fully
normalized. Overall, it's a bit faster than the previous iteration,
while being more precise! In particular, linear arithmetic aims at being
much more complete.
Reviewed By: skcho
Differential Revision: D23134209
fbshipit-source-id: 5f9ec6ece
Summary:
At the end of analysing a procedure we call `simplify
~keep:vars_live_in_pre_post`. Any variable not in
`vars_live_in_pre_post` is not mentioned anywhere else in the state and
therefore is not going to contribute constraints in callers of the
procedure (in other words: they're dead). We want to also forget
arithmetic facts about these variables as this is a good opportunity to
make the path condition smaller, sometimes by a lot!
The main issue is that dead variables may be useful intermediate terms
in the formula, eg trying to keep only facts about `x` in `y = x + 1 &&
y = 0` is going to lose a lot of precision. But, if a variable not in
`keep` is only mentioned in a simple atom `z = 42` atom, for example,
it's safe to forget about it, eg it's safe to remember only `x=0` in
`x=0 && z=42` (if only `x` is live).
In other words, we can get rid of all atoms containing variables not
transitively involved in other atoms that eventually involve live
variables. A graph problem! This is guaranteed not to forget anything
important and can still trim a lot of atoms in certain situations.
Reviewed By: skcho
Differential Revision: D22921313
fbshipit-source-id: 6d5db7cbe