Summary: We recently introduced a more precise model for constructing an optional from a value by making a shallow copy. However, this introduced Use After Delete false positives. For now, we go back to a less precise model by creating a fresh value. A proper model would be to either make a deep copy or call the copy constructor for a value. We will address this in the following diff.
Reviewed By: jvillard
Differential Revision: D24826749
fbshipit-source-id: 3e5e4edeb
Summary: Refactor `folly::Optional` models to make them easier to reuse for `std::optional`
Reviewed By: jvillard
Differential Revision: D24760053
fbshipit-source-id: f665e84c8
Summary: `folly::Optional::value()` returns a reference, hence an error was shown when the actual value was being accessed. Since `value()` throws an exception in case of `folly::none`, we want to show the error message at the call site of `value()`. We do this by dereferencing the result of `value()` in the model.
Reviewed By: jvillard
Differential Revision: D24702875
fbshipit-source-id: ca9f30349
Summary: Cleanup `Typ` by moving all constant types to `StdTyp`. Also remove `Typ.typ` as it's just `Typ.t` now.
Reviewed By: jberdine
Differential Revision: D24620397
fbshipit-source-id: 4764f87ef
Summary:
Before we were creating a fresh internal value when we were constructing `folly::Optional`. This diff models `folly::Optional` constructor more precisely by copying the given value.
There was also a missing dereference in the model of `value_or`
Reviewed By: jvillard
Differential Revision: D24621016
fbshipit-source-id: c86d3c157
Summary: Model `folly::Optional::value_or(default)` to return value if not-empty and `default` if empty.
Reviewed By: jvillard
Differential Revision: D24539456
fbshipit-source-id: cc9e176cc
Summary:
Emit the crucial parts of Pulse summaries as json to enable
post-processing by external tools. Stop somewhat arbitrarily at some
datatypes that are just emitted as "opaque" values.
For example:
```
$ infer debug --procedures --procedures-summary-json --select 0
[[["pulse",[["ContinueProgram",{"post":{"heap":[["v3",[[["Dereference"],["v4","_"]]]],["v7",[[["Dereference"],["v3","_"]]]]],"stack":[[["ProgramVar",{"plain":"return","mangled":null}],["v7","_"]]],"attrs":"_"},"pre":{"heap":[],"stack":[],"attrs":"_"},"skipped_calls":"_","path_condition":"_"}],["ContinueProgram",{"post":{"heap":[["v3",[[["Dereference"],["v4","_"]]]],["v8",[[["Dereference"],["v3","_"]]]]],"stack":[[["ProgramVar",{"plain":"return","mangled":null}],["v8","_"]]],"attrs":"_"},"pre":{"heap":[],"stack":[],"attrs":"_"},"skipped_calls":"_","path_condition":"_"}]]]]]
```
Reviewed By: ezgicicek
Differential Revision: D24503387
fbshipit-source-id: 9bd08e93b
Summary:
Before this diff we would just propagate the callee abstract state,
which doesn't make sense in the caller. We could just remove the state
from AbortProgram altogether as Pulse itself doesn't use it, but for now
let's at least make sure it's accurate.
Also needed for upcoming hackathon that will start from Pulse error
specs to try to produce tests :)
Reviewed By: ezgicicek
Differential Revision: D24448073
fbshipit-source-id: 9100b3f79
Summary:
This makes sure we call `AbductiveDomain.summary_of_post` exactly once
per post-condition. Notice in particular in the diff:
- in Pulse.ml we remove a now-certified-useless "is_unsat_expensive"
call
- in PulseOperations.ml we add a previously-missing call to
`summary_of_post` (it's needed to remove local variables from the
symbolic state + normalize)
The price to pay is ugly type annotations and down-casting peppered in a
few places, in reasonable number.
Reviewed By: da319
Differential Revision: D24078564
fbshipit-source-id: 3102cacf0
Summary:
Take another page from the Incorrectness Logic book and refrain from reporting issues on paths unless we know for sure that this path will be taken.
Previously, we would report on paths that are merely *not impossible*. This goes very far in the other direction, so it's possible we'll want to go back to some sort of middle ground. Or maybe not. See the changes in the tests to get a sense of what we're missing.
Reviewed By: ezgicicek
Differential Revision: D24014719
fbshipit-source-id: d451faf02
Summary:
We were missing assignment to captured variables with initializers.
Consider the following example:
```
S* update_inside_lambda_capture_and_init(S* s) {
S* object = nullptr;
auto f = [& o = object](S* s) { o = s; };
f(s);
return object;
}
```
which was translated to
```
VARIABLE_DECLARED(o:S*&);
*&o:S*&=&object
*&f =(_fun...lambda..._operator(),([by ref]&o &o:S*&))
```
However, we want to capture `o` (which is an address of `object`), rather `&o` in closure.
After the diff
```
VARIABLE_DECLARED(o:S*&);
*&o:S*&=&object
n$7=*&o:S*&
*&f =(_fun...lambda..._operator(),([by ref]n$7 &o:S*&))
```
Reviewed By: jvillard
Differential Revision: D23567346
fbshipit-source-id: 20f77acc2
Summary:
Even though it's unused in the implementation this argument is supposed
to be `Typ.t` so spell it out.
Reviewed By: skcho
Differential Revision: D23729433
fbshipit-source-id: d05548f42
Summary:
This can be useful to make pulse forget about tricky parts of the code.
Treat "skipped" procedures as unknown so heuristics for mutating the
return value and parameters passed by reference are applied.
Reviewed By: ezgicicek
Differential Revision: D23729410
fbshipit-source-id: d7a4924a8
Summary:
This would previously print that we ran out of fuel even if we didn't
and we simply reached a normal form.
Reviewed By: ezgicicek
Differential Revision: D23575571
fbshipit-source-id: 37d02ca8d
Summary: Most of the time, when the procdesc of a callee is requested, all that is really required is the procedure attributes. However, requesting the procdesc may return `None` when the procedure is undefined (in Java, and soon for Clang too). So, change all callsites to using attributes instead, where possible.
Reviewed By: jvillard
Differential Revision: D23539422
fbshipit-source-id: 3b1a52d48
Summary: Added a model for copy constructor for `std::function`. In most cases, the SIL instruction `std::function::function(&dest, &src)` gives us pointers to `dest` and `src`, hence, we model the copy constructor as a shallow copy. However, in some cases, e.g. `std::function f = lambda_literal`, SIL instruction contains the closure itself `std::function::function(&dest, (operator(), captured_vars)`, hence, we need to make sure we copy the right value.
Reviewed By: ezgicicek
Differential Revision: D23396568
fbshipit-source-id: 0acb8f6bc
Summary: There was a mismatch between formals and actuals in `std::function::operator()` because we were not passing the first argument corresponding to the closure.
Reviewed By: ezgicicek
Differential Revision: D23372104
fbshipit-source-id: d0f9b27d6
Summary: When we evaluate lambdas in pulse, we create a closure object with `fake` fields to store captured variables. However, during the function call we were not linking the captured values from the closure object. We address this missing part here.
Reviewed By: jvillard
Differential Revision: D23316750
fbshipit-source-id: 14751aa58
Summary:
Report errors found by running Topl on top of Pulse, when using
--topl-pulse. Topl tests now run on top of Pulse.
Reviewed By: jvillard
Differential Revision: D23030771
fbshipit-source-id: 8770c2902
Summary:
When normalizing discovers new linear arithmetic facts in
`normalize_linear_eqs` we go around once more. Do the same when atoms
become linear equalities.
Reviewed By: skcho
Differential Revision: D23264425
fbshipit-source-id: b355875f3
Summary:
Mostly cosmetic except for a change in [solve_eq] to try harder at
normalization (improves unit tests!). Add more comments and do minor
renamings.
Reviewed By: skcho
Differential Revision: D23243629
fbshipit-source-id: 55bdaf8a8
Summary:
This function had become a bit hard to read and the part about embedded
atoms was not very clear and also a bit incomplete (need to handle "= 1"
and "≠ 1" too).
Reviewed By: skcho
Differential Revision: D23242216
fbshipit-source-id: 239fade97
Summary:
This does a bunch of things at once (sorry):
- Refactor atom/term normalisation so that terms that are really just
atoms become atoms.
- Use this to not bother adding special cases in the functions exported
in the .mli: `and_less_than`, `and_equal_binop`, `prune_binop`, etc.
all had special cases to avoid introducing terms that could be atoms.
That's not great because the same smarts wasn't applied to terms that
would only become atom-like after some normalisation, and led to weird
and duplicated code. Now it's much cleaner: just add the most
straighforward fact and normalise!
- Fix a bug: adding a new equality `x = linear` should *not* be done
using `Normalizer.merge_var_linarith` as this is an internal function
that assumes that `x` is the right representative in `x - linear`.
Instead, for abitrary equations of that form, `solve_eq` should be used.
- When `normalize_linear_eqs` discovers new linear equalities, normalize
again. Add fuel there too to avoid spending too much time doing that.
It could be that we don't need/want fuel there but then we'd need to
think very hard about why there's no infinite recursion possible and
that seems harder.
Reviewed By: skcho
Differential Revision: D23241282
fbshipit-source-id: e5b8c4759
Summary:
This is used for variable substitution and will often be a no-op when
normalising terms over and over again (after the first normalisation,
the expression should stay the same). The equivalent function for terms
was already being careful about not re-allocating identical terms so
extend that care to linear expression.
Reviewed By: skcho
Differential Revision: D23241601
fbshipit-source-id: b365eb87a
Summary:
This allows further normalisation now that terms contain linear
expressions in normal form.
Reviewed By: skcho
Differential Revision: D23241499
fbshipit-source-id: f8e4e759c
Summary:
Linear arithmetic is able to simplify more atoms, eg `x+y <= x+y`
becomes `True` by normalising to "lhs - rhs <= 0". This does the first
step of normalisation, but to get True in this example we also need to
substitute inside atoms according to the linear equalities, which is the
next diff (for now we only substitute variables inside atoms for other
variables or for constants).
Reviewed By: skcho
Differential Revision: D23241457
fbshipit-source-id: 0da0b545c
Summary:
More scaffolding, nothing creates `Linear _` terms yet. Some changes to
variables substitution to allow substituting variables for linear terms
(as well as constants and other variables).
Reviewed By: skcho
Differential Revision: D23241461
fbshipit-source-id: fc870255e
Summary:
This is needed for the rest of the stack that introduces a `Linear of
LinArith.t` variant in `Term.t` to enable more normalisation inside of
terms.
Reviewed By: skcho
Differential Revision: D23241353
fbshipit-source-id: ad765cd13
Summary:
Make term simplification a bit more structured and separate the
"simplification" phase from the "evaluating constant expressions" phase.
Also implement the latter for all possible terms.
Reviewed By: skcho
Differential Revision: D23241334
fbshipit-source-id: 2964aa477
Summary: Not much to see here, extracted to make further changes more readable.
Reviewed By: da319
Differential Revision: D23241335
fbshipit-source-id: 81181f23a
Summary:
`delete` works exactly like `free` so merge both models together. Also
move the `free(0)` test to nullptr.cpp as it seems more appropriate.
Reviewed By: da319
Differential Revision: D23241297
fbshipit-source-id: 20a32ac54
Summary:
Since this is where almost all of the reasoning is concentrated, let's
make sure we use it at every opportunity!
Reviewed By: skcho
Differential Revision: D23194224
fbshipit-source-id: fedb2811e
Summary:
Reset the state before each test so that adding tests doesn't affect
other tests by shifting the ids of their anonymous variables.
Reviewed By: skcho
Differential Revision: D23194171
fbshipit-source-id: 7b717f160
Summary:
These are the only ones we need, it turns out the other types (string,
proc names, ...) were dead code. The changes the integer constants to
rational constants, to match the domain of the linear arithmetic engine.
Reviewed By: skcho
Differential Revision: D23164136
fbshipit-source-id: 755c3f526
Summary:
Instead of alternating between a normal form and a tree structure,
always keep a normal form. Except the normal form is not always fully
normalized. Overall, it's a bit faster than the previous iteration,
while being more precise! In particular, linear arithmetic aims at being
much more complete.
Reviewed By: skcho
Differential Revision: D23134209
fbshipit-source-id: 5f9ec6ece
Summary: Before we were modelling `vector.end()` as returning a fresh pointer every time is was called. It is common to check if an iterator is not the `end()` iterator and proceed to dereference the iterator in that case. In such code pattern `vector.end()` is called twice and returns different fresh values which causes false positives. To fix this, we add a special internal field `__infer_model_backing_array_pointer_to_last_element` to a vector to denote its end. Now, every time we call `vector.end()` we return the value of this field. We introduce a new attribute `EndOfCollection` to mark `end` iterator as the existing `EndIterator` invalidation is not suitable when we need to read the same value multiple times.
Reviewed By: jvillard
Differential Revision: D23101185
fbshipit-source-id: fa8a33b58
Summary:
At the end of analysing a procedure we call `simplify
~keep:vars_live_in_pre_post`. Any variable not in
`vars_live_in_pre_post` is not mentioned anywhere else in the state and
therefore is not going to contribute constraints in callers of the
procedure (in other words: they're dead). We want to also forget
arithmetic facts about these variables as this is a good opportunity to
make the path condition smaller, sometimes by a lot!
The main issue is that dead variables may be useful intermediate terms
in the formula, eg trying to keep only facts about `x` in `y = x + 1 &&
y = 0` is going to lose a lot of precision. But, if a variable not in
`keep` is only mentioned in a simple atom `z = 42` atom, for example,
it's safe to forget about it, eg it's safe to remember only `x=0` in
`x=0 && z=42` (if only `x` is live).
In other words, we can get rid of all atoms containing variables not
transitively involved in other atoms that eventually involve live
variables. A graph problem! This is guaranteed not to forget anything
important and can still trim a lot of atoms in certain situations.
Reviewed By: skcho
Differential Revision: D22921313
fbshipit-source-id: 6d5db7cbe
Summary:
Perhaps a bit overkill to introduce all this extra complexity but it
makes the unit tests much more readable. In fact, this uncovered a bug
in the dead variable elimination!
Reviewed By: dulmarod
Differential Revision: D22925548
fbshipit-source-id: d1f411683
Summary:
Do not always add parens around sub-terms, and add more parens around
terms in atoms and normal forms when they can be confused with the atom
or normal form structure.
Reviewed By: skcho
Differential Revision: D22925549
fbshipit-source-id: 8646e96a5
Summary: These will change to more interesting outputs in the next diff.
Reviewed By: dulmarod
Differential Revision: D22921349
fbshipit-source-id: c58c6240a
Summary:
Add unit tests to pulse in order to write tests for the arithmetic
solver, because it is a pain to write programs to do that end to end.
Reviewed By: ezgicicek
Differential Revision: D22864607
fbshipit-source-id: 0a20a3593
Summary:
This is needed to make dune auto-updating of unit tests introduced in
the next diff cohabit peacefully with our tests to make sure code stays
correctly formatted wrt ocamlformat.
Also, more auto-formatting = better.
Reviewed By: da319
Differential Revision: D22865004
fbshipit-source-id: 91c47ab08
Summary:
Normalization is potentially expensive and its result should be
remembered if the formula keeps being used. In the future we might use
this to make normalization more incremental.
Also rename PathCondition.satisfiable -> is_unsat to match
PulseFormula.is_unsat.
Reviewed By: skcho
Differential Revision: D22728264
fbshipit-source-id: 7759b33ac
Summary:
Now that this is a cheap operation, use it whenever we are checking the
satisfiability of the path condition.
Reviewed By: skcho
Differential Revision: D22724373
fbshipit-source-id: df31c6010
Summary:
Pausing the experiment in favour of new PulseFormula. Can be resurrected
later.
Reviewed By: skcho
Differential Revision: D22576274
fbshipit-source-id: 76529d767
Summary:
This time it's personal.
Roll out pulse's own arithmetic domain to be fast and be able to add
precision as needed. Formulas are precise representations of the path
condition to allow for good inter-procedural precision. Reasoning on
these is somewhat ad-hoc (except for equalities, but even these aren't
quite properly saturated in general), so expect lots of holes.
Skipping dead code in the interest of readability as this (at least
temporarily) doesn't use pudge anymore. This may make a come-back as
pudge has/will have better precision: the proposed implementation of
`PulseFormula` is very cheap so can be used any time we could want to
prune paths (see following commits), but this comes at the price of some
precision. Calling into pudge at reporting time still sounds like a good
idea to reduce false positives due to infeasible paths.
#skipdeadcode
Reviewed By: skcho
Differential Revision: D22576004
fbshipit-source-id: c91793256
Summary:
It's typically used inside another ~fold argument and it gets too
verbose.
Reviewed By: da319
Differential Revision: D22846501
fbshipit-source-id: 2fdd4271f
Summary: There used to be `JoinAfter n` mode where we would try to join `n` states instead of always making disjunctions. It got deleted in D14258485 and Pulse's underlying (pre-disjuncts) domain doesn't even have a join operation. `NeverJoin` mode is not useful in Pulse anymore: pulse will diverge or OOM if we don't limit the number of disjuncts. It is also not used by any other analyzer. Let's remove it.
Reviewed By: jvillard
Differential Revision: D22817425
fbshipit-source-id: 1e658f11d
Summary: This diff refactors Java specific `PatternMatch` functions into its own module. When `PatternMatch.ml` was originally created, it was mainly for Java but now it also supports ObjC. Let's refactor it to reflect the Java/ObjC separation: move all functions that operate on Java procnames into Java submodule.
Reviewed By: jvillard
Differential Revision: D22816504
fbshipit-source-id: ff6b64b29
Summary: We model internal builtin `__new` function to return a non-null value. This fixes nullptr_dereference false positives where we explicitly check the result of a function call for nullptr when the function returns a newly created object.
Reviewed By: jvillard
Differential Revision: D22772217
fbshipit-source-id: 37d209697
Summary:
This step does extra normalization so it's useful to see what's going on
when debugging. Log stuff in the html debug of the exit node.
Reviewed By: da319
Differential Revision: D22596248
fbshipit-source-id: cde3bbb6c
Summary:
Pulse has models for iterators that make them use a fake field to
remember the element of the collection they point to. But, not all
methods are modelled, and some of them look at the real field, eg
`operator==`. Since we don't update the real field in the model, this
causes imprecision.
The imprecision was visible in pudge.
Reviewed By: skcho
Differential Revision: D22576003
fbshipit-source-id: 2af6be646
Summary:
When applying function summaries, we are careful not to violate the
summary's assumptions about non-aliasing. For example, the summary we
generate for `foo(x,y) { *x = *y; }` will have `x` and `y` be allocated
to two different `AbstractValue.t` in the heap, representing
disjointness.
However, the current logic is too coarse and also rejects passing the
same pure value to functions that made no assumption about them being
equal or different, eg `goo(int x,int y) { int z = x + y; }`. This is
because the corresponding `AbstractValue.t` are different in the
callee's summary, but are represented by only one same value in callers
such as `goo(i,i)`.
This diff restricts the "don't violate aliasing" condition to only
consider heap-allocated values. This is consistent with separation logic
by the way: we use the implication `x|->- * y|->- |- x≠y`, which is
valid only when both `x` and `y` are both allocated in the heap as in
the left-hand-side of `|-`.
Reviewed By: skcho
Differential Revision: D22574297
fbshipit-source-id: 206a18499
Summary:
We update the type of captured variables to include information about capture mode (`ByReference` or `ByValue`) both for procdesc attributes and the closure expression.
For lambda: closure expression now contains correct capture mode for capture variables. Procdesc still does not contain information about captured variables which we will address in the next diff.
For objc blocks: at the moment all captured variables have mode `ByReference`. Added TODOs to fix this.
Reviewed By: jvillard
Differential Revision: D22572054
fbshipit-source-id: 4c88678ee
Summary:
In order to allow implementations of the single Fol interface using
multiple backend first-order logic solvers, add explicit definitions
of terms and formulas in the Fol module, and implement Context in
terms of them.
The Fol interface supports freely mixing Terms and Formulas, in
particular there is `Term.ite : cnd:Formula.t -> thn:Term.t ->
els:Term.t -> Term.t` which allows Formulas to appear in Terms. The
Fol implementation performs enough normalization to enable using an
internal representation of terms that is strictly partitioned into
"theory terms" and "formulas", which are stratified below "conditional
terms" and then below "general terms". This partitioning and
stratification enables using backend solvers that do not support
mixing formulas in terms.
Reviewed By: jvillard
Differential Revision: D22170506
fbshipit-source-id: a014ee7d7
Summary: To avoid NULLPTR_DEREFERENCE false positives we want to model some functions as returning non-null. A new flag --pulse-model-return-nonnull allows us to provide a list of such functions.
Reviewed By: ezgicicek
Differential Revision: D22431564
fbshipit-source-id: 9944c7382
Summary:
This one is observed to be more memory efficient. Intuitively, maps need
to be re-allocated more often than lists for balancing. In pulse, we'll
often only ever add new values, in increasing order (when they are fresh
variables created as we symbolically execute the program), which pushes
maps into their worst-case allocation pattern. At least I suspect that's
what happens. With lists, this case is handled much better as lists are
not re-allocated when adding elements.
This is somewhat confirmed by benchmarking and observing GC stats.
Reviewed By: skcho
Differential Revision: D22140908
fbshipit-source-id: 29815112f
Summary:
Keyword `thread_local` in cpp allows us to create a variable with thread storage duration, meaning that the object's lifetime begins when the thread begins and ends when the thread ends.
We get `NULLPTR_DEREFERENCE` false positive for `thread_local` variable since we reallocate it in the `VariableLifetimeBegins` metadata instruction and we do not see further updates to the variable. To solve the issue we special case `VariableLifetimeBegins` instruction for global variables.
Reviewed By: jvillard
Differential Revision: D22284135
fbshipit-source-id: 13c14ef90
Summary:
This model is very important in the analysis of ObjC classes because the pattern
```
- (instancetype)init {
if (self = [super init]) {
...
}
return self;
}
```
is very common, so we need to know that if the super class is `NSObject`, the implementation of `init` is returning `self`, otherwise it's a skip function and we don't get the correct spec for the function. We fix some memory leak FP with this model, see test.
Reviewed By: ezgicicek
Differential Revision: D22259281
fbshipit-source-id: 3ee48c827
Summary:
We need to check if `folly::Optional` is not `folly::none` if we want to retrieve the value, otherwise a runtime exception is thrown:
```
folly::Optional<int> foo{folly::none};
return foo.value(); // bad
```
```
folly::Optional<int> foo{folly::none};
if (foo) {
return foo.value(); // ok
}
```
This diff adds a new issue type that reports if we try to access `folly::Optional` value when it is known to be `folly::none`.
Reviewed By: ezgicicek
Differential Revision: D22053352
fbshipit-source-id: 32cb00a99
Summary: This continues on the previous diff by removing the model for `__bridge_transfer` in biabduction. This also had the name __free_cf which we kept for compatibility with biabduction until now but that we can now change.
Reviewed By: ezgicicek
Differential Revision: D22207396
fbshipit-source-id: 7a175eca6
Summary: To avoid NULLPTR_DEREFERENCE false positives we want to treat some functions as `abort`. A new flag `--pulse-model-abort` allows us to provide a list of such functions.
Reviewed By: ezgicicek
Differential Revision: D21962555
fbshipit-source-id: d46b93c99
Summary:
The past issue with ppx_compare on nonrec types has (at some point) been fixed.
Greped for `let compare = compare` and removed the workaround for `nonrec`.
Reviewed By: jberdine
Differential Revision: D21973087
fbshipit-source-id: 5e2043e20
Summary:
It has no dependencies on the rest of the sledge codebase and might be
more generally useful.
Reviewed By: jvillard
Differential Revision: D21720980
fbshipit-source-id: b4f061e73
Summary:
See previous diff: issues are always reported with the same severity so
recognise that and just use their default severity in "modern" checkers.
Reviewed By: ngorogiannis
Differential Revision: D21904591
fbshipit-source-id: fb5387e35
Summary:
This diff implements part of the memory management for Objective-C classes in ARC, namely that `dealloc` is called when the objects become unreachable. In reality the semantics of ARC says that this happens when their reference count becomes 0, but we are not modelling this yet in Pulse. However, we could in the future.
This fixes false positives memory leaks when the memory is freed in dealloc.
`dealloc` is often implicit in Objective-C, it also calls the dealloc of instance variables and superclass. None of this is implemented yet, and will be done in a future diff. This will be added in the frontend probably, similarly to how it's done for C++ destructors.
This is an important part of modelling Objective-C semantics in Infer, I looked at whether this should be a preanalysis to be used by all analyses but this needs Pulse. So the idea is that any analysis that needs to understand Objective-C memory model well, should have Pulse as a preanalysis.
Reviewed By: jvillard
Differential Revision: D21762292
fbshipit-source-id: ced014324
Summary:
Adding a new attribute for dynamic type. It is set in the models of constructors, currently only in `alloc` in Objective-C. We use it in the following diff to figure out which `dealloc` method to call. However it could be useful for other things, such as dynamic dispatch.
#skipdeadcode
Reviewed By: jvillard
Differential Revision: D21739928
fbshipit-source-id: 9276c0a4d
Summary: The models were too naive before since they invalidated the underlying array completely (copying C++'s push_back model), causing spurious vector invalidation issues in Java. This diff adds more reasonable models.
Reviewed By: skcho
Differential Revision: D21787543
fbshipit-source-id: a5a59ff69
Summary: Assigning `nullptr` to `std::function` was causing `NULLPTR_DEREFERENCE` as our model was expecting to get an object in the right hand side of the assignment (`std::function::operator=`) and was dereferencing that object. Assigning `nullptr` to `std::function` removes callable object from it. We model this special case by creating a fresh value.
Reviewed By: skcho
Differential Revision: D21685318
fbshipit-source-id: 2d4af1933
Summary:
- avoid creating issues just to look up their `unique_id` in the set
- avoid `let _ =` since it can hide partial applications
- delete outdated comment
Reviewed By: skcho
Differential Revision: D21663959
fbshipit-source-id: e50d02447
Summary:
`partition` always constructs two new maps, which is expensive when
there are a lot of entries. Let's avoid it if possible.
Reviewed By: jvillard
Differential Revision: D21684298
fbshipit-source-id: a8674d358
Summary:
Add an extra argument everywhere we report about the identity of the
checker doing the reporting. This isn't type safe in any way, i.e. a
checker can masquerade as another. But, hopefully it's enough to ensure
checker writers (and diff reviewers) have a chance to reflect on what
issue type they are reporting.
Reviewed By: ngorogiannis
Differential Revision: D21638823
fbshipit-source-id: b4a4b0c0a
Summary:
Just like `CFBridgingRelease` we want to be able to model functions that are specific to a given codebase that make a transfer of memory ownership so that developers don't need to worry about releasing that memory anymore, and hence, we don't want to report leaks on that memory.
Things get a little more complicated, because some of the functions we want to model are in a specific namespace, so with this flag we take both cases into account, when we are dealing with namespaces or not.
Reviewed By: jvillard
Differential Revision: D21404409
fbshipit-source-id: c36bd7afc
Summary: Currently we get false positive if we apply `operator--` to the `end()` iterator. To solve this, we model iterator `operator--` not to raise an error for the `EndIterator` invalidation, but to create a fresh element in the underlying array.
Reviewed By: ezgicicek
Differential Revision: D21476353
fbshipit-source-id: 5c722372e
Summary:
It is undefined behavior to dereference end iterator.
To catch end iterator dereferencing issues we change iterator model: instead of having `internal pointer` storing the current index, we model it as a pointer to a current index. This allows us to model `end()` iterator as having an invalid pointer and there is no need to create an invalidated element in the vector itself.
Reviewed By: ezgicicek
Differential Revision: D21178441
fbshipit-source-id: fd6a94b0b
Summary: We mistakenly invalidated the set element which causes spurious vector invalidation errors. Instead, we should modify it without any invalidation.
Reviewed By: jvillard
Differential Revision: D21521943
fbshipit-source-id: 67963967e
Summary: Java's iterator models were wrong. This causes `VECTOR_INVALIDATION` errors in fbandroid projects. This diff aims to fix it by modeling Java iterators with a current pointer and an underlying collection array.
Reviewed By: skcho
Differential Revision: D21448322
fbshipit-source-id: 7d44354b5
Summary:
Because in the real semantics CFRelease can be used more than once, and also the variables can be used after CFRelease in general, modelling this as `free` causes many `USE_AFTER_FREE` errors. Now we change the model to not add the `Invalid CFree` attribute, but to just remove the `Allocated` attribute. So we can model memory leaks in the simple case of `Create` and not `CFRelease` before going out of scope, but we avoid the `USE_AFTER_FREE`.
Since the model for CFRelease now diverges from free, changed the command line option for modelling to `pulse-model-release-pattern`.
Reviewed By: jvillard
Differential Revision: D21324895
fbshipit-source-id: ab323d981
Summary: This diff gets only one disjunct from blacklisted callee, in order to avoid OOMing in specific cases.
Reviewed By: jvillard
Differential Revision: D21406023
fbshipit-source-id: f9214c9c6
Summary:
An easy one. One subtlety: I needed to name the library "pulselib"
instead of "pulse" because dune got confused by the Pulse.ml module.
Reviewed By: skcho
Differential Revision: D21401815
fbshipit-source-id: 05e75b1fa
Summary:
Main change: needed to cut the dependency of inferbo on pulse, since
pulse will need to depend on inferbo. Achieved by changing the ad-hoc
"PulseValue" into a little less ad-hoc "ForeignVariable" variant.
Reviewed By: skcho
Differential Revision: D21401816
fbshipit-source-id: bb341b9ff
Summary:
List of things happening in this unreviewable diff:
- moved PulsePathCondition to PulseSledge
- renamed --pulse-path-conditions to --pudge
- PulsePathCondition now contains all the arithmetic of pulse
(inferbo+concrete intervals+pudge). In particular, moved arithmetic
attributes into PulsePathCondition.t. PulsePathCondition plays the
role of PulseArithmetic (combining all domains).
- added tests for a false positive involving free()
- PulseArithmetic is now just a thin wrapper around PulsePathCondition
to operate on states directly (instead of on path conditions).
- The rest is mostly moving code into PulsePathCondition (eg, from
PulseInterproc) and adjusting it.
Reviewed By: jberdine
Differential Revision: D21332073
fbshipit-source-id: 184c8e0a9
Summary:
Add a new data structure and use it for the map of memory accesses to
limit the number of destinations reachable from a given address. This
avoids remembering details of each index in large arrays, or even each
field in large structs.
Reviewed By: skcho
Differential Revision: D18246091
fbshipit-source-id: 5d3974d9c
Summary:
The idea was to keep track of why we know certain facts but actually
these traces are never read. Other arithmetic facts (BoItv and the path
condition) don't have histories so remove them from concrete intervals
too.
Reviewed By: dulmarod
Differential Revision: D21303353
fbshipit-source-id: eecf07b05
Summary:
We were invalidating "*(vec.__infer_backing_array)" instead of the
address of the field itself.
Reviewed By: ezgicicek
Differential Revision: D21280357
fbshipit-source-id: 48b984800
Summary:
`ProcData.t` contains a `Summary.t`. Eventually we want to fix this too
so that checkers don't depend on backend/, i.e. on all the other
checkers via Summary.ml. But in order to migrate progressively we can
first migrate absint/ and one step on the way is for it to not know what
kind of analysis data it is passing around.
This extra flexibility only costs us passing an extra `Procdesc.t` in a
couple more functions so it's actually not a bad change in itself.
Reviewed By: ngorogiannis
Differential Revision: D21257466
fbshipit-source-id: a91f7b191
Summary:
This is a step in disentangling the various analyses: that file used to
make every checker on biabduction because of a few of its functions that
use biabduction datatypes.
Split reporting.ml into:
- Reporting.ml: the functions all checkers need to report errors. This
is put in absint/ with the other files that are needed by all
checkers.
- SummaryReporting.ml: functions that need to depend on Summary.ml
(useful for later). This is put in backend/ where Summary.ml lives.
- BiabductionReporting.ml: for the biabduction analysis
The rest of the changes are renames to use the appropriate module
amongst the above.
Reviewed By: ngorogiannis
Differential Revision: D21257468
fbshipit-source-id: fa28cefbc
Summary:
We model `malloc` in Objective-C as `malloc_not_fail` I think because the null case is not normally handled in iOS apps because the OS will just killed the app after giving some memory warnings.
So adding `malloc_not_fail` model to Pulse.
Reviewed By: jvillard
Differential Revision: D21278527
fbshipit-source-id: 17a5008fe
Summary:
This translates the construct `ObjCBridgedCastExpr` when the cast_kind is `OBC_BridgeTransfer`, or in syntax, the cast (`__bridge_transfer`).
This cast means that the object is passed from manual memory management to ARC, so one doesn't need to call `release` manually. It is important to model this to avoid false positives.
It translates it as a builtin that we then model in Pulse, the same way we modelled `CFBridgingRelease` which does the same thing.
The name of the builtin is `__free_cf` which is not ideal but I left it like that for compatibility with biabduction. We can change it once we remove this check from biabduction.
update-submodule: facebook-clang-plugins
Reviewed By: jvillard
Differential Revision: D21176337
fbshipit-source-id: 736ceeb9b
Summary: Iterator invalidation traces were based on vector rather than iterator itself.
Reviewed By: ezgicicek
Differential Revision: D21202047
fbshipit-source-id: 62ce8a488
Summary:
We ignored allocator models for vectors, and were not able to initialize vectors properly. This diff fixes this issue.
It also adds a test which was a FN before.
Reviewed By: skcho, jvillard
Differential Revision: D21089492
fbshipit-source-id: 6906cd1d1
Summary: `CFBridgingRelease` and `__bridge_transfer` which I'll model later, transfer the memory model from manual memory ref count to ARC (automatic ref count), so to avoid false positives this needs to be modelled. We can simply remove the Allocated attribute from the state, which means we won't try to track that memory anymore.
Reviewed By: skcho
Differential Revision: D21088218
fbshipit-source-id: 3520a0d59
Summary:
Replace horrible hack with ok hack.
The main difficulty in implementing the disjunctive domain is to avoid
the quadratic time complexity of executing the same disjuncts over and
over again when going around loops:
First time around a loop, assuming for example a single disjunct `d`:
```
[d]
loop body
[d1' \/ d2']
```
Second time around the same loop: the new pre will be the join of the
posts of predecessor nodes, so `old_pre \/ post(loop,old_pre)`, i.e.
`d \/ d1' \/ d2'`. Now we need to execute `loop body` again
*without running the symbolic execution of `d` again* (and the time after
that we'll want to not execute `d`, `d1'`, or `d2'`).
Horrible hack (before): Disjuncts have a boolean "visited" attached
that does its best to keep track of whether a given disjunct is old or
new. When executing a single *instruction* look at the flag and skip the
state if it's old. Of course we have no way to know for sure so it turns
out it was often wrongly re-executing old disjuncts. This was also
producing the wrong results over even simple loops: only the last
iteration would make it outside the loop for some reason. Overall, the
semantics were pretty untractable and shady at best.
New hack (this diff): only run instructions of a given *node* on
disjuncts that are not physically equal to the "pre" ones already in the
invariant map for the current node.
This gives the correct result over simple loops and a nice performance
improvement in general (probably the old heuristic was hitting the
quadratic bad case more often).
Reviewed By: skcho
Differential Revision: D21154063
fbshipit-source-id: 5ee38c68c
Summary:
This is a preparatory diff to make the actual change more readable. This
just moves the code around, trying to change it as little as possible.
Reviewed By: skcho
Differential Revision: D21154065
fbshipit-source-id: e086318c1
Summary:
Computing sledge's equality relation and normalising terms is costly. We
can avoid doing that most of the time by keeping the sledge path
condition lazily evaluated and only forcing it down to a value at two
critical points in the analysis:
1. Summary creation, to avoid storing unsatisfiable pre/posts that will have
to be needlessly executed by callers. This also saves us from having
to serialise the closures involved in the uncomputed form of lazy
values inside the pulse summaries.
2. Before reporting errors we check in the state is in fact satisfiable.
If not we just prune it away at that point.
This yields ~4x speedup on some targets.
Reviewed By: ezgicicek
Differential Revision: D21129759
fbshipit-source-id: a75fdd3bc
Summary:
This is mostly just a type change for now, more changes to come. This
doesn't make thing much faster yet because we force computations pretty
often to check for unsatisfiability (each function call and PRUNE node).
Next diff will build on that.
Reviewed By: skcho
Differential Revision: D21129758
fbshipit-source-id: 72200e2b1
Summary:
When encountering a constant, pulse creates an abstract value (a
variable) to represent it, and remembers that it's equal to it. The
problem is that pulse doesn't yet know how to deal with the fact that
some variables are going to be equal to each other.
This hacks around this issue in the case of constants, within the same
procedure, by remembering which constants have been assigned to which
place-holder variables, and serving those variables again when the same
constant is translated again.
Limitation: this doesn't work across procedure calls as the "constant
maps" are not saved in summaries.
Something to look out for: we don't want to make `if (p == NULL)` create
a path where `p` is invalid (we only make null invalid when we see an
assignment from 0, i.e. `p = NULL;`).
Reviewed By: ezgicicek
Differential Revision: D21089961
fbshipit-source-id: 5ebb85d0a
Summary: Modeling vector iterator with two internal fields: an internal array and an internal pointer. The internal array field points to the internal array field of a vector; the internal pointer field represents the current element of the array. For now `operator++` creates a fresh element inside the array.
Reviewed By: ezgicicek
Differential Revision: D21043304
fbshipit-source-id: db3be49ce
Summary:
Add a path condition to each symbolic state, represented in sledge's arithmetic domain. This gives a precise account of arithmetic constraints. In particular, it is relation and thus is more robust in the face of inter-procedural analysis.
This is gated behind a flag for now as there are performance issues with the new arithmetic.
Reviewed By: jberdine
Differential Revision: D20393947
fbshipit-source-id: b780de22a
Summary:
Instead of having to remember to update both the inferbo and the concrete
intervals domains of pulse, hide these details under a unified API. This
should help the transition to adding a third(!) numerical domain later
on (pudge!).
Reviewed By: ezgicicek
Differential Revision: D21022920
fbshipit-source-id: 783157464
Summary:
Now that the shape of the record type of AbductiveDomain.t is known, we
don't need this getter anymore. Keep `get_pre` and `get_post` as they
perform useful casting to `BaseDomain.t`.
Reviewed By: ezgicicek
Differential Revision: D21022924
fbshipit-source-id: 340f4edf8
Summary:
The "interface" modules define short forms for the internals of pulse
and also serve as a guide of which modules you are supposed to use at
which "level" in the pulse domains (base domain vs abductive domain vs
higher-level PulseOperations.ml). Make sure they are used.
Reviewed By: skcho
Differential Revision: D21022927
fbshipit-source-id: f890df245
Summary:
PulseAbductiveDomain.ml can be split into two distinct parts:
1. The definition of the "abductive domain" itself. This remains in that
file.
2. How to apply a given pre/post pair to the current state (during a
function call). This is about the same size as 1. in terms of lines
of code(!) and is now in PulseInterproc.ml.
Reviewed By: ezgicicek
Differential Revision: D21022921
fbshipit-source-id: 431fe061e
Summary:
I'm moving this code in the next diff and need this refactor. It should
be the same as before.
Reviewed By: ezgicicek
Differential Revision: D21022926
fbshipit-source-id: ebe644ef9
Summary: Unify the models of malloc and for the Create and Copy functions for Core Graphics. This add the null case from the malloc model to the Core Graphics models.
Reviewed By: jvillard
Differential Revision: D20890956
fbshipit-source-id: 278ac9d2f
Summary:
As soon as pulse detects an error, it completely stops the analysis and loses the state where the error occurred. This makes it difficult to debug and understand the state the program failed. Moreover, other analyses that might build on pulse (e.g. impurity), cannot access the error state.
This diff aims to restore and display the state at the time of the error in `PulseExecutionState` along with the diagnostic by extending it as follows:
```
type exec_state =
| represents the state at the program point that caused an error *)
```
As a result, since we don't immediately stop the analysis as soon as we find an error, we detect both errors in conditional branches simultaneously (see test result changes for examples).
NOTE: We need to extend `PulseOperations.access_result` to keep track of the failed state as follows:
```
type 'a access_result = ('a, Diagnostic.t * t [denoting the exit state] ) result
```
Reviewed By: jvillard
Differential Revision: D20918920
fbshipit-source-id: 432ac68d6
Summary:
This diff lifts the `PulseAbductiveDomain.t` in `PulseExecutionState` by tracking whether the program continues the analysis normally or exits unusually (e.g. by calling `exit` or `throw`):
```
type exec_state =
| ContinueProgram of PulseAbductiveDomain.t (** represents the state at the program point *)
| ExitProgram of PulseAbductiveDomain.t
(** represents the state originating at exit/divergence. *)
```
Now, Pulse's actual domain is tracked by `PulseExecutionState` and as soon as we try to analyze an instruction at `ExitProgram`, we simply return its state.
The aim is to recover the state at the time of the exit, rather than simply ignoring them (i.e. returning empty disjuncts). This allows us to get rid of some FNs that we were not able to detect before. Moreover, it also allows the impurity analysis to be more precise since we will know how the state changed up to exit.
TODO:
- Impurity analysis needs to be improved to consider functions that simply exit as impure.
- The next goal is to handle error state similarly so that when pulse finds an error, we recover the state at the error location (and potentially continue to analyze?).
Disclaimer: currently, we handle throw statements like exit (as was the case before). However, this is not correct. Ideally, control flow from throw nodes follows catch nodes rather than exiting the program entirely.
Reviewed By: jvillard
Differential Revision: D20791747
fbshipit-source-id: df9e5445a
Summary:
Malloc returns either an allocated object or a null pointer if there is no memory available. Modelling that.
This has always been a bit contentious because this leads to NPEs that people often ignores because they don't care. But if we don't model this, then we have FPs when people do take this into account when freeing the memory.
Reviewed By: jvillard
Differential Revision: D20791692
fbshipit-source-id: 6fd259f12
Summary: Modelling `CG.*Release ` and `CFRelease` as `free`. This is what we were doing in biabduction.
Reviewed By: skcho
Differential Revision: D20767174
fbshipit-source-id: c77c1cdc6
Summary:
This models all the Create and Copy functions from CoreGraphics, examples in the tests.
These functions all allocate memory that needs to be manually released.
The modelling of the release functions will happen in a following diff. Until then, we have some false positives in the tests.
This check is currently in biabduction, and we aim to move it to Pulse.
Reviewed By: jvillard
Differential Revision: D20626395
fbshipit-source-id: b39eae2d9
Summary: When looking at some reports I realised that adding the place where the memory becomes unreachable to the trace makes it more readable.
Reviewed By: skcho
Differential Revision: D20790277
fbshipit-source-id: d5df69e68
Summary:
- Model `System.exit()` as early_exit and add a test
- Tweak message of methods that are impure due to having no pulse summary (and add a test)
Reviewed By: skcho
Differential Revision: D20668979
fbshipit-source-id: 6b5589aae
Summary:
- the order of call state was wrong when printing contradiction for CItv
- add a test for impurity
Reviewed By: jvillard
Differential Revision: D20646181
fbshipit-source-id: 1c86fd0a4
Summary:
First version of a new memory leak check based on Pulse. The idea is to examine unreachable cells in the heap and check that the "Allocated" attribute is available but the "Invalid CFree" isn't. This is done when we remove variables from the state.
Currently it only works for malloc, we can extend it to other allocation functions later.
Reviewed By: jvillard
Differential Revision: D20444097
fbshipit-source-id: 33b6b25a2
Summary:
- Add more naive pulse models for:
- `System.arraycopy`
- `StringBuilder.setLength`
- `StringBuilder.delete`
- Model the following as pure
- `SparseArrayCompat.valueAt`
- `File.get...`
- Add a nice test
Reviewed By: jvillard
Differential Revision: D20513397
fbshipit-source-id: 6d412d13a
Summary:
`make deadcode` is failing on master but our CI jobs didn't catch it :(
Let's fix existing deadcode for now.
Reviewed By: martintrojer
Differential Revision: D20510062
fbshipit-source-id: 4a5e5f849
Summary:
Previously, at each function call, we added a `WrittenTo` attribute for applying the address of the actuals. However, this results in mistakenly considering each function application that inspects its argument as impure. Instead, we should only propagate `WrittenTo` if the actuals have already `WrittenTo` attributes.
For instance, for the following functions
```
public static boolean is_null(Byte a) {
return a == null;
}
public static boolean call_is_null(Byte a) {
return is_null(a);
}
```
We used to get the following pulse summary for `call_is_null` (showing only one of the disjuncts):
```
#0: PRE:
{ roots={ &a=v1 };
mem ={ v1 -> { * -> v2 } };
attrs={ v1 -> { MustBeValid },
v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) } };}
POST:
{ roots={ &a=v1, &return=v8 };
mem ={ v1 -> { * -> v2 }, v8 -> { * -> v4 } };
attrs={ v2 -> { Arith =null,
BoItv ([max(0, v2), min(0, v2)]),
WrittenTo-----------WRONG },
v4 -> { Arith =1,
BoItv (1),
Invalid ConstantDereference(is the constant 1),
WrittenTo-----------WRONG },
v8 -> { WrittenTo } };}
SKIPPED_CALLS: { }
```
where we mistakenly recorded a `WrittenTo` for `v2` (what `a` points to). As a result, we considered `call_is_null` as impure :( This diff fixes that since the callee `is_null` doesn't have any `WrittenTo` attributes for its parameter `a`. So, we don't propagate `WrittenTo` and get the following summary
```
#0: PRE:
{ roots={ &a=v1 };
mem ={ v1 -> { * -> v2 } };
attrs={ v1 -> { MustBeValid },
v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) } };}
POST:
{ roots={ &a=v1, &return=v8 };
mem ={ v1 -> { * -> v2 }, v8 -> { * -> v4 } };
attrs={ v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) },
v4 -> { Arith =1,
BoItv (1),
Invalid ConstantDereference(is the constant 1) },
v8 -> { WrittenTo } };}
SKIPPED_CALLS: { }
```
Reviewed By: skcho
Differential Revision: D20490102
fbshipit-source-id: 253d8ef64
Summary:
This diff naively models the following as `StdVector.push_back`:
- `StringBuilder.append`
- `String.replace`
- `Queue.poll`
It also adds a FN test for `Iterator.next`.
Reviewed By: skcho
Differential Revision: D20469786
fbshipit-source-id: 2d8e8d117
Summary: Let's also print skipped calls in `pp` to ease debugging both for summary and intermediate steps.
Reviewed By: jvillard
Differential Revision: D20417852
fbshipit-source-id: 7da03ae81
Summary:
There is a module and a module type in the file PulseAbductiveDomain.ml
with the same name. This is confusing and it's better to keep separate names.
Reviewed By: jvillard
Differential Revision: D20388769
fbshipit-source-id: bcfed436e
Summary:
Be a bit more careful about the difference between PrePost.t and
AbductiveDomain.t. It's needed in another diff where the types will be
different.
Reviewed By: ezgicicek
Differential Revision: D20393927
fbshipit-source-id: beaf80c90
Summary: In preparation for PulseArithmetic to be something else.
Reviewed By: ezgicicek
Differential Revision: D20393928
fbshipit-source-id: d93131e12
Summary:
D20362149 missed
- to pass the optional argument `include_value_history` to the recursive call in `PulseTrace.add_to_errlog`.
- to set `include_value_history=false` for skipped calls.
This diff fixes these issues.
Reviewed By: skcho
Differential Revision: D20385604
fbshipit-source-id: 176e4d010
Summary:
Adding a model for malloc: we add an attribute "Allocated". This can be used for implementing memory leaks: whenever the variables get out of scope, we can check that if the variable has an attribute Allocated, it also has an attribute Invalid CFree.
Possibly we will need more details in the Allocated attribute, to know if it's malloc, or other allocation function, but we can add that later when we know how it should look like.
Reviewed By: jvillard
Differential Revision: D20364541
fbshipit-source-id: 5e667a8c3
Summary: Impurity traces are quite big due to recording values histories. Let's simplify the traces by removing pulse's value histories.
Reviewed By: skcho
Differential Revision: D20362149
fbshipit-source-id: 8a2a6115e
Summary: This diff adds a model for Java's `Object.clone()` method (similar to existing shallow_copy).
Reviewed By: jvillard
Differential Revision: D20341073
fbshipit-source-id: 30ae40fe7