Summary:
This diff implements part of the memory management for Objective-C classes in ARC, namely that `dealloc` is called when the objects become unreachable. In reality the semantics of ARC says that this happens when their reference count becomes 0, but we are not modelling this yet in Pulse. However, we could in the future.
This fixes false positives memory leaks when the memory is freed in dealloc.
`dealloc` is often implicit in Objective-C, it also calls the dealloc of instance variables and superclass. None of this is implemented yet, and will be done in a future diff. This will be added in the frontend probably, similarly to how it's done for C++ destructors.
This is an important part of modelling Objective-C semantics in Infer, I looked at whether this should be a preanalysis to be used by all analyses but this needs Pulse. So the idea is that any analysis that needs to understand Objective-C memory model well, should have Pulse as a preanalysis.
Reviewed By: jvillard
Differential Revision: D21762292
fbshipit-source-id: ced014324
Summary:
Adding a new attribute for dynamic type. It is set in the models of constructors, currently only in `alloc` in Objective-C. We use it in the following diff to figure out which `dealloc` method to call. However it could be useful for other things, such as dynamic dispatch.
#skipdeadcode
Reviewed By: jvillard
Differential Revision: D21739928
fbshipit-source-id: 9276c0a4d
Summary:
`partition` always constructs two new maps, which is expensive when
there are a lot of entries. Let's avoid it if possible.
Reviewed By: jvillard
Differential Revision: D21684298
fbshipit-source-id: a8674d358
Summary: Currently we get false positive if we apply `operator--` to the `end()` iterator. To solve this, we model iterator `operator--` not to raise an error for the `EndIterator` invalidation, but to create a fresh element in the underlying array.
Reviewed By: ezgicicek
Differential Revision: D21476353
fbshipit-source-id: 5c722372e
Summary:
List of things happening in this unreviewable diff:
- moved PulsePathCondition to PulseSledge
- renamed --pulse-path-conditions to --pudge
- PulsePathCondition now contains all the arithmetic of pulse
(inferbo+concrete intervals+pudge). In particular, moved arithmetic
attributes into PulsePathCondition.t. PulsePathCondition plays the
role of PulseArithmetic (combining all domains).
- added tests for a false positive involving free()
- PulseArithmetic is now just a thin wrapper around PulsePathCondition
to operate on states directly (instead of on path conditions).
- The rest is mostly moving code into PulsePathCondition (eg, from
PulseInterproc) and adjusting it.
Reviewed By: jberdine
Differential Revision: D21332073
fbshipit-source-id: 184c8e0a9
Summary: `CFBridgingRelease` and `__bridge_transfer` which I'll model later, transfer the memory model from manual memory ref count to ARC (automatic ref count), so to avoid false positives this needs to be modelled. We can simply remove the Allocated attribute from the state, which means we won't try to track that memory anymore.
Reviewed By: skcho
Differential Revision: D21088218
fbshipit-source-id: 3520a0d59
Summary:
Computing sledge's equality relation and normalising terms is costly. We
can avoid doing that most of the time by keeping the sledge path
condition lazily evaluated and only forcing it down to a value at two
critical points in the analysis:
1. Summary creation, to avoid storing unsatisfiable pre/posts that will have
to be needlessly executed by callers. This also saves us from having
to serialise the closures involved in the uncomputed form of lazy
values inside the pulse summaries.
2. Before reporting errors we check in the state is in fact satisfiable.
If not we just prune it away at that point.
This yields ~4x speedup on some targets.
Reviewed By: ezgicicek
Differential Revision: D21129759
fbshipit-source-id: a75fdd3bc
Summary:
This is mostly just a type change for now, more changes to come. This
doesn't make thing much faster yet because we force computations pretty
often to check for unsatisfiability (each function call and PRUNE node).
Next diff will build on that.
Reviewed By: skcho
Differential Revision: D21129758
fbshipit-source-id: 72200e2b1
Summary:
Add a path condition to each symbolic state, represented in sledge's arithmetic domain. This gives a precise account of arithmetic constraints. In particular, it is relation and thus is more robust in the face of inter-procedural analysis.
This is gated behind a flag for now as there are performance issues with the new arithmetic.
Reviewed By: jberdine
Differential Revision: D20393947
fbshipit-source-id: b780de22a
Summary:
Now that the shape of the record type of AbductiveDomain.t is known, we
don't need this getter anymore. Keep `get_pre` and `get_post` as they
perform useful casting to `BaseDomain.t`.
Reviewed By: ezgicicek
Differential Revision: D21022924
fbshipit-source-id: 340f4edf8
Summary:
PulseAbductiveDomain.ml can be split into two distinct parts:
1. The definition of the "abductive domain" itself. This remains in that
file.
2. How to apply a given pre/post pair to the current state (during a
function call). This is about the same size as 1. in terms of lines
of code(!) and is now in PulseInterproc.ml.
Reviewed By: ezgicicek
Differential Revision: D21022921
fbshipit-source-id: 431fe061e
Summary:
I'm moving this code in the next diff and need this refactor. It should
be the same as before.
Reviewed By: ezgicicek
Differential Revision: D21022926
fbshipit-source-id: ebe644ef9
Summary:
As soon as pulse detects an error, it completely stops the analysis and loses the state where the error occurred. This makes it difficult to debug and understand the state the program failed. Moreover, other analyses that might build on pulse (e.g. impurity), cannot access the error state.
This diff aims to restore and display the state at the time of the error in `PulseExecutionState` along with the diagnostic by extending it as follows:
```
type exec_state =
| represents the state at the program point that caused an error *)
```
As a result, since we don't immediately stop the analysis as soon as we find an error, we detect both errors in conditional branches simultaneously (see test result changes for examples).
NOTE: We need to extend `PulseOperations.access_result` to keep track of the failed state as follows:
```
type 'a access_result = ('a, Diagnostic.t * t [denoting the exit state] ) result
```
Reviewed By: jvillard
Differential Revision: D20918920
fbshipit-source-id: 432ac68d6
Summary:
This diff lifts the `PulseAbductiveDomain.t` in `PulseExecutionState` by tracking whether the program continues the analysis normally or exits unusually (e.g. by calling `exit` or `throw`):
```
type exec_state =
| ContinueProgram of PulseAbductiveDomain.t (** represents the state at the program point *)
| ExitProgram of PulseAbductiveDomain.t
(** represents the state originating at exit/divergence. *)
```
Now, Pulse's actual domain is tracked by `PulseExecutionState` and as soon as we try to analyze an instruction at `ExitProgram`, we simply return its state.
The aim is to recover the state at the time of the exit, rather than simply ignoring them (i.e. returning empty disjuncts). This allows us to get rid of some FNs that we were not able to detect before. Moreover, it also allows the impurity analysis to be more precise since we will know how the state changed up to exit.
TODO:
- Impurity analysis needs to be improved to consider functions that simply exit as impure.
- The next goal is to handle error state similarly so that when pulse finds an error, we recover the state at the error location (and potentially continue to analyze?).
Disclaimer: currently, we handle throw statements like exit (as was the case before). However, this is not correct. Ideally, control flow from throw nodes follows catch nodes rather than exiting the program entirely.
Reviewed By: jvillard
Differential Revision: D20791747
fbshipit-source-id: df9e5445a
Summary:
This models all the Create and Copy functions from CoreGraphics, examples in the tests.
These functions all allocate memory that needs to be manually released.
The modelling of the release functions will happen in a following diff. Until then, we have some false positives in the tests.
This check is currently in biabduction, and we aim to move it to Pulse.
Reviewed By: jvillard
Differential Revision: D20626395
fbshipit-source-id: b39eae2d9
Summary:
- the order of call state was wrong when printing contradiction for CItv
- add a test for impurity
Reviewed By: jvillard
Differential Revision: D20646181
fbshipit-source-id: 1c86fd0a4
Summary:
First version of a new memory leak check based on Pulse. The idea is to examine unreachable cells in the heap and check that the "Allocated" attribute is available but the "Invalid CFree" isn't. This is done when we remove variables from the state.
Currently it only works for malloc, we can extend it to other allocation functions later.
Reviewed By: jvillard
Differential Revision: D20444097
fbshipit-source-id: 33b6b25a2
Summary:
`make deadcode` is failing on master but our CI jobs didn't catch it :(
Let's fix existing deadcode for now.
Reviewed By: martintrojer
Differential Revision: D20510062
fbshipit-source-id: 4a5e5f849
Summary:
Previously, at each function call, we added a `WrittenTo` attribute for applying the address of the actuals. However, this results in mistakenly considering each function application that inspects its argument as impure. Instead, we should only propagate `WrittenTo` if the actuals have already `WrittenTo` attributes.
For instance, for the following functions
```
public static boolean is_null(Byte a) {
return a == null;
}
public static boolean call_is_null(Byte a) {
return is_null(a);
}
```
We used to get the following pulse summary for `call_is_null` (showing only one of the disjuncts):
```
#0: PRE:
{ roots={ &a=v1 };
mem ={ v1 -> { * -> v2 } };
attrs={ v1 -> { MustBeValid },
v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) } };}
POST:
{ roots={ &a=v1, &return=v8 };
mem ={ v1 -> { * -> v2 }, v8 -> { * -> v4 } };
attrs={ v2 -> { Arith =null,
BoItv ([max(0, v2), min(0, v2)]),
WrittenTo-----------WRONG },
v4 -> { Arith =1,
BoItv (1),
Invalid ConstantDereference(is the constant 1),
WrittenTo-----------WRONG },
v8 -> { WrittenTo } };}
SKIPPED_CALLS: { }
```
where we mistakenly recorded a `WrittenTo` for `v2` (what `a` points to). As a result, we considered `call_is_null` as impure :( This diff fixes that since the callee `is_null` doesn't have any `WrittenTo` attributes for its parameter `a`. So, we don't propagate `WrittenTo` and get the following summary
```
#0: PRE:
{ roots={ &a=v1 };
mem ={ v1 -> { * -> v2 } };
attrs={ v1 -> { MustBeValid },
v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) } };}
POST:
{ roots={ &a=v1, &return=v8 };
mem ={ v1 -> { * -> v2 }, v8 -> { * -> v4 } };
attrs={ v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) },
v4 -> { Arith =1,
BoItv (1),
Invalid ConstantDereference(is the constant 1) },
v8 -> { WrittenTo } };}
SKIPPED_CALLS: { }
```
Reviewed By: skcho
Differential Revision: D20490102
fbshipit-source-id: 253d8ef64
Summary: Let's also print skipped calls in `pp` to ease debugging both for summary and intermediate steps.
Reviewed By: jvillard
Differential Revision: D20417852
fbshipit-source-id: 7da03ae81
Summary:
There is a module and a module type in the file PulseAbductiveDomain.ml
with the same name. This is confusing and it's better to keep separate names.
Reviewed By: jvillard
Differential Revision: D20388769
fbshipit-source-id: bcfed436e
Summary:
Be a bit more careful about the difference between PrePost.t and
AbductiveDomain.t. It's needed in another diff where the types will be
different.
Reviewed By: ezgicicek
Differential Revision: D20393927
fbshipit-source-id: beaf80c90
Summary: In preparation for PulseArithmetic to be something else.
Reviewed By: ezgicicek
Differential Revision: D20393928
fbshipit-source-id: d93131e12
Summary:
Adding a model for malloc: we add an attribute "Allocated". This can be used for implementing memory leaks: whenever the variables get out of scope, we can check that if the variable has an attribute Allocated, it also has an attribute Invalid CFree.
Possibly we will need more details in the Allocated attribute, to know if it's malloc, or other allocation function, but we can add that later when we know how it should look like.
Reviewed By: jvillard
Differential Revision: D20364541
fbshipit-source-id: 5e667a8c3
Summary: We don't need skipped calls for pre and post. Let's pull them out to `PulseAbductiveDomain`, next to pre and post.
Reviewed By: jvillard
Differential Revision: D20283589
fbshipit-source-id: 5cf970292
Summary: We forgot to take skipped calls into account for state comparison. This diff fixes that.
Reviewed By: skcho
Differential Revision: D20282739
fbshipit-source-id: 7b4d84bb0
Summary: `PulseBaseDomain.leq` is never called but was there to satisfy the signature of `NoJoin` which itself was not needed. This diff removes `include NoJoin` and instead just adds signature for `pp` in `PulseBaseDomain`.
Reviewed By: jvillard
Differential Revision: D20280104
fbshipit-source-id: 8e3659280
Summary:
Add let*/+ syntax to `result` types to simplify all the applications of
`>>=`, `>>|` that are followed by a binding (eg `>>= fun x -> ...`) in
pulse.
Reviewed By: skcho
Differential Revision: D19940728
fbshipit-source-id: 4df159029
Summary:
We can already tell that a summary cannot be applied by raising
`Contradiction`, so use this mechanism to stop applying a summary if the
number of formals doesn't match the number of actuals provided.
Previously we would return an option type and `None` in case of
mismatch, on top of the `raise Contradiction` mechanism (used for
aliasing and arithmetic contradictions).
This changes the behaviour of pulse in this case: before we would skip
over the function call, but now we stop the analysis.
Reviewed By: dulmarod
Differential Revision: D19940729
fbshipit-source-id: 6def40cd6
Summary:
Let's collect the list of all skipped functions with a `proc_name` but no summary in Pulse's memory. This will be useful for the impurity analysis later (next diff).
Concretely, we extend Pulse's domain with a map from skipped calls to their respective traces. For efficiency, we only keep a single trace per skipped call.
For impurity analysis, tracking skipped calls in Pulse allows us to rely on Pulse's strong memory model to get rid of infeasible paths as opposed to creating an independent checker which wouldn't be able to do that.
Reviewed By: jvillard
Differential Revision: D19428426
fbshipit-source-id: 3c5e482c5
Summary:
Including the current call state is useful because the contradiction
sometimes refers to abstract values that have been materialised since
the last call state so we cannot make sense of them unless we print the
current call state.
Reviewed By: skcho
Differential Revision: D18908424
fbshipit-source-id: 297f397a6
Summary:
- Do most of the work of `solve_arithmetic_constraints` inside `subst_attribute` instead, since we need to re-use the latter function for post-conditions where the first function is not appropriate.
- When substituting arithmetic constraints, we refine arithmetic information (both concrete intervals and inferbo), which can lead to inconsistent states. Instead of recording the new arithmetic facts by returning a new current state, just act as a map on attributes. This is to enable doing the point above.
- All this lead to a somewhat messy refactoring...
- Rename `CannotApplyPre` to `Contradiction` since it's used for post-conditions as well now
Reviewed By: skcho
Differential Revision: D18889120
fbshipit-source-id: d81647143
Summary:
Refine the type of inferbo intervals attributes to "pure" (non-bottom)
ones. This is because were we to get a Bottom value from inferbo we
should stop the abstract execution instead of recording it in the state.
Reviewed By: ezgicicek
Differential Revision: D18811165
fbshipit-source-id: fff8664b7
Summary:
This diff enables parsing and auto-formatting documentation
comments (aka docstrings).
I have looked at this entire diff and manually made some changes to
improve the formatting. In some cases it looked like it would take too
much time, or benefit from someone more familiar with the code doing
it, and I instead disabled auto-formatting docstrings in those files.
Also, there are some source files where the docstrings are invalid,
and some where the structure detected by the parser appears not to
match what was intended. Auto-formatting has been disabled for these
files.
Reviewed By: ezgicicek
Differential Revision: D18755888
fbshipit-source-id: 68d72465d
Summary:
The introduction of inferbo intervals as pulse attributes creates the
first relational attributes. To make sense of inferbo intervals
appearing in summaries when in a caller context, we need to substitute
the abstract values they contain in the callee with the abstract values
they correspond to in the caller.
This has a significant consequence: we have to delay the check that
arithmetic constraints in the callee are satisfiable at the call-site
until *after* we have discovered all the relationships between callee
values and caller values from the heap. To solve this, we now run an
arithmetic constraints check *after* having materialised all the
addresses.
We also need to translate the abstract values in the attributes in the
post before recording them in the caller, for the same reasons.
Quite some code in this diff is concerned with substituting pulse values
inside inferbo intervals. There is a complication there too: even after
having discovered relationships between caller and callee abstract
values induced by the heap shapes, there could be abstract values in the
callee's attributes that we haven't seen yet. We need to make up new
values for these in the caller, so this substitution has to return a
potentially extended substitution.
Reviewed By: skcho
Differential Revision: D18745695
fbshipit-source-id: 077ae7670
Summary: This extends semantics of binary operator for BoItv. If there is no known interval value for a pulse value, it returns a symbolic value of the pulse value.
Reviewed By: jvillard
Differential Revision: D18726768
fbshipit-source-id: ed8ecf78b
Summary:
This diff adds inferbo's interval values to pulse's attributes. The added values will be used to
filter out infeasible passes in the following diffs.
Reviewed By: jvillard
Differential Revision: D18726667
fbshipit-source-id: c1125ac6e
Summary:
When reporting null dereference it is useful to know where the null came
from.
Reviewed By: skcho
Differential Revision: D18206459
fbshipit-source-id: 0c8e6781b
Summary:
This simplifies the code overall. It also makes accessing the action of
a "trace" (which is now stored alongside it instead of deep inside it)
constant time instead of linear in the number of nested calls.
Reviewed By: skcho
Differential Revision: D18206460
fbshipit-source-id: 9546ff36f
Summary:
This adds a more interesting value domain to pulse: concrete intervals.
There are still two main limitations:
1. arithmetic operations are all over-approximated: any assignment involving arithmetic operations is replaced by non-determinism
2. abstract values that are discovered to be equal are not merged into one
Reviewed By: skcho
Differential Revision: D18058972
fbshipit-source-id: 0492a590f
Summary:
This does several things because it was hard to split it more:
1. Split most of the arithmetic reasoning to PulseArithmetic.ml. This
doesn't need to be reviewed thoroughly because an upcoming diff
changes the domain from just `EqualTo of Const.t` to an interval domain!
2. When going through a prune node intra-procedurally, abduce arithmetic
facts to the pre (instead of just propagating them). This is the "assume
as assert" trick used by biabduction 1.0 too and allows to propagate
arithmetic constraints to callers.
3. Use 2 when applying summaries by pruning specs whose preconditions
have un-satisfiable arithmetic constraints.
This changes one of the tests! Pulse now does a bit more work to find
the false positive, as can be seen in the longer trace.
Reviewed By: skcho
Differential Revision: D18117160
fbshipit-source-id: af3b2c8c0
Summary:
Instead of checking that each address in the pre that must be valid is
not invalid in the caller (and error out if it turns out it is invalid)
as we discover them, save these checks for after we are sure that the
precondition can be applied. It is in fact a bug that we can report an
error when trying to apply a precondition that is actually not
satisfiable in the current state for other reasons than lifetime issues.
We still want to skip calls in case of weird issues like mismatch in
number of formals vs actuals.
This will have more obvious effects later when we also check that
arithmetic facts in preconditions are satisfied at the call site: if a
pre mandates "x=1" and "y must be valid" and we have "x=0" and "y
invalid" then we shouldn't report an error.
Reviewed By: skcho
Differential Revision: D18115229
fbshipit-source-id: ad4ce72ff