Summary: Models were partial and/or simply missing (`Map` writes!). Now the modelled containers use inheritance for conciseness (`List` reads are only those not caught by the `Collection` matcher, etc). Also, add URLs to documentation sources.
Reviewed By: ezgicicek
Differential Revision: D21132069
fbshipit-source-id: fefb360f0
Summary: `CFBridgingRelease` and `__bridge_transfer` which I'll model later, transfer the memory model from manual memory ref count to ARC (automatic ref count), so to avoid false positives this needs to be modelled. We can simply remove the Allocated attribute from the state, which means we won't try to track that memory anymore.
Reviewed By: skcho
Differential Revision: D21088218
fbshipit-source-id: 3520a0d59
Summary: This diff suppresses cost issues on lambda and auto-generated procedures, since they were too noisy.
Reviewed By: ezgicicek
Differential Revision: D21153619
fbshipit-source-id: 65ad6dcc3
Summary:
Replace horrible hack with ok hack.
The main difficulty in implementing the disjunctive domain is to avoid
the quadratic time complexity of executing the same disjuncts over and
over again when going around loops:
First time around a loop, assuming for example a single disjunct `d`:
```
[d]
loop body
[d1' \/ d2']
```
Second time around the same loop: the new pre will be the join of the
posts of predecessor nodes, so `old_pre \/ post(loop,old_pre)`, i.e.
`d \/ d1' \/ d2'`. Now we need to execute `loop body` again
*without running the symbolic execution of `d` again* (and the time after
that we'll want to not execute `d`, `d1'`, or `d2'`).
Horrible hack (before): Disjuncts have a boolean "visited" attached
that does its best to keep track of whether a given disjunct is old or
new. When executing a single *instruction* look at the flag and skip the
state if it's old. Of course we have no way to know for sure so it turns
out it was often wrongly re-executing old disjuncts. This was also
producing the wrong results over even simple loops: only the last
iteration would make it outside the loop for some reason. Overall, the
semantics were pretty untractable and shady at best.
New hack (this diff): only run instructions of a given *node* on
disjuncts that are not physically equal to the "pre" ones already in the
invariant map for the current node.
This gives the correct result over simple loops and a nice performance
improvement in general (probably the old heuristic was hitting the
quadratic bad case more often).
Reviewed By: skcho
Differential Revision: D21154063
fbshipit-source-id: 5ee38c68c
Summary:
This is a preparatory diff to make the actual change more readable. This
just moves the code around, trying to change it as little as possible.
Reviewed By: skcho
Differential Revision: D21154065
fbshipit-source-id: e086318c1
Summary:
This makes the API of [instrs] easier to work with at the price of some
duplication in the GADT.
This allows us to construct `[skip]` in `AbstractInterpreter` without
imposing a particular direction. This will make it the next diffs about
a disjunctive domain easier.
Reviewed By: skcho
Differential Revision: D21153694
fbshipit-source-id: f86c180fa
Summary:
We translated the expression `CXXStdInitializerListExpr` naively in D3058895 as a call to
a skip function, with the hope that it would be translated better in the future. However, the naive means that we lose access to the initialized list/array because we are simply skipping it. So, even if we want to model the initializer properly, we have to deal with the skip specially.
This diff tries to solve this problem by removing the skip call whenever
possible. Instead, we translate the underlying array/list as a Load, so
that when it is passed to the constructor, we can pick it up.
For the following initialization:
``` std::vector<int*> vec = {nullptr};
```
Before, we translated it as
```
*&0$?%__sil_tmpSIL_materialize_temp__n$7[0]:int* const =null
n$8=_fun___infer_skip_function(&0$?%__sil_tmpSIL_materialize_temp__n$7:int* const [1*8] const )
n$9=_fun_std::vector<int*,std::allocator<int*>>::vector(&vec:std::vector<int*,std::allocator<int*>>*,n$8:std::initializer_list<int*>)
```
However, this means, `n$8` would be result of something skipped which we can't reason about. Instead, we just pass the underlying initialized array now, so we get the following translation:
```
*&0$?%__sil_tmpSIL_materialize_temp__n$7[0]:int* const =null
n$8=*&0$?%__sil_tmpSIL_materialize_temp__n$7:int* const [1*8] const
n$9=_fun_std::vector<int*,std::allocator<int*>>::vector(&vec:std::vector<int*,std::allocator<int*>>*,n$8:std::initializer_list<int*>)
```
Reviewed By: jvillard
Differential Revision: D21155014
fbshipit-source-id: 75850b1e6
Summary:
It is true that `Info` issues are normally not intended for the end user
and in general should be hidden by default.
However, the current behavior - show them only if `--no-filtering` is
true - is super non-intuitive and complicates already complex reporting
logic.
Lets use the general "enable/disable" mechanism for controlling this.
Reviewed By: jvillard
Differential Revision: D21154140
fbshipit-source-id: 69e4c88e4
Summary:
Computing sledge's equality relation and normalising terms is costly. We
can avoid doing that most of the time by keeping the sledge path
condition lazily evaluated and only forcing it down to a value at two
critical points in the analysis:
1. Summary creation, to avoid storing unsatisfiable pre/posts that will have
to be needlessly executed by callers. This also saves us from having
to serialise the closures involved in the uncomputed form of lazy
values inside the pulse summaries.
2. Before reporting errors we check in the state is in fact satisfiable.
If not we just prune it away at that point.
This yields ~4x speedup on some targets.
Reviewed By: ezgicicek
Differential Revision: D21129759
fbshipit-source-id: a75fdd3bc
Summary:
This is mostly just a type change for now, more changes to come. This
doesn't make thing much faster yet because we force computations pretty
often to check for unsatisfiability (each function call and PRUNE node).
Next diff will build on that.
Reviewed By: skcho
Differential Revision: D21129758
fbshipit-source-id: 72200e2b1
Summary:
When encountering a constant, pulse creates an abstract value (a
variable) to represent it, and remembers that it's equal to it. The
problem is that pulse doesn't yet know how to deal with the fact that
some variables are going to be equal to each other.
This hacks around this issue in the case of constants, within the same
procedure, by remembering which constants have been assigned to which
place-holder variables, and serving those variables again when the same
constant is translated again.
Limitation: this doesn't work across procedure calls as the "constant
maps" are not saved in summaries.
Something to look out for: we don't want to make `if (p == NULL)` create
a path where `p` is invalid (we only make null invalid when we see an
assignment from 0, i.e. `p = NULL;`).
Reviewed By: ezgicicek
Differential Revision: D21089961
fbshipit-source-id: 5ebb85d0a
Summary:
1. Package will make the error too verbose.
2. We don't even need to say it is "class" because we say it in the error
description ("Class has 0 issues and can be marked Nullsafe").
Reviewed By: artempyanykh
Differential Revision: D21131998
fbshipit-source-id: 6ccca7615
Summary:
One source of false positives on container races is when the container member field is initialised to a concurrent version in a constructor, but the static type of the field doesn't reflect the thread safety of it.
This solution
- tracks flows from constructors of safe data structures to abstract addresses;
- initialises the initial attribute state when analysing a non-constructor method to that achieved by all constructors/class-initializers.
- checks for that attribute when recording container accesses.
Reviewed By: jvillard
Differential Revision: D21089428
fbshipit-source-id: 02a88f6e8
Summary:
As artempyanykh pointed out, `nullable` works much better for required fields
than `optional` in terms of x-plat.
Reviewed By: artempyanykh
Differential Revision: D21129614
fbshipit-source-id: b03c91b78
Summary: Modeling vector iterator with two internal fields: an internal array and an internal pointer. The internal array field points to the internal array field of a vector; the internal pointer field represents the current element of the array. For now `operator++` creates a fresh element inside the array.
Reviewed By: ezgicicek
Differential Revision: D21043304
fbshipit-source-id: db3be49ce
Summary:
The java source file parser should refuse to run on non
.java file. Also, as we expect some autogenerated source files to
break Java official syntax, we catch parsing error silently and
cancel location recording for them.
Reviewed By: jvillard
Differential Revision: D21089587
fbshipit-source-id: 35f1a1e28
Summary:
Now that only races rooted at formals or globals are reported, there is no point using the ownership domain to exclude accesses to locals, so remove the code that initialises the ownership of those.
Also, remove an accidentally quadratic use of `Map.bindings |> List.find` in `FormalMap` and simplify the analysis driver.
Reviewed By: skcho
Differential Revision: D21066855
fbshipit-source-id: 126080778
Summary:
Add a path condition to each symbolic state, represented in sledge's arithmetic domain. This gives a precise account of arithmetic constraints. In particular, it is relation and thus is more robust in the face of inter-procedural analysis.
This is gated behind a flag for now as there are performance issues with the new arithmetic.
Reviewed By: jberdine
Differential Revision: D20393947
fbshipit-source-id: b780de22a
Summary:
Show that the SSA restrictions imply that the blocks can be
topologically sorted so that any use of a variable follows its
definition in the list of blocks. This showed a slight flaw in my
definition of SSA form. It used to treat program counters up to
equality. However, PCs that point to a block header contain the location
that control came from (so that the Phi instructions can be executed),
but the SSA restrictions shouldn't pay attention to that, so now the
definition of SSA introduces a equivalence on PCs that ignores that
additional information.
Reviewed By: jberdine
Differential Revision: D21066873
fbshipit-source-id: 735575a1f
Summary: Move state to summary conversion into the domain file, move a model matcher into the models file and simplify.
Reviewed By: skcho
Differential Revision: D21064351
fbshipit-source-id: bb5b07b6b
Summary:
There are two types of anonymous classes (not user defined classes):
- classic anonymous classes (defined as $<int> suffixes)
- lambda classes (corresponding to lambda expressions). Experimentally,
they all have form `$Lambda$_<int>_<int>`, but the code just uses
`$Lambda$` as a heuristic so it is potentially more robust.
# Problem this diff solves
When generate meta-issues for nullsafe, we are interested only in
user-defined classes, so we merge all nested anonymous stuff to
corresponding user-defined classes and hence aggregate the issues.
Without this diff, for each lambda in the code, we would report this as
a separate meta-issue, which would both screw up stats and be confusing
for the user (when we start reporting mode promo suggestions!).
Reviewed By: artempyanykh
Differential Revision: D21042928
fbshipit-source-id: a7be266af
Summary: In `PowLoc`, we focus on making every constructors of `PowLoc.t` return nomalized value. Thus, it is unnecessary to nomalize any `PowLoc.t` inputs.
Reviewed By: ezgicicek
Differential Revision: D21064893
fbshipit-source-id: d00a7f06b
Summary:
Introduce an explicit assumption that all of the instructions are ones
that are implemented in the translation so far, rather than just
cheating proof cases.
Reviewed By: jberdine
Differential Revision: D20891352
fbshipit-source-id: 37598bd8f
Summary:
Separate into separate files the theorems that are just about the
translation (mostly about the structure of the variable->expression
mapping that the translation builds) from theorems about the translation
and the semantics.
Also move the stuff about dominator_ordered into the SSA Theory, since
it only makes sense for SSA programs, but doesn't have anything to do
with the translation.
Reviewed By: jberdine
Differential Revision: D20673124
fbshipit-source-id: 9d8b08164
Summary:
The LLVM->LLAIR translation keeps a mapping of variables to expressions.
Previously, the invariants related to that mapping were kept in the
state relation, and so the proof needed show that they were preserved
along execution traces. This wasn't obvious as the state changes in
non-SSA ways during evaluation, but the correctness of the mappings is
heavily based on the program being in SSA form. This change separates
out the invariants, and the proof uses the final mapping that the
compiler builds, which contains all of the relevant bindings that might
be needed during execution.
Reviewed By: jberdine
Differential Revision: D20625109
fbshipit-source-id: d4c2dfe19
Summary:
This diff revises how to handle the unknown location in inferbo in two ways:
* stop appending field to the `Unknown` location, e.g. `Unknown.x.a` is evaluated to `Unknown`
* redesign the abstract of multiple locations, like `Bottom` < `Unknown` < `Known` locations
I am doing them in one diff since applying only one of them showed bad results.
Background: `Unknown` was adopted for abstracting all unknown concrete locations, so we could avoid missing semantics of assignments to unknown locations. We tried to keep soundness. However, it brought some other problems related to precision and performance.
1. Sometimes especially when Inferbo failed to reason precise pointer values, `Unknown` may point to many other abstract locations.
2. At that time, value assignments to `*Unknown` makes the situation worse: many abstract locations are updated with imprecise values.
This problem harmed not only its precision, but also its performance since it introduced more location entries in the abstract memory.
Reviewed By: jvillard
Differential Revision: D21017789
fbshipit-source-id: 0bb6bd8b5
Summary:
Treat the remainder of dividing a rational by an integer as if the
rational was an integer division.
Reviewed By: jvillard
Differential Revision: D21042515
fbshipit-source-id: b5d42ddec
Summary:
The partial treatment of Mul and Div terms can simplify some cases,
but since it is only a partial treatment that is not producing a
normal form, in other cases the "simplification" results in large and
non-canonical terms. It is safer to leave them uninterpreted.
Reviewed By: jvillard
Differential Revision: D21042521
fbshipit-source-id: 04fc37f1a
Summary:
The heights of And and Or terms can grow high. This interacts poorly
with some unoptimized Equality operations such as normalization that
do some processing at every subterm.
Reviewed By: jvillard
Differential Revision: D21042518
fbshipit-source-id: 55e6acbb1
Summary:
These are map and folding map that perform a cycle-preserving
pre-order transformation.
Reviewed By: jvillard
Differential Revision: D20877974
fbshipit-source-id: 251288228
Summary:
The current notion of "simplified" function symbols, which are treated
as a hybrid between interpreted and uninterpreted, has no logical
basis. Normalization is now strong enough, due to stronger handling of
the changing carrier set, that the "simplified" classification can be
removed.
Reviewed By: jvillard
Differential Revision: D20726961
fbshipit-source-id: 9962ea323
Summary:
It is possible for the canonical form of a term in (the carrier set
of) a relation to be a term that is not in the relation. It is also
possible for this term to be equal to a term in the relation. It is
the job of the lookup subroutine of normalization to find these
equations.
The current implementation of entails_eq is incomplete in this case,
in particular, when an uninterpreted term has an interpreted subterm,
which itself has a subterm that is not a representative.
This diff strengthens normalization (and hence it's callers such as
entails_eq) to handle such cases. This makes lookup work slightly
harder. An alternative would be to extend the carrier with new terms
from normalization to the carrier and closing the relation, and then
re-normalizing. That would lead to much inflated representation sizes,
and is inefficient.
Reviewed By: jvillard
Differential Revision: D20612565
fbshipit-source-id: 3b7534a62