Summary:
Replace horrible hack with ok hack.
The main difficulty in implementing the disjunctive domain is to avoid
the quadratic time complexity of executing the same disjuncts over and
over again when going around loops:
First time around a loop, assuming for example a single disjunct `d`:
```
[d]
loop body
[d1' \/ d2']
```
Second time around the same loop: the new pre will be the join of the
posts of predecessor nodes, so `old_pre \/ post(loop,old_pre)`, i.e.
`d \/ d1' \/ d2'`. Now we need to execute `loop body` again
*without running the symbolic execution of `d` again* (and the time after
that we'll want to not execute `d`, `d1'`, or `d2'`).
Horrible hack (before): Disjuncts have a boolean "visited" attached
that does its best to keep track of whether a given disjunct is old or
new. When executing a single *instruction* look at the flag and skip the
state if it's old. Of course we have no way to know for sure so it turns
out it was often wrongly re-executing old disjuncts. This was also
producing the wrong results over even simple loops: only the last
iteration would make it outside the loop for some reason. Overall, the
semantics were pretty intractable and shady at best.
New hack (this diff): only run instructions of a given *node* on
disjuncts that are not physically equal to the "pre" ones already in the
invariant map for the current node.
This gives the correct result over simple loops and a nice performance
improvement in general (probably the old heuristic was hitting the
quadratic bad case more often).
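A minimal sketch of the physical-equality filter described above, assuming a toy `disjunct` record in place of the real abstract state. The names `new_disjuncts` and `exec_node_on_new` are hypothetical and are not the functions touched by this diff.

```ocaml
(* Hypothetical stand-in for a disjunct; the real abstract state is richer. *)
type disjunct = {label: string}

(* Disjuncts of [pre] that are not physically equal to any disjunct already
   stored as a "pre" for the current node. [List.memq] compares with (==). *)
let new_disjuncts ~(old_pre : disjunct list) (pre : disjunct list) : disjunct list =
  List.filter (fun d -> not (List.memq d old_pre)) pre

(* Run the node's instructions on the new disjuncts only. *)
let exec_node_on_new ~exec_node ~old_pre pre =
  List.concat (List.map exec_node (new_disjuncts ~old_pre pre))
```

With `old_pre = [d]` and a new pre of `[d; d1'; d2']`, only `d1'` and `d2'` are executed, which is what avoids re-running `d` on the second trip around the loop.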
Reviewed By: skcho
Differential Revision: D21154063
fbshipit-source-id: 5ee38c68c
Summary:
We translated the expression `CXXStdInitializerListExpr` naively in D3058895 as a call to
a skip function, with the hope that it would be translated better in the future. However, the naive translation means that we lose access to the initialized list/array because we are simply skipping it. So, even if we want to model the initializer properly, we have to deal with the skip specially.
This diff tries to solve this problem by removing the skip call whenever
possible. Instead, we translate the underlying array/list as a Load, so
that when it is passed to the constructor, we can pick it up.
For the following initialization:
```
std::vector<int*> vec = {nullptr};
```
Before, we translated it as
```
*&0$?%__sil_tmpSIL_materialize_temp__n$7[0]:int* const =null
n$8=_fun___infer_skip_function(&0$?%__sil_tmpSIL_materialize_temp__n$7:int* const [1*8] const )
n$9=_fun_std::vector<int*,std::allocator<int*>>::vector(&vec:std::vector<int*,std::allocator<int*>>*,n$8:std::initializer_list<int*>)
```
However, this means `n$8` would be the result of a skipped call, which we can't reason about. Instead, we now pass the underlying initialized array, so we get the following translation:
```
*&0$?%__sil_tmpSIL_materialize_temp__n$7[0]:int* const =null
n$8=*&0$?%__sil_tmpSIL_materialize_temp__n$7:int* const [1*8] const
n$9=_fun_std::vector<int*,std::allocator<int*>>::vector(&vec:std::vector<int*,std::allocator<int*>>*,n$8:std::initializer_list<int*>)
```
Reviewed By: jvillard
Differential Revision: D21155014
fbshipit-source-id: 75850b1e6
Summary:
When encountering a constant, pulse creates an abstract value (a
variable) to represent it, and remembers that it's equal to it. The
problem is that pulse doesn't yet know how to deal with the fact that
some variables are going to be equal to each other.
This hacks around this issue in the case of constants, within the same
procedure, by remembering which constants have been assigned to which
place-holder variables, and serving those variables again when the same
constant is translated again.
Limitation: this doesn't work across procedure calls as the "constant
maps" are not saved in summaries.
Something to look out for: we don't want to make `if (p == NULL)` create
a path where `p` is invalid (we only make null invalid when we see an
assignment from 0, i.e. `p = NULL;`).
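A minimal sketch of the "constant map" idea, assuming abstract values are plain integers and constants are OCaml `int`s. The `state` record and `eval_const` are hypothetical names used for illustration only.

```ocaml
module IntMap = Map.Make (Int)

(* Hypothetical abstract values: fresh integers. *)
type abstract_value = int

(* Per-procedure state: a fresh-value counter and the constants seen so far. *)
type state = {next_fresh: abstract_value; constants: abstract_value IntMap.t}

let empty = {next_fresh= 0; constants= IntMap.empty}

(* Return the abstract value standing for constant [c], serving the same
   value again if [c] was already translated in this procedure. *)
let eval_const (c : int) (st : state) : abstract_value * state =
  match IntMap.find_opt c st.constants with
  | Some v -> (v, st)
  | None ->
      let v = st.next_fresh in
      (v, {next_fresh= v + 1; constants= IntMap.add c v st.constants})
```

Because the map lives in the per-procedure state, a callee starts from `empty`, which mirrors the stated limitation across procedure calls.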
Reviewed By: ezgicicek
Differential Revision: D21089961
fbshipit-source-id: 5ebb85d0a
Summary: Modeling vector iterator with two internal fields: an internal array and an internal pointer. The internal array field points to the internal array field of a vector; the internal pointer field represents the current element of the array. For now `operator++` creates a fresh element inside the array.
Reviewed By: ezgicicek
Differential Revision: D21043304
fbshipit-source-id: db3be49ce
Summary:
Add a path condition to each symbolic state, represented in sledge's arithmetic domain. This gives a precise account of arithmetic constraints. In particular, it is relational and thus is more robust in the face of inter-procedural analysis.
This is gated behind a flag for now as there are performance issues with the new arithmetic.
Reviewed By: jberdine
Differential Revision: D20393947
fbshipit-source-id: b780de22a
Summary:
This diff revises how to handle the unknown location in inferbo in two ways:
* stop appending fields to the `Unknown` location, e.g. `Unknown.x.a` is evaluated to `Unknown`
* redesign the abstraction of multiple locations, like `Bottom` < `Unknown` < `Known` locations
I am doing them in one diff since applying only one of them showed bad results.
Background: `Unknown` was adopted for abstracting all unknown concrete locations, so we could avoid missing semantics of assignments to unknown locations. We tried to keep soundness. However, it brought some other problems related to precision and performance.
1. Sometimes, especially when Inferbo failed to reason precisely about pointer values, `Unknown` may point to many other abstract locations.
2. At that point, value assignments to `*Unknown` make the situation worse: many abstract locations are updated with imprecise values.
This problem harmed not only its precision, but also its performance since it introduced more location entries in the abstract memory.
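A rough sketch of the two changes, under assumptions: locations are modelled by a toy `loc` type, and sets of locations by a three-level `locs` type where a join with known locations absorbs `Unknown`. This is one reading of the `Bottom` < `Unknown` < `Known` ordering; all names here are hypothetical and are not Inferbo's actual definitions.

```ocaml
(* Toy abstract locations. *)
type loc = Unknown | Var of string | Field of loc * string

(* Appending a field to [Unknown] stays [Unknown]. *)
let append_field (l : loc) (fld : string) : loc =
  match l with Unknown -> Unknown | Var _ | Field _ -> Field (l, fld)

(* Toy abstraction of multiple locations, ordered Bottom < Unknown_locs < Known. *)
type locs = Bottom | Unknown_locs | Known of string list

(* Least upper bound under the assumed ordering. *)
let join (a : locs) (b : locs) : locs =
  match (a, b) with
  | Bottom, x | x, Bottom -> x
  | Known xs, Known ys -> Known (List.sort_uniq compare (xs @ ys))
  | Known xs, Unknown_locs | Unknown_locs, Known xs -> Known xs
  | Unknown_locs, Unknown_locs -> Unknown_locs
```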
Reviewed By: jvillard
Differential Revision: D21017789
fbshipit-source-id: 0bb6bd8b5
Summary:
As soon as pulse detects an error, it completely stops the analysis and loses the state where the error occurred. This makes it difficult to debug and to understand the state in which the program failed. Moreover, other analyses that might build on pulse (e.g. impurity) cannot access the error state.
This diff aims to restore and display the state at the time of the error in `PulseExecutionState` along with the diagnostic by extending it as follows:
```
type exec_state =
  | ... (** represents the state at the program point that caused an error *)
```
As a result, since we don't immediately stop the analysis as soon as we find an error, we detect both errors in conditional branches simultaneously (see test result changes for examples).
NOTE: We need to extend `PulseOperations.access_result` to keep track of the failed state as follows:
```
type 'a access_result = ('a, Diagnostic.t * t (* the exit state *)) result
```
Reviewed By: jvillard
Differential Revision: D20918920
fbshipit-source-id: 432ac68d6
Summary: Consider functions that simply exit as impure by extending the impurity domain with `AbstractDomain.BooleanOr` that signifies whether the program exited.
Reviewed By: skcho
Differential Revision: D20941628
fbshipit-source-id: 19bc90e66
Summary:
This diff distinguishes array declaration and size-setting in the trace. For example, when there is an
assume statement on an array size, the array size can be pruned to another value. In that case, we
want to see "Set array size" in the trace, instead of "Array declaration".
Reviewed By: jvillard
Differential Revision: D20914930
fbshipit-source-id: 0253fb69e
Summary:
This diff lifts the `PulseAbductiveDomain.t` in `PulseExecutionState` by tracking whether the program continues the analysis normally or exits unusually (e.g. by calling `exit` or `throw`):
```
type exec_state =
| ContinueProgram of PulseAbductiveDomain.t (** represents the state at the program point *)
| ExitProgram of PulseAbductiveDomain.t
(** represents the state originating at exit/divergence. *)
```
Now, Pulse's actual domain is tracked by `PulseExecutionState` and as soon as we try to analyze an instruction at `ExitProgram`, we simply return its state.
The aim is to recover the state at the time of the exit, rather than simply discarding it (i.e. returning empty disjuncts). This allows us to get rid of some FNs that we were not able to detect before. Moreover, it also allows the impurity analysis to be more precise since we will know how the state changed up to exit.
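A minimal sketch of how an instruction step might treat the two constructors, assuming a placeholder `astate` type in place of `PulseAbductiveDomain.t`. The `exec_instr` function and its `step` parameter are hypothetical.

```ocaml
(* Placeholder for PulseAbductiveDomain.t. *)
type astate = string list

type exec_state =
  | ContinueProgram of astate  (* state at the program point *)
  | ExitProgram of astate  (* state originating at exit/divergence *)

(* Executing an instruction from [ExitProgram] returns the state unchanged;
   only [ContinueProgram] states are actually stepped. *)
let exec_instr (step : astate -> exec_state list) (st : exec_state) : exec_state list =
  match st with
  | ExitProgram _ -> [st]
  | ContinueProgram astate -> step astate
```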
TODO:
- Impurity analysis needs to be improved to consider functions that simply exit as impure.
- The next goal is to handle error state similarly so that when pulse finds an error, we recover the state at the error location (and potentially continue to analyze?).
Disclaimer: currently, we handle throw statements like exit (as was the case before). However, this is not correct. Ideally, control flow from throw nodes follows catch nodes rather than exiting the program entirely.
Reviewed By: jvillard
Differential Revision: D20791747
fbshipit-source-id: df9e5445a
Summary:
It's easy to create large arrays in code, eg `int x[1UL << 16];`, but
these can generate huge nodes in SIL because zero-initialization is
translated by zero-ing structures element by element. Introduce a
builtin to use instead. Keep the naive method for small structures (with
a configurable limit on "small").
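A minimal sketch of the size-based choice, assuming a toy instruction type. `Store_zero`, `Call_zero_init` and `zero_initialize` are hypothetical names, not the actual SIL instructions or the builtin introduced here.

```ocaml
(* Toy instructions standing in for SIL. *)
type instr =
  | Store_zero of int  (* zero the element at the given index *)
  | Call_zero_init of int  (* one builtin call zeroing [n] elements *)

(* Element-by-element zeroing for small structures, a single builtin call
   once the element count exceeds the configurable [limit]. *)
let zero_initialize ~(limit : int) (n : int) : instr list =
  if n <= limit then List.init n (fun i -> Store_zero i)
  else [Call_zero_init n]
```

For `int x[1UL << 16]` this produces one instruction instead of 65536.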
Reviewed By: dulmarod
Differential Revision: D20836836
fbshipit-source-id: 6bf5410f8
Summary:
- Add `no_return` models for Java's `exit(...)` methods (can be extended further later on)
- Handle throw-catch better by short-cutting throw nodes not to the exit node but to all **catch nodes** that are reachable from the throw node. If there is no catch node, we short-cut to the exit node as before.
This removes a FP from deadstore tests because before we simply were not able to handle control flow from throw nodes to catch nodes at all.
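The successor choice for a throw node can be sketched as follows, assuming the reachable catch nodes have already been computed. `throw_successors` is a hypothetical helper and nodes are left abstract.

```ocaml
(* Successors of a throw node: every reachable catch node, or the exit
   node when no catch node is reachable. *)
let throw_successors ~(reachable_catch_nodes : 'n list) ~(exit_node : 'n) : 'n list =
  match reachable_catch_nodes with [] -> [exit_node] | catches -> catches
```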
Reviewed By: skcho
Differential Revision: D20769039
fbshipit-source-id: e978f6cdb
Summary: In an intra-procedural analysis we assume that parameters passed by reference to a function will be initialized inside that function. We use the type information of an actual parameter to initialize the fields of the struct. This does not work if a function has a parameter of type `void*`, as the actual parameter then also has type `void*`. To solve this issue, we use type information from local variables.
Reviewed By: jvillard
Differential Revision: D20670253
fbshipit-source-id: dc9f051ef
Summary:
- Model `System.exit()` as early_exit and add a test
- Tweak message of methods that are impure due to having no pulse summary (and add a test)
Reviewed By: skcho
Differential Revision: D20668979
fbshipit-source-id: 6b5589aae
Summary:
Hopefully no one uses this. This is in Python and we'd like to get rid
of it. Easy enough to either re-implement if needed or to be
re-implemented by a third party.
Reviewed By: ngorogiannis
Differential Revision: D20626344
fbshipit-source-id: 484022482
Summary:
As exemplified by added tests, pulse computes an empty summary (with 0 disjuncts) whenever it discovers a contradiction which might be caused by:
- discovering aliasing in memory
- widening a limited number of times in loops and concluding that loop exit conditions are never taken
However, AFAIU, it is not possible to have a function with 0 disjuncts apart from such anomalies. Even a function which does nothing like `void foo(){}` has 1 disjunct:
```
Pulse: 1 pre/post(s)
#0: PRE:
{ roots={ };
mem ={ };
attrs={ };}
POST:
{ roots={ };
mem ={ };
attrs={ };}
SKIPPED_CALLS: { }
```
The aim of this diff is to consider functions with 0 disjuncts as **impure** because most often such cases are impure, rather than actually pure.
Reviewed By: skcho
Differential Revision: D20619504
fbshipit-source-id: 3a8502c90
Summary:
Previously, at each function call, we added a `WrittenTo` attribute to the addresses of the actuals. However, this mistakenly treats every function application that merely inspects its argument as impure. Instead, we should only propagate `WrittenTo` if the actuals already have `WrittenTo` attributes.
For instance, for the following functions
```
public static boolean is_null(Byte a) {
return a == null;
}
public static boolean call_is_null(Byte a) {
return is_null(a);
}
```
We used to get the following pulse summary for `call_is_null` (showing only one of the disjuncts):
```
#0: PRE:
{ roots={ &a=v1 };
mem ={ v1 -> { * -> v2 } };
attrs={ v1 -> { MustBeValid },
v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) } };}
POST:
{ roots={ &a=v1, &return=v8 };
mem ={ v1 -> { * -> v2 }, v8 -> { * -> v4 } };
attrs={ v2 -> { Arith =null,
BoItv ([max(0, v2), min(0, v2)]),
WrittenTo-----------WRONG },
v4 -> { Arith =1,
BoItv (1),
Invalid ConstantDereference(is the constant 1),
WrittenTo-----------WRONG },
v8 -> { WrittenTo } };}
SKIPPED_CALLS: { }
```
where we mistakenly recorded a `WrittenTo` for `v2` (what `a` points to). As a result, we considered `call_is_null` as impure :( This diff fixes that since the callee `is_null` doesn't have any `WrittenTo` attributes for its parameter `a`. So, we don't propagate `WrittenTo` and get the following summary
```
#0: PRE:
{ roots={ &a=v1 };
mem ={ v1 -> { * -> v2 } };
attrs={ v1 -> { MustBeValid },
v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) } };}
POST:
{ roots={ &a=v1, &return=v8 };
mem ={ v1 -> { * -> v2 }, v8 -> { * -> v4 } };
attrs={ v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) },
v4 -> { Arith =1,
BoItv (1),
Invalid ConstantDereference(is the constant 1) },
v8 -> { WrittenTo } };}
SKIPPED_CALLS: { }
```
Reviewed By: skcho
Differential Revision: D20490102
fbshipit-source-id: 253d8ef64
Summary: There has never been a sufficient formal basis for either soundness or completeness of reports on locals. This diff changes the domain to effectively concern only expressions rooted at formals or globals.
Reviewed By: ezgicicek
Differential Revision: D19769201
fbshipit-source-id: 36ae04d8c
Summary:
The impurity domain was tracking all changes to variables (with a list of traces containing all write/invalid accesses). This results in long traces with multiple access events for the same variable. For instance,
```
void swap_impure(int[] array, int i, int j) {
int tmp = array[i];
array[i] = array[j]; // included in the trace
array[j] = tmp; // included in the trace
}
```
here we recorded both array accesses.
This diff changes the domain to include accesses so that we only keep track of a single trace per access. Array accesses are only recorded once.
Note that we want to record all unique accesses, not just the first one, because impurity will be used for hoisting/cost where we will invalidate impure arguments and consider all the rest as not changing.
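A minimal sketch of keeping a single trace per access, assuming accesses are identified by a string key and traces are lists of strings. `AccessMap` and `record_access` are hypothetical names, not the impurity domain's actual definitions.

```ocaml
module AccessMap = Map.Make (String)

(* Toy trace: a list of event descriptions. *)
type trace = string list

(* Record [tr] for [access] only if that access has no trace yet, so each
   unique access keeps exactly one trace. *)
let record_access (access : string) (tr : trace) (m : trace AccessMap.t) : trace AccessMap.t =
  if AccessMap.mem access m then m else AccessMap.add access tr m
```

All unique accesses are still kept; only repeated events for the same access are dropped.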
Reviewed By: jvillard
Differential Revision: D20385745
fbshipit-source-id: d3647dad3
Summary:
D20362149 missed
- to pass the optional argument `include_value_history` to the recursive call in `PulseTrace.add_to_errlog`.
- to set `include_value_history=false` for skipped calls.
This diff fixes these issues.
Reviewed By: skcho
Differential Revision: D20385604
fbshipit-source-id: 176e4d010
Summary:
This was never quite finished and inferbo has a new way to do sort of
the same thing.
Reviewed By: skcho, ngorogiannis
Differential Revision: D20362619
fbshipit-source-id: 7c7935d47
Summary: Impurity traces are quite big due to recording values histories. Let's simplify the traces by removing pulse's value histories.
Reviewed By: skcho
Differential Revision: D20362149
fbshipit-source-id: 8a2a6115e
Summary:
These were not used (and were actually activated by the same config
param). Both were in an experimental stage and never reached maturity.
Since the team does not have immediate plans to work on an ObjC nullability
checker, and since "eradicate" (now known as nullsafe) is the main
solution for Java, removing them is sensible.
Reviewed By: jvillard
Differential Revision: D20279866
fbshipit-source-id: 79e64992b
Summary: This diff suppresses integer overflow issues in functions whose names include "hash".
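A minimal sketch of such a name-based filter, using only the OCaml standard library. Matching case-insensitively is an assumption, and `contains_substring` and `should_suppress_overflow` are hypothetical helpers rather than the checker's actual code.

```ocaml
(* [contains_substring ~sub s] is true when [sub] occurs somewhere in [s]. *)
let contains_substring ~(sub : string) (s : string) : bool =
  let n = String.length s and m = String.length sub in
  let rec go i = i + m <= n && (String.sub s i m = sub || go (i + 1)) in
  go 0

(* Assumption: the match ignores case, so "computeHash" also qualifies. *)
let should_suppress_overflow (proc_name : string) : bool =
  contains_substring ~sub:"hash" (String.lowercase_ascii proc_name)
```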
Reviewed By: jvillard
Differential Revision: D19942654
fbshipit-source-id: d86fa4f00
Summary:
When finding a proper constructor for `std::make_shared`, the given parameter types are sometimes
slightly different, e.g., const int vs int. This diff loosens the condition of the types on finding
constructors.
Reviewed By: ngorogiannis
Differential Revision: D19743198
fbshipit-source-id: f90213109
Summary:
This diff fixes the clang translation of switch statements. It assumed that `default:` always comes
last, which inadvertently introduced some unreachable nodes, e.g. when `default:` comes first.
Reviewed By: dulmarod
Differential Revision: D19793138
fbshipit-source-id: 1e8b52c0d
Summary:
The goals are:
- Increase precision in C-languages by ditching access paths.
- Help with eventually sharing the abstract address module with RacerD.
- Reports are now language-mode specific (eg `->` in clang vs `.` in Java).
It's not exactly access expressions used here. Instead the pattern `(base, access list)` is used where `access` is `HilExp.Access.t`. This is done to ease the way `deriving` is used for creating two comparison functions, one that cares about the root variable and one that doesn't; and also because the main function that recurses over accesses (`normalise_access_list`) visits the accesses from innermost to outermost.
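A rough sketch of the `(base, access list)` pattern with its two comparison functions, written by hand instead of with `deriving`. The `access` type below is a toy replacement for `HilExp.Access.t`, and both function names are hypothetical.

```ocaml
(* Toy accesses standing in for HilExp.Access.t. *)
type access = Field_access of string | Dereference

(* A path is a root variable plus the accesses applied to it. *)
type path = string * access list

(* Comparison that takes the root variable into account. *)
let compare_with_root (p : path) (q : path) : int = compare p q

(* Comparison that looks at the accesses only. *)
let compare_ignoring_root ((_, xs) : path) ((_, ys) : path) : int = compare xs ys
```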
Also, kill some dead code.
Reviewed By: skcho
Differential Revision: D19741545
fbshipit-source-id: 013bf1a89
Summary:
This diff adds a taint domain in Inferbo. The taint value will be used to find vulnerable array
accesses in the following diffs.
Reviewed By: ezgicicek
Differential Revision: D19391028
fbshipit-source-id: 566b4c0fe
Summary:
To emulate the `ThreadSafe` contract in C++/ObjC, reporting was gated behind a check that ensured a C++/ObjC class has a `std::mutex` member (plus other filters). This is reasonable, but it has some drawbacks
- other locks may be used, and therefore must be added to the member check;
- locking mechanisms that use the object itself as a monitor cannot be modelled (`synchronized` in ObjC)
RacerD already has `ThreadsDomain` which models our guess on whether a method is expected to run in a concurrent context, and which in C++/ObjC boils down to whether the method non-transitively acquires a lock. This should be a good enough indicator that the class should be checked regardless of whether the locks are member fields. This diff gates the C++/ObjC check on that abstract property.
Reviewed By: dulmarod
Differential Revision: D19558355
fbshipit-source-id: 229d7ff82
Summary: This diff fixes the array access checking function for nested global arrays. We had assumed that RHS of `store` statement in SIL does not include array access expression, but that is not true: for global arrays, SIL can have statements like `*LHS = GlobalArray[n][m]`.
Reviewed By: ezgicicek
Differential Revision: D19300153
fbshipit-source-id: 256325642
Summary:
This diff models `std::make_shared` as a simple constructor call, i.e., it changes the function
call `std::make_shared<C>(i)` to the constructor `C(i)`.
Reviewed By: ngorogiannis
Differential Revision: D19432338
fbshipit-source-id: 0d838e555
Summary:
This diff gets global constant array values from their initializers. The `find_global_array` function is
added to the memory domain; it finds the values of global array locations during on-demand value
generation.
Reviewed By: ngorogiannis
Differential Revision: D19300143
fbshipit-source-id: 7b0b84c42
Summary:
If a race exists in two or more overloads of the same method and we use only the class and method name in the report text, then the current bug hashing algorithm will identify the two reports as duplicates.
To avoid this, the report had the class, method and list of type parameters. This is unreadable, however, and redundant (the report is already located within the method in question). So at the risk of duplicates, use only class+method names.
Also, fix a bug in `Procname.pp_simplified ~withclass` where `withclass` was ignored for C++/ObjC methods.
Now:
> Read/Write race. Non-private method `FrescoVitoImageSpec.onCreateInitialState(...)` indirectly reads with synchronization from `factory.AnimatedFactoryProvider.sImpl`. Potentially races with unsynchronized write in method `FrescoVitoImageSpec.onEnteredWorkingRange(...)`.@ [Litho components are required to be thread safe because of multi-threaded layout](https://fburl.com/background-layout). Reporting because current class is annotated `MountSpec`, so we assume that this method can run in parallel with other non-private methods in the class (including itself).
Before
> Read/Write race. Non-private method `void FrescoVitoImageSpec.onCreateInitialState(ComponentContext,StateValue,StateValue,Uri,MultiUri,ImageOptions,FrescoContext,Object,ImageListener)` indirectly reads with synchronization from `factory.AnimatedFactoryProvider.sImpl`. Potentially races with unsynchronized write in method `FrescoVitoImageSpec.onEnteredWorkingRange(...)`.@ [Litho components are required to be thread safe because of multi-threaded layout](https://fburl.com/background-layout). Reporting because current class is annotated `MountSpec`, so we assume that this method can run in parallel with other non-private methods in the class (including itself).
Reviewed By: artempyanykh
Differential Revision: D19462277
fbshipit-source-id: aebc20d89
Summary:
Currently, impurity analysis is oblivious to skipped functions which might e.g. return a non-deterministic value, write to memory or have some other side-effect. This diff fixes that by relying on Pulse's skipped functions to determine impurity. Any unknown function which is not modeled to be pure is assumed to be impure.
This is a heuristic. We could have assumed them to be pure by default as well.
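The heuristic can be sketched as a check over the calls Pulse skipped, assuming callees are identified by name and pure models are given as a plain list. `has_impure_skipped_call` is a hypothetical helper; the callee names in the usage note are only illustrative.

```ocaml
(* True when at least one skipped callee is not modeled as pure, in which
   case the caller is treated as impure. *)
let has_impure_skipped_call ~(pure_models : string list) (skipped_calls : string list) : bool =
  List.exists (fun callee -> not (List.mem callee pure_models)) skipped_calls
```

For example, with `pure_models = ["Math.abs"]`, a function that skipped only `Math.abs` stays pure as far as this check is concerned, while one that also skipped an unmodeled callee is flagged.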
Reviewed By: jvillard
Differential Revision: D19428514
fbshipit-source-id: 82efe04f9
Summary: This diff captures global initializers ondemand, like we do for functions defined in headers.
Reviewed By: ezgicicek
Differential Revision: D19346947
fbshipit-source-id: 05174e6a4
Summary:
Demonstrate that the per-file type environments don't prevent
the deadlock report here. The fear was that when the analyser
tries to locate the methods of the endpoint class, it might fail to
do so because the types might be stored in different type
environments (per file).
Reviewed By: mityal
Differential Revision: D19225908
fbshipit-source-id: 097e4aeea
Summary:
A plus is a plus, no need to give up when +/- is about pointers. This
gets rid of some false positives involving pointer arithmetic.
However, the problem remains if we make things a bit more
inter-procedural. This is documented in an added test.
Reviewed By: ezgicicek
Differential Revision: D18932877
fbshipit-source-id: 4ad1cfe72
Summary:
The `Typ.Fieldname` module has many issues. Among those:
- It has 5 different string/printing functions and most of them do radically different things in Java and in Clang.
- There is no type safety: creating a Clang field and calling a Java function on it will lead to a crash (`rindex_exn` etc, there are usually no dots in Clang fields).
- It uses a single string for Java fields, containing the package, the class and the field, e.g., `java.lang.Object.field`. This is wasteful, because
- there is no sharing of strings for packages/classes, and,
- string operations need to be performed every time we need the field or the class or the package alone.
This diff preserves the behaviour of the module's interface, so the API problems remain.
However, by using a saner representation for Java fields we can get small performance and large memory gains (the type environment in Java is much smaller, about 30-40%).
In addition, many functions on clang fields would previously do string manipulations (look for `.` and split on it) before returning the final field unchanged -- now they use the type of the field for that.
Reviewed By: jvillard
Differential Revision: D18908864
fbshipit-source-id: a72d847cc