Summary:
Previously, impurity analysis only collected one access for a single modification but not all other modifying accesses. This diff
- changes the impurity domain to collect all modifying accesses
- tracks and prints all the accesses seen to reach the modification, improving readability&debugging
Recording all accesses are needed in the next diff to determine if a method modifies any immutable fields. To determine that, we need to know all modifications, not just a single one.
Reviewed By: skcho
Differential Revision: D25186516
fbshipit-source-id: 43ceb3cd8
Summary: To look for captured variable address escape we should only check the validity of the addresses captured by reference. Checking the validity of the address captured by value can cause nullptr dereference false positives.
Reviewed By: jvillard
Differential Revision: D25219347
fbshipit-source-id: faf6f2b00
Summary:
This was left as a TODO before: where to place calls to destructors for
C++ temporaries that are only conditionally creating when evaluating an
expression. This can happen inside the branches of a conditional
operation `b?e:f` or in potentially-short-circuited conditions on the
righ-hand side of `&&` and `||` operators.
Following the compilation scheme of clang (observed by looking at the
generated LLVM bitcode), we instrument the program with "marker"
variables, so that for instance `X x = true?X():y;` becomes (following
the execution on the true branch):
```
marker1 = 0; // initialize all markers to 0
PRUNE(true) // entering true branch
X::X(&temporary); // create temporary...
marker1 = 1; // ...triggers setting its marker to 1
X::X(&x, &temporary); // finish expression
if (marker1) {
X::~X(&temporary); // conditionally destroy the temporary
}
```
In this diff, you'll find code for:
- associating markers to temporaries that need them
- code to initialize markers to 0 before full-expressions
- code to conditionally destroy temporaries based on the values of the
markers once the full-expression has finished evaluating
Reviewed By: da319
Differential Revision: D24954070
fbshipit-source-id: cf15df7f7
Summary:
The translation of `switch` cases needs to insert nodes around the
translation of each `case` sub-statement, so we need to force node
creation in these sub-statements so the nodes around it can be connected
to the translation of the sub-statements.
Also added more logging I found useful when debugging that.
Reviewed By: da319
Differential Revision: D24991455
fbshipit-source-id: d3a622142
Summary:
Split the translation of return more aggressively between:
1. the instruction that has to happen before the translation of the sub-expr
2. the sub-expr
3. the instruction that has to happen after the sub-expr
This is needed for the next diff which creates potentially large CFGs in
(2).
Reviewed By: da319
Differential Revision: D24954071
fbshipit-source-id: a7e7e2527
Summary: Model `folly::Optional::get_pointer` which returns an address to a value if exists or `nullptr` if empty.
Reviewed By: jvillard
Differential Revision: D24935677
fbshipit-source-id: 9d990fe07
Summary:
We deliberately stopped as soon as an error was detected when applying a
function call. This is not good as other pre/posts of the function may
apply cleanly, which would allow us to cover more behaviours of the
code.
Went on a bit of a refactoring tangeant while fixing this, to clarify
the `Ok None`/`Ok Some _`/`Error _` datatype returned by PulseInterproc.
Now we report errors as soon as we find them during function calls but
continue accumulating specs afterwards.
Reviewed By: da319
Differential Revision: D24888768
fbshipit-source-id: d5f2c29d7
Summary:
Communicate new facts from the arithmetic domain to the memory domain to
detect contradictions between the two.
Reviewed By: jberdine
Differential Revision: D24832079
fbshipit-source-id: 2caf8e9af
Summary:
This is several inter-connected changes together to keep the tests
happy.
The ConditionalOperator `b?t:e` is translated by first creating a
placeholder variable to temporarily store the result of the evaluation
in each branch, then the real thing we want to assign to reads that
variable. But, there are situations where that changes the semantics of
the expression, namely when the value created is a struct on the stack
(eg, a C++ temporary). This is because in SIL we cannot assign the
*address* of a program variable, only its contents, so by the time we're
out of the conditional operator we cannot set the struct value
correctly anymore: we can only set its content, which we did, but that
results in a "shifted" struct value that is one dereference away from
where it should be.
So a batch of changes concern `conditionalOperator_trans`:
- instead of systematically creating a temporary for the conditional,
use the `trans_state.var_exp_typ` provided from above if available
when translating `ConditionalOperator`
- don't even set anything if that variable was already initialized by
merely translating the branch expression, eg when it's a constructor
- fix long-standing TODO to propagate these initialization facts
accurately for ConditionalOperator (used by `init_expr_trans` to also
figure out if it should insert a store to the variable being
initialised or not)
The rest of the changes adapt some relevant other constructs to deal
with conditionalOperator properly now that it can set the current
variable itself, instead of storing stuff inside a temp variable. This
change was a problem because some constructs, eg a variable declaration,
will insert nodes that set up the variable before calling its
initialization, and now the initialization happens *before* that setup,
in the translation of the inner conditional operator, which naturally
creates nodes above the current one.
- add a generic helper to force a sequential order between two
translation results, forcing node creation if necessary
- use that in `init_expr_trans` and `cxxNewExpr_trans`
- adjust many places where `var_exp_typ` was incorrectly not reset when translating sub-expressions
The sequentiality business creates more nodes when used, and the
conditionalOperator business uses fewer temporary variables, so the
frontend results change quite a bit.
Note that biabduction tests were invaluable in debugging this. There
could be other constructs to adjust similarly to cxxNewExpr that were
not covered by the tests though.
Added tests in pulse that exercises the previous bug.
Reviewed By: da319
Differential Revision: D24796282
fbshipit-source-id: 0790c8d17
Summary: We recently introduced a more precise model for constructing an optional from a value by making a shallow copy. However, this introduced Use After Delete false positives. For now, we go back to a less precise model by creating a fresh value. A proper model would be to either make a deep copy or call the copy constructor for a value. We will address this in the following diff.
Reviewed By: jvillard
Differential Revision: D24826749
fbshipit-source-id: 3e5e4edeb
Summary: Refactor `folly::Optional` models to make them easier to reuse for `std::optional`
Reviewed By: jvillard
Differential Revision: D24760053
fbshipit-source-id: f665e84c8
Summary: `folly::Optional::value()` returns a reference, hence an error was shown when the actual value was being accessed. Since `value()` throws an exception in case of `folly::none`, we want to show the error message at the call site of `value()`. We do this by dereferencing the result of `value()` in the model.
Reviewed By: jvillard
Differential Revision: D24702875
fbshipit-source-id: ca9f30349
Summary:
Before we were creating a fresh internal value when we were constructing `folly::Optional`. This diff models `folly::Optional` constructor more precisely by copying the given value.
There was also a missing dereference in the model of `value_or`
Reviewed By: jvillard
Differential Revision: D24621016
fbshipit-source-id: c86d3c157
Summary: Model `folly::Optional::value_or(default)` to return value if not-empty and `default` if empty.
Reviewed By: jvillard
Differential Revision: D24539456
fbshipit-source-id: cc9e176cc
Summary:
This diff adds an option hiding function pointers in costs to users: `cost-suppress-func-ptr` is
true by default.
Reviewed By: ezgicicek
Differential Revision: D24448212
fbshipit-source-id: 88f6b5ea1
Summary:
Take another page from the Incorrectness Logic book and refrain from reporting issues on paths unless we know for sure that this path will be taken.
Previously, we would report on paths that are merely *not impossible*. This goes very far in the other direction, so it's possible we'll want to go back to some sort of middle ground. Or maybe not. See the changes in the tests to get a sense of what we're missing.
Reviewed By: ezgicicek
Differential Revision: D24014719
fbshipit-source-id: d451faf02
Summary:
Upgrade to latest clang release, needed for xcode12.
clang-8/9 won't be able to read the Xcode 12 SDK since there's annotations that will fail compilation.
Also removing unused (and hard to compile) binary `ast_exporter_bin` from facebook-clang-plugins/libtooling.
Reviewed By: ngorogiannis
Differential Revision: D23780089
fbshipit-source-id: 2314125a9
Summary:
Nullifying these leads to observable side-effects, like in the added
test.
Reviewed By: da319
Differential Revision: D23759756
fbshipit-source-id: 559a6486b
Summary: Structs captured both by reference or by value should have reference in their type. Struct captured by value should first call copy constructor. In this diff we fix the type of the captured variable to include reference. Copy constructor injection is left for the future.
Reviewed By: jvillard
Differential Revision: D23688713
fbshipit-source-id: d13748b5d
Summary: Variables captured without initialization do not have correct type inside lambda's body. This diff sets the correct type of captured reference variables inside procdesc and makes sure the translation of captured variables is correct. The translation of lambda's body will then take into account the type of captured var from procdesc.
Reviewed By: jvillard
Differential Revision: D23678371
fbshipit-source-id: ed16dc978
Summary: Add missing reference to the type of variable captured by reference without initialization.
Reviewed By: jvillard
Differential Revision: D23567685
fbshipit-source-id: b4e2ac0b6
Summary:
We were missing assignment to captured variables with initializers.
Consider the following example:
```
S* update_inside_lambda_capture_and_init(S* s) {
S* object = nullptr;
auto f = [& o = object](S* s) { o = s; };
f(s);
return object;
}
```
which was translated to
```
VARIABLE_DECLARED(o:S*&);
*&o:S*&=&object
*&f =(_fun...lambda..._operator(),([by ref]&o &o:S*&))
```
However, we want to capture `o` (which is an address of `object`), rather `&o` in closure.
After the diff
```
VARIABLE_DECLARED(o:S*&);
*&o:S*&=&object
n$7=*&o:S*&
*&f =(_fun...lambda..._operator(),([by ref]n$7 &o:S*&))
```
Reviewed By: jvillard
Differential Revision: D23567346
fbshipit-source-id: 20f77acc2
Summary:
This can be useful to make pulse forget about tricky parts of the code.
Treat "skipped" procedures as unknown so heuristics for mutating the
return value and parameters passed by reference are applied.
Reviewed By: ezgicicek
Differential Revision: D23729410
fbshipit-source-id: d7a4924a8
Summary: Added a model for copy constructor for `std::function`. In most cases, the SIL instruction `std::function::function(&dest, &src)` gives us pointers to `dest` and `src`, hence, we model the copy constructor as a shallow copy. However, in some cases, e.g. `std::function f = lambda_literal`, SIL instruction contains the closure itself `std::function::function(&dest, (operator(), captured_vars)`, hence, we need to make sure we copy the right value.
Reviewed By: ezgicicek
Differential Revision: D23396568
fbshipit-source-id: 0acb8f6bc
Summary: There was a mismatch between formals and actuals in `std::function::operator()` because we were not passing the first argument corresponding to the closure.
Reviewed By: ezgicicek
Differential Revision: D23372104
fbshipit-source-id: d0f9b27d6
Summary: When we evaluate lambdas in pulse, we create a closure object with `fake` fields to store captured variables. However, during the function call we were not linking the captured values from the closure object. We address this missing part here.
Reviewed By: jvillard
Differential Revision: D23316750
fbshipit-source-id: 14751aa58
Summary:
`delete` works exactly like `free` so merge both models together. Also
move the `free(0)` test to nullptr.cpp as it seems more appropriate.
Reviewed By: da319
Differential Revision: D23241297
fbshipit-source-id: 20a32ac54
Summary: Before we were modelling `vector.end()` as returning a fresh pointer every time is was called. It is common to check if an iterator is not the `end()` iterator and proceed to dereference the iterator in that case. In such code pattern `vector.end()` is called twice and returns different fresh values which causes false positives. To fix this, we add a special internal field `__infer_model_backing_array_pointer_to_last_element` to a vector to denote its end. Now, every time we call `vector.end()` we return the value of this field. We introduce a new attribute `EndOfCollection` to mark `end` iterator as the existing `EndIterator` invalidation is not suitable when we need to read the same value multiple times.
Reviewed By: jvillard
Differential Revision: D23101185
fbshipit-source-id: fa8a33b58
Summary:
This time it's personal.
Roll out pulse's own arithmetic domain to be fast and be able to add
precision as needed. Formulas are precise representations of the path
condition to allow for good inter-procedural precision. Reasoning on
these is somewhat ad-hoc (except for equalities, but even these aren't
quite properly saturated in general), so expect lots of holes.
Skipping dead code in the interest of readability as this (at least
temporarily) doesn't use pudge anymore. This may make a come-back as
pudge has/will have better precision: the proposed implementation of
`PulseFormula` is very cheap so can be used any time we could want to
prune paths (see following commits), but this comes at the price of some
precision. Calling into pudge at reporting time still sounds like a good
idea to reduce false positives due to infeasible paths.
#skipdeadcode
Reviewed By: skcho
Differential Revision: D22576004
fbshipit-source-id: c91793256
Summary: To avoid dead store false positives we skip initialization of a variable that has an `unused` attribute. However, this causes uninitialized value false positives when the variable is later used in macros. To fix this, instead of skipping initialization we record the information about `unused` attribute in local variable data that we can later use for filtering out dead store issues.
Reviewed By: jvillard
Differential Revision: D22868050
fbshipit-source-id: 4a2d0e680
Summary:
We get duplicated variable declaration instruction for primitive type variable initialized using list initializer, e.g.
```
int* p{nullptr};
```
This happens because we add variable declaration instruction when we translate both `DeclStmt` and `InitListExpr`. To fix this, we do not add the duplicated variable declaration when we translate `InitListExpr`.
Reviewed By: jvillard
Differential Revision: D22844726
fbshipit-source-id: 422806924
Summary: We model internal builtin `__new` function to return a non-null value. This fixes nullptr_dereference false positives where we explicitly check the result of a function call for nullptr when the function returns a newly created object.
Reviewed By: jvillard
Differential Revision: D22772217
fbshipit-source-id: 37d209697
Summary:
Add a test to the repo to try and detect perf regressions in pulse.
Currently analyzed in ~0.1s. With `--pudge`, takes ~10s.
Sledge does eager normalization and canonicalization when incorporating new facts into formula contexts and the algorithm is polynomial in the number of equalities. This example generates one equality per location in the array => boom. This bypasses the recency model of arrays because the formula needs to be constructed before it can be simplified to get rid of dead variables.
The new arithmetic is not as complete as sledge's algorithm but linear in time. We could use it to simplify the formula *before* passing it to sledge. In fact, that was the original motivation.
Reviewed By: skcho
Differential Revision: D22574366
fbshipit-source-id: e9044ae09
Summary:
When applying function summaries, we are careful not to violate the
summary's assumptions about non-aliasing. For example, the summary we
generate for `foo(x,y) { *x = *y; }` will have `x` and `y` be allocated
to two different `AbstractValue.t` in the heap, representing
disjointness.
However, the current logic is too coarse and also rejects passing the
same pure value to functions that made no assumption about them being
equal or different, eg `goo(int x,int y) { int z = x + y; }`. This is
because the corresponding `AbstractValue.t` are different in the
callee's summary, but are represented by only one same value in callers
such as `goo(i,i)`.
This diff restricts the "don't violate aliasing" condition to only
consider heap-allocated values. This is consistent with separation logic
by the way: we use the implication `x|->- * y|->- |- x≠y`, which is
valid only when both `x` and `y` are both allocated in the heap as in
the left-hand-side of `|-`.
Reviewed By: skcho
Differential Revision: D22574297
fbshipit-source-id: 206a18499
Summary: Lambda is called using `operator()`. We need to know the information of captured variables when `operator()` procedure is being analysed. This diff records lambda captured variables in `operator()` procdesc. The complication arises from the fact that procdesc for `operator()` is created before translating lambda expression or during the translation of lambda expression where captured variables are translated. To solve this issue, we update existing `operator()` procdesc attributes with captured variable information when we translate lambda expression.
Reviewed By: jvillard
Differential Revision: D22374495
fbshipit-source-id: 44909adea
Summary:
We update the type of captured variables to include information about capture mode (`ByReference` or `ByValue`) both for procdesc attributes and the closure expression.
For lambda: closure expression now contains correct capture mode for capture variables. Procdesc still does not contain information about captured variables which we will address in the next diff.
For objc blocks: at the moment all captured variables have mode `ByReference`. Added TODOs to fix this.
Reviewed By: jvillard
Differential Revision: D22572054
fbshipit-source-id: 4c88678ee
Summary:
As title
Model `NSString` as `JavaString`.
Since `NSArray` does not contain information about its type of element, we do not use associate string with collection as in Java and C++. In Java, String model is implemented using java collection, and for C++, string model is implemented using vector.
So instead, we use existing `JavaString` model.
Reviewed By: skcho
Differential Revision: D22431949
fbshipit-source-id: 7cdde1ad7
Summary: To avoid NULLPTR_DEREFERENCE false positives we want to model some functions as returning non-null. A new flag --pulse-model-return-nonnull allows us to provide a list of such functions.
Reviewed By: ezgicicek
Differential Revision: D22431564
fbshipit-source-id: 9944c7382
Summary: This diff prevents printing line numbers of loop in the trace description, which helps to keep the same descriptions even when the line number of a function is changed in tests.
Reviewed By: ezgicicek
Differential Revision: D22375584
fbshipit-source-id: 676d1a7cc
Summary:
Keyword `thread_local` in cpp allows us to create a variable with thread storage duration, meaning that the object's lifetime begins when the thread begins and ends when the thread ends.
We get `NULLPTR_DEREFERENCE` false positive for `thread_local` variable since we reallocate it in the `VariableLifetimeBegins` metadata instruction and we do not see further updates to the variable. To solve the issue we special case `VariableLifetimeBegins` instruction for global variables.
Reviewed By: jvillard
Differential Revision: D22284135
fbshipit-source-id: 13c14ef90
Summary:
We need to check if `folly::Optional` is not `folly::none` if we want to retrieve the value, otherwise a runtime exception is thrown:
```
folly::Optional<int> foo{folly::none};
return foo.value(); // bad
```
```
folly::Optional<int> foo{folly::none};
if (foo) {
return foo.value(); // ok
}
```
This diff adds a new issue type that reports if we try to access `folly::Optional` value when it is known to be `folly::none`.
Reviewed By: ezgicicek
Differential Revision: D22053352
fbshipit-source-id: 32cb00a99