Summary:
Races in Nullsafe classes can undermine NPE safety despite the class passing the type checks.
This diff adds to the report text of THREAD_SAFETY_VIOLATION and GUARDEDBY_VIOLATION the following trailer:
> Data races in `Nullsafe` classes may still cause NPEs.
This only happens if the race is directly on a non-primitively-typed member field of the class.
It also uses distinct bug types (adds the suffix _NULLSAFE to the bug types above) for easier accounting.
Reviewed By: ezgicicek
Differential Revision: D26403274
fbshipit-source-id: 3cd6ca082
Summary:
Dear Infer team,
To contribute to Infer community, I would like to integrate infer#'s language agnostic layer into Infer.
Please help to review, discuss and consider to merge this feature.
Thanks,
Xiaoyu
Pull Request resolved: https://github.com/facebook/infer/pull/1361
Reviewed By: skcho
Differential Revision: D25928458
Pulled By: jvillard
fbshipit-source-id: 7726150b8
Summary:
When a union type has a member function in C++, it is parsed as `CppClass`. However, sometimes we may want
to distinguish normal cpp classes and union classes. This diff adds a field to the type name.
Reviewed By: jvillard
Differential Revision: D26125619
fbshipit-source-id: 44a6e8192
Summary:
This is all dead code but I had to do this to try something else and I
don't want to have to do that again :)
Reviewed By: skcho
Differential Revision: D26022111
fbshipit-source-id: 622ca10b9
Summary:
When C and C++ code handle a common struct typed value, the struct
type is handled as a `CStruct` in the C code, but as a `CppClass` in
the C++ code. On the other hand, `Fieldname.t` contains a string of
field and **the struct type**. As a result, even if a same field is
accessed in C and C++ code, the accessed fieldnames are different.
```
void callee_in_c(struct s* x) {
x->a = 3;
}
void caller_in_cpp() {
struct s x;
x.a = 5;
callee_in_c(&x);
// HERE
}
```
For example, in the above code, `caller_in_cpp` sets the field `a` as
5, then calls `callee_in_c`, which sets the field `a` as 3. However,
at `HERE`, the value of `x` in Pulse is `{a -> 5, a -> 3}`, because the two
fieldnames are addressed as different ones.
To avoid the issue, this diff loosens the fieldname comparison in
Pulse.
Reviewed By: jvillard
Differential Revision: D26000812
fbshipit-source-id: 77142ebda
Summary:
When accessing a field or array offset of a pure variable (`Exp.Var`) that does not resolve to an access expression, `HilExp.of_sil` will create an extraneous dereference that causes `HilExp.get_typ` to fail. This pull request wraps variables that are the bases of Lfield or Lindex expressions with AddressOf before they're dereferenced (this is already done for Lvar inside `AccessExpression.of_pvar`) and adds a couple of unit tests that make sure it behaves as expected.
**More details on the bug:**
Given the following code:
```
if (!event_obj->dict)
```
and SIL:
```
n$6=_fun_gdb::ref_ptr<event_object,gdbpy_ref_policy<event_object>>::operator->(&event_obj:gdb::ref_ptr<event_object,gdbpy_ref_policy<event_object>>&) [line 38, column 8];
n$7=*n$6.dict:_object* [line 38, column 8];
PRUNE(!n$7, true); [line 38, column 8];
```
`operator->` has return type `event_object*`, but `n$6.dict` only has access to the type of the struct, `event_object`. `of_sil` [calls](9f98368e49/infer/src/absint/HilExp.ml (L567)) `access_expr_of_lhs_exp` with that type, which [calls](9f98368e49/infer/src/absint/HilExp.ml (L498)) `access_exprs_of_exp` (note that `add_deref` is always true). The Lfield case will then [recurse](9f98368e49/infer/src/absint/HilExp.ml (L469)) to process the Exp.Var, and `AccessExpression.of_id` will return an `AccessPath.base` that is then [dereferenced](9f98368e49/infer/src/absint/HilExp.ml (L440)). When resolving types, `get_typ` will find a non-pointer type wrapped by a `Dereference` and return [None](9f98368e49/infer/src/absint/HilExp.ml (L286)). To fix this, we match what [of_pvar](9f98368e49/infer/src/absint/HilExp.ml (L295)) does and wrap the base in an AddressOf, which is removed by the dereference.
Pull Request resolved: https://github.com/facebook/infer/pull/1372
Reviewed By: ngorogiannis
Differential Revision: D25803049
Pulled By: jvillard
fbshipit-source-id: ceadc8cad
Summary:
In `Config`, the lists generated by `mk_string_list`, `mk_path_list`, `mk_rest_actions` are reversed implicitly, which made it hard for developers to use them correctly. What the previous and this diff will do is to change the list variables of the `Config` to not-reversed one.
* diff1: First diff adds `RevList` to distinguish reversed lists explicitly. All usages of the reversed list should be changed to use `RevList`'s lib calls.
* diff2: Then this diff will change types of `Config` variables to not-reversed, normal list.
Reviewed By: ngorogiannis
Differential Revision: D25562303
fbshipit-source-id: 4cbc6d234
Summary:
In `Config`, the lists generated by `mk_string_list`, `mk_path_list`, `mk_rest_actions` are reversed implicitly, which made it hard for developers to use them correctly. What this and the next diff will do is to change the list variables of the `Config` to not-reversed one.
* diff1: First this diff adds `RevList` to distinguish reversed lists explicitly. All usages of the reversed list should be changed to use `RevList`'s lib calls.
* diff2: Then the next diff will change types of `Config` variables to not-reversed, normal list.
Reviewed By: ngorogiannis
Differential Revision: D25562297
fbshipit-source-id: b96622336
Summary: The Ondemand entry point `analyze_proc_desc` exists purely to support specialisation under biabduction. After fixing the storing of specialised `proc_desc`s for java it suffices to use `analyze_proc_name` which will work just fine in its place.
Reviewed By: jvillard
Differential Revision: D25421763
fbshipit-source-id: b162feec3
Summary:
- Eliminate dead `source` field in `field_data` record.
- Use a 2-level hashtable structure (`procname` -> `source_file` and `source_file` -> `file_data`) to help potentially using an LRU cache for layer 1.
- Use laziness instead of mutability.
- Fixed stale comments
Reviewed By: jvillard
Differential Revision: D25303360
fbshipit-source-id: 68a22d299
Summary:
This diff adds a new issue type for reporting modifications to immutable fields (when `report-immutable-modifications` is enabled).
The underlying analysis depends on impurity analysis which itself is based on post-processing of pulse's summaries.
Reviewed By: skcho
Differential Revision: D25216637
fbshipit-source-id: 42e843793
Summary:
Previously, impurity analysis only collected one access for a single modification but not all other modifying accesses. This diff
- changes the impurity domain to collect all modifying accesses
- tracks and prints all the accesses seen to reach the modification, improving readability&debugging
Recording all accesses are needed in the next diff to determine if a method modifies any immutable fields. To determine that, we need to know all modifications, not just a single one.
Reviewed By: skcho
Differential Revision: D25186516
fbshipit-source-id: 43ceb3cd8
Summary: There are no users of this, and it stands in the way of refactoring.
Reviewed By: ezgicicek
Differential Revision: D25241940
fbshipit-source-id: 5e653341a
Summary:
The starvation domain keeps a domain element per distinct pair of lock object and source location. This was used to counteract the imprecision of implicit Quandary-style traces. Starvation has used explicit traces for a long time now, so keeping all these elements is expensive (in fact, in some cases exponential) and of no value. Now, lock object identity is the only distinguishing feature of a domain element.
Also, fix some pretty printing for debugging purposes.
Reviewed By: jvillard
Differential Revision: D24829306
fbshipit-source-id: 22e12f9c1
Summary:
Since we report issues types without a prefix (e.g. ERADICATE_) and
with spaces we should also allow for prefix-less issue types in
SuppressLint, so both should work
- SuppressLint("eradicate-field-not-initialized")
- SuppressLint("Field Not Initialized")
Reviewed By: jvillard
Differential Revision: D24760341
fbshipit-source-id: 1590cf6d0
Summary:
The problem in Reporting.ml:log_issue_from_summary is that it merely
checks the presence of `SuppressLint` annotation on method's body to
decide whether to log or not the issue. This means that regardless of
issue types specified in `SuppressLint`, all issues on such method will
get blocked.
Here we fix that.
Reviewed By: ngorogiannis, mityal
Differential Revision: D24726604
fbshipit-source-id: c9cae3833
Summary:
This diff glues the previous work together.
The ClassLevelAnalysis finds list of provisional violation, builds the
graph based on them, and outputs this graph as a separate issue.
Reviewed By: artempyanykh
Differential Revision: D24682802
fbshipit-source-id: 8174da91a
Summary: Cleanup `Typ` by moving all constant types to `StdTyp`. Also remove `Typ.typ` as it's just `Typ.t` now.
Reviewed By: jberdine
Differential Revision: D24620397
fbshipit-source-id: 4764f87ef
Summary:
Namely `enumerateObjectsUsingBlock` which takes a collection and a block as an arg and the iterates over the collection and applies the block to each element of the collection.
This is a common way to iterate over Objc collections, so let's add a model for it.
Reviewed By: skcho
Differential Revision: D24604590
fbshipit-source-id: 1ceeb4b40
Summary:
Emit the crucial parts of Pulse summaries as json to enable
post-processing by external tools. Stop somewhat arbitrarily at some
datatypes that are just emitted as "opaque" values.
For example:
```
$ infer debug --procedures --procedures-summary-json --select 0
[[["pulse",[["ContinueProgram",{"post":{"heap":[["v3",[[["Dereference"],["v4","_"]]]],["v7",[[["Dereference"],["v3","_"]]]]],"stack":[[["ProgramVar",{"plain":"return","mangled":null}],["v7","_"]]],"attrs":"_"},"pre":{"heap":[],"stack":[],"attrs":"_"},"skipped_calls":"_","path_condition":"_"}],["ContinueProgram",{"post":{"heap":[["v3",[[["Dereference"],["v4","_"]]]],["v8",[[["Dereference"],["v3","_"]]]]],"stack":[[["ProgramVar",{"plain":"return","mangled":null}],["v8","_"]]],"attrs":"_"},"pre":{"heap":[],"stack":[],"attrs":"_"},"skipped_calls":"_","path_condition":"_"}]]]]]
```
Reviewed By: ezgicicek
Differential Revision: D24503387
fbshipit-source-id: 9bd08e93b
Summary:
In ObjC, `NSObject.copy` returns the object returned by `copyWithZone:` on the given class. This method must be implemented if the class complies with `NSCopying` protocol. Since we don't have access to `NSObject`'s code, to follow calls into `copyWithZone:`, we replace such `copy` calls with calls to `copyWithZone:` when a) such a method exists in the class and b) the class conforms to `NSCopying` protocol.
This is done in the preanalysis because
- we need to know if there is a `copyWithZone:` method in the class.
- so that other analyses also benefit (as opposed to doing this in cost and inferbo models).
Note that `NSObject` doesn't itself conform to `NSCopying` but all its subclasses must confrom to the protocol and support the same behavior as above.
https://developer.apple.com/documentation/objectivec/nsobject/1418807-copy
Similarly for `mutableCopy` -> `mutableCopyWithZone:` for classes implementing `NSMutableCopying` protocol.
Reviewed By: skcho
Differential Revision: D24218102
fbshipit-source-id: 42900760e
Summary: We add a naive model for `forEach` idiom for Java's Iterable and Maps. The model is naive because it doesn't take the cost of the lambda into account. This will be fixed later.
Reviewed By: da319
Differential Revision: D23868203
fbshipit-source-id: 37d169c6f
Summary: Subtle false positives and negatives in Hil make Sil preferable. This diff gets rid of the CFG-emulation of Hil, while still using Hil expressions.
Reviewed By: da319
Differential Revision: D23815026
fbshipit-source-id: 731a6d299
Summary:
This diff add a model of `NSArray.index_of_object_passing_test` in
autoreleasepool size model.
Worst case autorelease cost is "array size x cost of given test function (lambda)".
Reviewed By: ezgicicek
Differential Revision: D23785165
fbshipit-source-id: 95ec9700a
Summary:
This can be useful to make pulse forget about tricky parts of the code.
Treat "skipped" procedures as unknown so heuristics for mutating the
return value and parameters passed by reference are applied.
Reviewed By: ezgicicek
Differential Revision: D23729410
fbshipit-source-id: d7a4924a8
Summary:
Eliminate the need to serialise procnames when sending work from the restart scheduler to the workers, by sending the proc_uid instead. This is (much) shorter than the byte representation of the proc_name and it's the primary DB key of the procedures table, so it can be used by the worker to obtain the full procname.
Also, reduce GC churn by using folds in the scheduler startup instead of copying lists over and over.
Reviewed By: jberdine
Differential Revision: D23566131
fbshipit-source-id: 1472aa990
Summary:
Limit communication bandwidth and serialisation burden by sending procedure filename strings (which are bounded at ~100 bytes) instead of serialising procnames through the socket to the scheduler (which are unbounded and have been seen to reach ~30kB in the worst case for templated procedures).
Context:
Under the restart scheduler, a worker working on a procedure X that discovers a race on a dependency Y it needs fails the computation of X and sends to the scheduler the procname Y. The next time X is about to be rescheduled, the scheduler checks whether Y is still being analysed, by checking if the lock for Y still exists. This check uses the procedure filename already, so we can send that instead.
Reviewed By: jvillard
Differential Revision: D23554995
fbshipit-source-id: 9828e71a2
Summary: Most of the time, when the procdesc of a callee is requested, all that is really required is the procedure attributes. However, requesting the procdesc may return `None` when the procedure is undefined (in Java, and soon for Clang too). So, change all callsites to using attributes instead, where possible.
Reviewed By: jvillard
Differential Revision: D23539422
fbshipit-source-id: 3b1a52d48
Summary:
This is needed to make dune auto-updating of unit tests introduced in
the next diff cohabit peacefully with our tests to make sure code stays
correctly formatted wrt ocamlformat.
Also, more auto-formatting = better.
Reviewed By: da319
Differential Revision: D22865004
fbshipit-source-id: 91c47ab08
Summary:
Note that the current implementation lists classes and methods (not
<class:method> pairs.
This is bit clowny, but works for now (given the list of classes is small).
So I'll leave it out of the scope of this diff.
Reviewed By: ngorogiannis
Differential Revision: D23076671
fbshipit-source-id: 71c94ebd9
Summary:
Use `Typ.t` for parameter types in procnames, instead of `JavaSplitName.t`. This allows more precise and usable types stored in procnames, as well as not using the string-based representation of `JavaSplitName` which meant parsing strings whenever one needed a `Typ.t`.
This also makes `JavaSplitName` dead code, so it is removed.
Reviewed By: skcho
Differential Revision: D20500423
fbshipit-source-id: a72728e3f
Summary:
We check for supertypes in Java. Why not ObjC?
Would be good to get dulmarod's input here.
Reviewed By: roro47
Differential Revision: D22817126
fbshipit-source-id: 52c1c3f3c
Summary: There used to be `JoinAfter n` mode where we would try to join `n` states instead of always making disjunctions. It got deleted in D14258485 and Pulse's underlying (pre-disjuncts) domain doesn't even have a join operation. `NeverJoin` mode is not useful in Pulse anymore: pulse will diverge or OOM if we don't limit the number of disjuncts. It is also not used by any other analyzer. Let's remove it.
Reviewed By: jvillard
Differential Revision: D22817425
fbshipit-source-id: 1e658f11d
Summary: This diff refactors Java specific `PatternMatch` functions into its own module. When `PatternMatch.ml` was originally created, it was mainly for Java but now it also supports ObjC. Let's refactor it to reflect the Java/ObjC separation: move all functions that operate on Java procnames into Java submodule.
Reviewed By: jvillard
Differential Revision: D22816504
fbshipit-source-id: ff6b64b29
Summary:
This avoids wasting potentially large amount of work in some
pathological situations. Suppose `foo()` has D specs/disjuncts in its
Pulse summary, and consider a node that calls `foo()` N times, starting
with just one disjunct:
```
[x]
foo()
[x1, ..., xD]
foo()
[y1, ..., yD^2]
foo()
...
```
At the end we get `D^N` disjuncts. Except, currently (before this diff),
we prune the number of disjuncts to the limit L at each step, so really
what happens is that we (very) quickly reach the L limit, then perform
`L*D` work at each step (where we take "apply one pre/post pair to the
current state" to be one unit of work), thus doing `O(L*D*n)` amount of
work.
Instead, this diff counts how many disjuncts we get after each
instruction is executed, and we already reched the limit L then doesn't
bother accumulating more disjuncts (as they'd get discarded any way),
and crucially also doesn't bother executing the current instruction on
these extra disjuncts. This means we only do `O(L*n)` work now, which is
exactly how we expect pulse to scale: execute each instruction (in
loop-free code) at most L times.
This helps with new arithmetic domains that are more expensive and
exhibit huge slowdowns without this diff but no slowdown with this diff
(at least on a few examples).
Reviewed By: skcho
Differential Revision: D22815241
fbshipit-source-id: ce9928e7c
Summary:
We update the type of captured variables to include information about capture mode (`ByReference` or `ByValue`) both for procdesc attributes and the closure expression.
For lambda: closure expression now contains correct capture mode for capture variables. Procdesc still does not contain information about captured variables which we will address in the next diff.
For objc blocks: at the moment all captured variables have mode `ByReference`. Added TODOs to fix this.
Reviewed By: jvillard
Differential Revision: D22572054
fbshipit-source-id: 4c88678ee
Summary:
This diff adds a prototype of a new checker that collects config checkings between markers.
Basically, what the checker is doing is a taint analysis.
* Taint source: function calls of "marker start"
* Taint sink: function calls of config checking
* Taint remove: function calls of "marker end"
By the taint analysis, the analysis results will say that which taints can reach to the sink. In other words, which marker ID that has been started can reach to the config checks, before marker's ending.
I am separating the diff into three steps:
(1/3) Add basic abstract semantics
(2/3) Add trace information
(3/3) Add reporting with test examples
Reviewed By: jvillard
Differential Revision: D21935546
fbshipit-source-id: 092abb92c
Summary: Pulse has now a better version of this check, so let's delete it.
Reviewed By: ngorogiannis
Differential Revision: D22019247
fbshipit-source-id: 344678225