Summary:
We were not tracking static `create ` and `build` methods intra-procedurally in the analysis and we just looked up their summaries. This resulted in not having static `create` methods in the trace. Let's fix that by pushing static check into `satisfy_heuristic`.
This results in a few FPs, but such cases are not very common in the codebase.
Reviewed By: skcho
Differential Revision: D18451955
fbshipit-source-id: d4ccf46f5
Summary:
We consider Java collections to be like c++ std::vectors and add models for
- `Collections.get(..)`
- `__cast`
Reviewed By: skcho
Differential Revision: D18449607
fbshipit-source-id: 448206c84
Summary: `equals1` and `equals2` in `SafeInvertedMap.join` are references that indicate whether given parameters and the result is physically equal or not. This diff fixes a missing update of them.
Reviewed By: ezgicicek
Differential Revision: D18450680
fbshipit-source-id: bae19cbe9
Summary:
`getThis` is an idiom for allowing Builder sub-classes to jump through the hoops of covariance plus java generics with self types. It's declared as abstract in the (generic) inner `Builder` class of a root class, and subclasses declare generic `Builder`s that inherit from the generic root `Builder` and trivially implement this method by returning `this`. Obviously, this returns conditional ownership (if `this` is owned, then the return value is owned).
The way it's typically used is
```
T foo() {
...
return getThis();
}
```
However, because abstract methods need dynamic dispatch for proper summarisation, we miss all that. A workaround was been implemented in D8947992 (see that for context), but it was buggy -- it required that the LHS type in the assignment
```
lhs = this.getThis();
```
is the same as the type of `this`, but this is too strict (eg, when using casts).
Here, the condition is changed to requiring that the return type of the method is the same as the type of `this`.
We also avoid asking for the `procdesc` as everything needed is in the attributes.
Reviewed By: jberdine
Differential Revision: D18450737
fbshipit-source-id: e67f0495c
Summary:
It returns non-top value when one of the parameters of band is positive, i.e., `x & 255` returns
`[0, 255]` instead of top.
Reviewed By: ezgicicek
Differential Revision: D18448614
fbshipit-source-id: aaa298a66
Summary:
Let's introduce a set of new cost analysis issue types that are raised when the function is statically determined to run on the UI thread. For this, we rely on the existing `runs_on_ui_thread` check that is developed for RacerD. We also update the cost summary and `jsonbug.cost_item` to include whether a method is on the ui thread so that we don't repeatedly compute this at diff time for complexity increase issues.
Note that `*_UI_THREAD` cost issues are assumed to be more strict than `*_COLD_START` reports at the moment. Next, we can also consider adding a new issue type that combines both such as `*_UI_THREAD_AND_COLD_START` (i.e. for methods that are both on cold start and run on ui thread).
Reviewed By: ngorogiannis
Differential Revision: D18428408
fbshipit-source-id: f18805716
Summary:
This diff tries more narrowing during analysis in order to get preciser results on nested loops.
In the widening phase, it does narrowing a loop right after its widening, for each loops. In general, this may make the widening phase non-terminating because it keeps the abstract state from monotonely increasing to the fixed point in a finite number of iterations. To avoid that situation, this diff applies the narrowing only when the first visit of the loop in the widening phase.
Reviewed By: ngorogiannis
Differential Revision: D18400631
fbshipit-source-id: cc76f7e85
Summary: Sometimes there is a code like `for(int i = 1; i < x; i++){ l.add(); }`, where the first element in a list is addressed specifically. This case was not analyzed precisely, because the alias value is added only when `i` is initialized by 0 by heuristic. This diff extends the heuristic, so it adds a size alias between `i` and `l.size()` when `i` is initialized by 0 or 1.
Reviewed By: ezgicicek
Differential Revision: D18351867
fbshipit-source-id: e7d19a4ec
Summary:
This diff adds semantics of Java function calls of enum `values` inside class initializers.
* Java class initializer function initializes a specific field `$VALUES`, which points to the list
of enum values.
* The `values` function of enum class returns the value of `$VALUES`.
The problem is when the `values` function is called inside the class initializer, for example:
```
enum Color {
RED,
GREEN,
BLUE;
static {
for (Color c : Color.values()) {}
}
}
```
This introduces a recursive dependency: the class initializer calls `Color.values` and the function
returns `Color.$VALUES` the value of which should be initialized in the class initializer.
To address the problem, this diff finds the value of `$VALUES` in its abstract memory when
`values` is called inside the class initializer.
Reviewed By: ezgicicek
Differential Revision: D18349281
fbshipit-source-id: 21766c20f
Summary:
The order was wrong: the map from procnames to node-ids was cleaned first, but to clean the map from node-ids to nodes, we need the id and we have already removed it.
The symptom was that effectively no new leaves are created by "removing" a node, so the call graph scheduler quickly devolves to the file-based one.
Reviewed By: skcho
Differential Revision: D18448209
fbshipit-source-id: f272a8112
Summary:
There was some over-general treatment of reachability, in anticipation of changes that didn't happen.
In particular, we only need to flag/remove single nodes, as they must be leaves to be scheduled,
therefore we never need to traverse their successors, because there aren't any.
Reviewed By: jvillard
Differential Revision: D18425905
fbshipit-source-id: b86490542
Summary:
- Convert `task_generator` into a module of `ProcessPool` and collect inside the two combinators which were in semi-random places.
- Make `SyntacticCallGraph` export a `task_generator` as opposed to a call-graph builder.
- Separate `target` type and put it in its own module to avoid dependency cycles.
Reviewed By: skcho
Differential Revision: D18425718
fbshipit-source-id: 7957edac8
Summary:
`typecheck_instr` is a function that essentially does most of nullsafe typechecking
work: it pattern matches the instruction and handles most of business logic (e.g. knowledge about builtins, rules of PropagatesNullable, flow-sensitive rule for condition branches etc etc etc.
This is a gigantic function consisting of huge amount of nested
function. No way it can fit in someone's head and it is super hard to
read. It also captures many things like node and node'.
This diff does fairly shallow work:
1. Moves most of nested subfunctions into top-level functions. It also
means that what was previously hiddenly captured is now made an explicit
dependency.
2. Some (minority) of them are renamed to better indicate the context.
3. Some (minority) of params are renamed to clear indicate the
difference.
4. Some params are made named params, especially booleans and params
that have the same types so could be confused.
5. In some placed, added bit more clear comments.
This diff DOES NOT do:
1. Tries to fully resolve confusion between cryptic names. In couple of places, instead of node'
and node we use node and original_node, which is barely better. But we
still have ugly things like typestate and typestate1 leaving together.
Baby steps!
2. Tries to create better or consistent names in functions. I did it for
most obvious cases, that's it.
3. Order or params is arbitrary. I just used the order that was easy to
come up with, no consistency here.
Reviewed By: artempyanykh
Differential Revision: D18425965
fbshipit-source-id: 7a775f08d
Summary:
This diff extends bound domain to express Min/Max of another bounds, so it can keep some more
precision in `Math.min/max`.
limitation: `MinMaxB`, the constructor of the bound, can contain only linear expressions or
previous min/max expressions.
Reviewed By: ezgicicek
Differential Revision: D18395365
fbshipit-source-id: fc90d27fd
Summary:
We still have few remaining places which we have a logic not encoded in
as a Nullability type. Ignoring third party param calls is one of them.
We don't want this behavior for strict mode.
Reviewed By: artempyanykh
Differential Revision: D18396639
fbshipit-source-id: dbfedc769
Summary:
The upcoming whole-program analysis will need to log reports in different source files simultaneously.
Here, the data structure containing the reports, and in charge of deduplication is generalised to
a map from source file to what it was previously. This way, writing out the reports is possible even
with multiple source files.
Reviewed By: skcho
Differential Revision: D18394994
fbshipit-source-id: 5f5ecc27c
Summary:
This is a dirty hack to make nullsafe behave consistently for internal
and external models.
Medium term this code needs to be rewritten in a better way, so that we
pass all information in annotated signature (either in form of Inferred
nullability, or type origin, which tracks third party calls).
As a follow up, we will make reporting more clear.
Reviewed By: artempyanykh
Differential Revision: D18372660
fbshipit-source-id: 12c2449e1
Summary: Capture locations where work is scheduled to run in parallel (here, just Executors). Also add a test file with cases the upcoming whole-program analysis for starvation should catch.
Reviewed By: dulmarod
Differential Revision: D18346880
fbshipit-source-id: 57411b052
Summary:
Ability to accept relative path is convenient for testing and local
debugging.
Ability to accept absolute paths is needed for buck integration - buck
knows absolute path (to properly invalidate cache when the content of
the folder is changed) and passes it to infer.
Reviewed By: artempyanykh
Differential Revision: D18370407
fbshipit-source-id: be7f12ae1
Summary: Follow ups will include error messaging that makes the choice clear
Reviewed By: artempyanykh
Differential Revision: D18347664
fbshipit-source-id: b6f005726
Summary:
In this diff, we just load the info from the storage. Next diff will be
actually using this information to infer nullability.
`ThirdPartyAnnotationGlobalRepo.get_repo` will be used in the next diff,
hence #skipdeadcode
Reviewed By: artempyanykh
Differential Revision: D18347647
fbshipit-source-id: 82a9223c6
Summary:
In next diffs, we will:
1/ teach nullsafe to read nullability information from the 3rd party
annotation folder
2/ use this storage in addition to our hard-coded
models to respect nullability during type-checking
Reviewed By: artempyanykh
Differential Revision: D18247538
fbshipit-source-id: ee45bc80e
Summary:
This diff extends the alias domain, so each variable can have multiple aliases.
It changed `KeyLhs` can be mapped to multiple alias targets in the `AliasMap` domain:
```
before : KeyLhs.t -> KeyRhs.t * AliasTarget.t
after : KeyLhs.t -> KeyRhs.t -> AliasTarget.t
```
Reviewed By: ezgicicek
Differential Revision: D18062178
fbshipit-source-id: b325a6055
Summary:
It extracts RHS of alias from `AliasTarget.t`, so it changes the `AliasMap` domain:
```
before : KeyLhs.t -> AliasTarget.t // AliasTarget.t includes KeyRhs.t
after : KeyLhs.t -> KeyRhs.t * AliasTarget.t
```
Reviewed By: ezgicicek
Differential Revision: D18299537
fbshipit-source-id: 1446580a8
Summary: We only care about expensive invariant calls. Let's disable the usual one since it is not used and there is no point filling up our dbs.
Reviewed By: dulmarod
Differential Revision: D18298530
fbshipit-source-id: 7a933c8da
Summary:
More rigid format for 3rd party annotation repo
To simplify external processing by tools, lets not accept any divergence from a
format, which is:
<fully qualified class name>#<method_name>(<params>)[ Nullable]
<params> = <list of param separated by ", ">
<param> = [Nullable ]<fully qualified type>
Particularly:
- Dont accept spaces in arbitrary places
- Require a single space after each comma separating params
- Require a single space between closing parethesis and nullable
annotation, if present
- Require at least one '.' in a class name (to prevent errors when
somebody did not specify a package).
This diff also makes error messaging more user-friendly, because now it
is easy to do a minor mistake like forget a space after comma.
Reviewed By: asm89
Differential Revision: D18297382
fbshipit-source-id: 91aab6823
Summary:
This diff tries to make less imprecise division by constants results.
For example, the results of the division `[l, u] / c`, where `c` is a positive constant, are:
1. If `l/c` or `u/c` is representable in the bound domain, it uses the precise bounds, i.e., `[l/c, u/c]`.
2. If it is not representable, it tries to make conservative results:
if `0<=l<=u`, it returns `[0, u]` because `0 <= [l/c, u/c] <= u`
if `l<=u<=0`, it returns `[l, 0]` because `l <= [l/c, u/c] <= 0`
if `l<=0<=u`, it returns `[l, u]` because `l <= [l/c, u/c] <= u`
3. otherwise, it returns top, `[-oo, +oo]`
Reviewed By: ezgicicek
Differential Revision: D18270380
fbshipit-source-id: 8fb14c0e4
Summary:
Add precision to analysis by elaborating the thread-status domain. This is done by having unknown (bottom), UI, BG or Any (both/top) elements in the lattice. This way, when we branch on thread-identity (if I am on UI thread do this, otherwise do that), we know that in one branch we are on UI thread and on the other we are *not* on the UI thread (BG thread), where previously the other branch would just go to top.
With this knowledge we can throw away pairs that come from callees which run on a thread that is impossible, given the current caller thread identity. This can happen when annotations are used incorrectly, and since this is the purview of annot-reachability, we just drop those pairs entirely.
Reviewed By: skcho
Differential Revision: D18202175
fbshipit-source-id: be604054e
Summary:
Now that we have two similar functions, it becomes confusing, because `Pp.to_string` and `Pp.string_of_pp` can seem to do the same stuff, while in reality they do the opposite.
Well, it is still bit confusing, because the proper names would be
`Pp.pp_of_to_string` and `Pp.to_string_of_pp`, but I think this high
level order names are not necessary given that in most cases they will
be used as concrete functions.
I think `Pp.of_string` captures such usages better than `to_string` used to do: you need to pp stuff,
but you have a string (or, technically, a function that returns a string), so you pretty print OF that string, aren't you?
Reviewed By: jvillard
Differential Revision: D18245876
fbshipit-source-id: fd4b6ab68
Summary:
This is a helper module for reading info from a 3rd party nullability repository.
Next diffs are going to use it for reading nullability repository from
disk.
Reviewed By: artempyanykh
Differential Revision: D18225473
fbshipit-source-id: 06a2dc97e
Summary:
Steal a page from RacerD (and improve interface of) on using certain calls to assert
execution on a particular thread. Reduces FPs and FNs too.
Reviewed By: dulmarod
Differential Revision: D18199843
fbshipit-source-id: 5bdff0dfe
Summary:
The zero cost of node does not make sense especially when the abstract memory is non-bottom. This
resulted in unreasonable zero cost results sometimes, e.g. when the checker could not find
appropriate control varaibles having interval values of iteration. This diff fixes this, so sets
the minimum basic cost as 1, if the abstract memory at the node is non-bottom.
Reviewed By: ezgicicek
Differential Revision: D18199291
fbshipit-source-id: b215d10e5
Summary:
When reporting null dereference it is useful to know where the null came
from.
Reviewed By: skcho
Differential Revision: D18206459
fbshipit-source-id: 0c8e6781b
Summary:
This simplifies the code overall. It also makes accessing the action of
a "trace" (which is now stored alongside it instead of deep inside it)
constant time instead of linear in the number of nested calls.
Reviewed By: skcho
Differential Revision: D18206460
fbshipit-source-id: 9546ff36f
Summary:
This adds a more interesting value domain to pulse: concrete intervals.
There are still two main limitations:
1. arithmetic operations are all over-approximated: any assignment involving arithmetic operations is replaced by non-determinism
2. abstract values that are discovered to be equal are not merged into one
Reviewed By: skcho
Differential Revision: D18058972
fbshipit-source-id: 0492a590f
Summary:
This does several things because it was hard to split it more:
1. Split most of the arithmetic reasoning to PulseArithmetic.ml. This
doesn't need to be reviewed thoroughly because an upcoming diff
changes the domain from just `EqualTo of Const.t` to an interval domain!
2. When going through a prune node intra-procedurally, abduce arithmetic
facts to the pre (instead of just propagating them). This is the "assume
as assert" trick used by biabduction 1.0 too and allows to propagate
arithmetic constraints to callers.
3. Use 2 when applying summaries by pruning specs whose preconditions
have un-satisfiable arithmetic constraints.
This changes one of the tests! Pulse now does a bit more work to find
the false positive, as can be seen in the longer trace.
Reviewed By: skcho
Differential Revision: D18117160
fbshipit-source-id: af3b2c8c0
Summary:
Instead of checking that each address in the pre that must be valid is
not invalid in the caller (and error out if it turns out it is invalid)
as we discover them, save these checks for after we are sure that the
precondition can be applied. It is in fact a bug that we can report an
error when trying to apply a precondition that is actually not
satisfiable in the current state for other reasons than lifetime issues.
We still want to skip calls in case of weird issues like mismatch in
number of formals vs actuals.
This will have more obvious effects later when we also check that
arithmetic facts in preconditions are satisfied at the call site: if a
pre mandates "x=1" and "y must be valid" and we have "x=0" and "y
invalid" then we shouldn't report an error.
Reviewed By: skcho
Differential Revision: D18115229
fbshipit-source-id: ad4ce72ff
Summary:
If a precondition cannot be applied, it means that this program path
somehow doesn't make sense for the caller and so should be pruned. Right
now we just treat this as skipping over the call instead.
This will become more important when specs start mentioning arithmetic
facts that must be satisfied at the call site. As it is we will only
stop if we discover aliasing in the pre not present at the call site or
vice versa.
Reviewed By: dulmarod
Differential Revision: D18115230
fbshipit-source-id: 4f1c7a583
Summary: The way `<=` is used in `AbstractDomain` prevents infix use and forces bracketing it everywhere. Replace with simple `leq`.
Reviewed By: jvillard
Differential Revision: D18201854
fbshipit-source-id: 8175224e4
Summary: Abstract state tracks stuff that is not needed for summaries, wasting space/time.
Reviewed By: jvillard
Differential Revision: D18171499
fbshipit-source-id: 25483ced9
Summary:
Primitive types are not annotated. Because of that, we used to implicitly derive
`DeclaredNonnull` type for them. This worked fine, but this leads to errors in Strict mode, which does
not believe DeclaredNonnull type.
Now lets offifically teach nullsafe that primitive types are
non-nullable.
Reviewed By: jvillard
Differential Revision: D18114623
fbshipit-source-id: 227217931
Summary:
This diff adds SafeInvertedMap, which is similar to InvertedMap but it
guarantees that there is no top elements in the tree of the map.
Reviewed By: ezgicicek
Differential Revision: D18062174
fbshipit-source-id: 2fbc51f31
Summary: `Str.regexp_string` should be used to find a method name instead of `Str.regexp`.
Reviewed By: ezgicicek
Differential Revision: D18136598
fbshipit-source-id: c4b56dd64
Summary:
The reporting phase would go through the critical pairs in the summary roughly three times, once for each major type of warning (starvation, lockless violation, deadlock). This is wasteful, and also led to some code duplication. Fix.
Also, use the more efficient annotation matcher in `ConcurrencyModels` and move some model matchers to `StarvationModels`.
Reviewed By: artempyanykh
Differential Revision: D18118149
fbshipit-source-id: ff4dc3d07
Summary: It is now possible to push the thread status into each critical pair. This leads to higher precision, because when code branches on whether it is on the UI thread, the final abstract state of the procedure will be `AnyThread`, but pairs created in the UI thread branch should know that their status is `UIThread`, not `AnyThread`.
Reviewed By: jvillard
Differential Revision: D18114273
fbshipit-source-id: cbb99b46f
Summary: This will be more useful later when adding another one.
Reviewed By: ezgicicek, jberdine
Differential Revision: D18115231
fbshipit-source-id: a0a01901a
Summary:
The business of translating `Top/True/False` to `true/false` can be
hidden more.
Reviewed By: skcho
Differential Revision: D18115228
fbshipit-source-id: 071fcbddf
Summary:
Warning 33 (unused open) is enabled but the module open is not really
unused, it's just also opened at the top of the file...
Reviewed By: skcho
Differential Revision: D18114385
fbshipit-source-id: 2a8f9512a
Summary:
This diff avoids making top values on unknown non-static function,
such as abstract function, calls. This is necessary because the
generated top values ruin the precision of the cost checker.
Reviewed By: ezgicicek
Differential Revision: D17418611
fbshipit-source-id: aeb759bdd
Summary:
The wrong function was used when we tried to see if the class is
annotated with NullsafeStrict. This made it work only for non-static
methods.
Now we use the proper way.
Reviewed By: ngorogiannis
Differential Revision: D18113848
fbshipit-source-id: 02b7555be
Summary:
Previously, we considered a function which modifies its parameters to be impure even though it might not be modifying the underlying value. This resulted in FPs like the following program in Java:
```
void fresh_pure(int[] a) {
a = new int[1];
}
```
Similarly, in C++, we considered the following program as impure because it was writing to `s`:
```
Simple* reassign_pure(Simple* s) {
s = new Simple{2};
return s;
}
```
This diff fixes that issue by starting the check for address equivalnce in pre-post not directly from the addresses of the stack variables, but from the addresses pointed to by these stack variables. That means, we only consider things to be impure if the actual values pointed by the parameters change.
Reviewed By: skcho
Differential Revision: D18113846
fbshipit-source-id: 3d7c712f3
Summary: As suggested by artempyanykh, this will make it bit more clear
Reviewed By: artempyanykh
Differential Revision: D18037204
fbshipit-source-id: 65cb96815
Summary: We stop tracking at builder boundaries. Let's tract create methods as well so that trace is more informative.
Reviewed By: skcho
Differential Revision: D18038637
fbshipit-source-id: a99b6431f
Summary: Adding support to matching block names. We match mangled block names. We also needed to extend the function for extracting the range for each method, to also traverse the stmts to be able to find the block declarations.
Reviewed By: skcho
Differential Revision: D17956931
fbshipit-source-id: 707908812
Summary:
This is the first take on strict mode semantics.
The main invariant of strict mode is the following:
If the function passes `NullsafeStrict` check and its return value is
NOT annotated as Nullable, then the function does not indeed return
nulls, subject to unsoundness issues (which should either be fixed, or
should rarely happen in practice).
This invariant helps the caller in two ways:
1. Dangerous usages of non strict functions are visible, so the caller is enforced to check them (via assertions or conditions), or strictify them.
2. When the function is strict, the caller does not need to worry about
being defensive.
Biggest known issues so far:
1. Condition redundant and over-annotated warnings don't fully
respect strict mode, and this leads to stupid false positives. (There is
so much more stupid false positives in condition redundant anyway, so
not particularly a big deal for now).
2. Error reporting is not specific to mode. (E.g. we don't distinct real nullables and non-trusted non-nulls, which can be misleading). To be
improved as a follow up.
Reviewed By: artempyanykh
Differential Revision: D17978166
fbshipit-source-id: d6146ad71
Summary:
This is an intermediate nullability type powering future Strict mode.
See the next diff.
Reviewed By: artempyanykh
Differential Revision: D17977909
fbshipit-source-id: 2d5ab66d4
Summary:
Currently, we have NullsafeRules.ml responsible for detecting the
violation fact. All other logic: what should be the error type,
severity, and error message, is in TypeErr.ml.
In this diff, we move logic from NullsafeRules.ml and TypeErr.ml to
dedicated modules like AssignmentRule.ml etc.
Each such module is responsible for:
- detecting the violation fact (this is moved from NullsafeRules.ml)
- rendering the violation error (this is moved from TypeErr.ml).
This approach makes sense for two reasons:
1. The violation fact and the way we show error are logically related to
each other.
2. In future diffs, we will support more features guiding rule behavior,
such as a) decision whether to hide or show the error depending on type
information and mode; b) the way we render error depending on type
information and role.
Having dedicated modules incapsulating knowledge about rules is a natural way to support 2.
Reviewed By: artempyanykh
Differential Revision: D17977891
fbshipit-source-id: a53d916d3
Summary:
1. For each Nullsafe Rule, lets have a dedicated IssueType.
2. Split error reporting to three subfunctions: description, field type,
and infer issue.
This allows to introduce additional capabilities in a consolidated
manner. Example of such capabilities is should we hide or show an error,
what should be error severity depending on strictness mode, and how
exactly the error should be reported depending on how exactly
nullability is violated.
Reviewed By: artempyanykh
Differential Revision: D17977887
fbshipit-source-id: 860d67d2f
Summary:
This function can return `None` if the result is equal to the first
argument of join (why first?). It is unclear if it was an optimization
attempt of over-complicated logic.
Reviewed By: artempyanykh
Differential Revision: D17876561
fbshipit-source-id: 9628fb86e
Summary:
The goal of this logic is unclear:
1/ See the comments
2/ I can not see the scenario where classes and proper types can be
joined in a legit Java program
3/ Even it if was the case, I don't see how this heuristic is justified.
So I assume it is not.
Reviewed By: artempyanykh
Differential Revision: D17876568
fbshipit-source-id: c9c6cd604
Summary:
It is unclear what is the purpose of doing so, and it adds complexity to
codebase.
1/ The semantics of this is not clear, it more or less corresponds to
"where are all original locations that contributed to the type
calculation", but many branches in CFG have nothing to do with
nullability; also it was used not always consistently.
2/ The only place where this was used is logs, so this is no-ops. It is
unclear how seeing all locations can help debugging, given 3/ - see
below
3/ We have the right place to store such informatin, namely TypeOrigin,
where we store locations associated with types where we merge them in
CFG. Currently, we store only "winner" - the most relevant locations
that contributed to nullability in the most informative way. We show
this to the user when we report an error.
4/ If we want to support more things (e.g. show something more to the user), TypeOrigin
seems to be the right place. Or, alternatively, we might still merge
locations in `range`, but this will be better to organize in a tree form
instead of flat list that is not really informative and helpful. It is
all speculative though since need to support things like that seems
unclear.
Reviewed By: artempyanykh
Differential Revision: D17857198
fbshipit-source-id: 6cf6e48a2
Summary:
That module's interface was repeated twice to avoid exposing its
internals to PulseDomain itself. It's also quite long so it makes sense
to move it to its own file.
Reviewed By: ezgicicek
Differential Revision: D17977209
fbshipit-source-id: 56a2dac24
Summary:
Another poorman's library, this time about Pulse Domains. Also renames
`PulseDomain` to `PulseBaseDomain`.
Reviewed By: ezgicicek
Differential Revision: D17955287
fbshipit-source-id: 9c947cf98
Summary:
The name had rotten: it should be `AddrHistPair`. There is little value
of exposing the type of the pair `AbstractValue.t * ValueHistory.t`,
just inline its definition everywhere.
Reviewed By: ezgicicek
Differential Revision: D17955283
fbshipit-source-id: d145251e0
Summary:
See explanations in D17955104.
This renames `AbstractAddress` to `AbstractValue` since they are not
necessarily addresses.
Reviewed By: ezgicicek
Differential Revision: D17955290
fbshipit-source-id: 8bb4c61f2
Summary:
See explanations in D17955104. I put Attributes inside PulseAttribute
instead of creating a new file to avoid exposing more internals about
ranks.
Reviewed By: ezgicicek
Differential Revision: D17955284
fbshipit-source-id: a8719a58f
Summary:
Problem: PulseDomain.ml is pretty big, and contains lots of small
modules. The Infer build being a bit monolithic at the moment, it is
hard to split all these small modules off without creating some
confusion about which abstraction barries lay where. For instance, it's
fine to use `PulseDomain.ValueHistory` anywhere, but using `PulseDomain`
itself is sometimes bad when one should use `PulseAbductiveDomain`
instead.
Proposal: a poorman's library mechanism based on module aliasing. This
stack of diffs creates new Pulse* modules for all these small, safe to
use modules, together with `PulseBasicInterface.ml`, which aliases these
modules to remove the `Pulse` prefix. At the end of the stack, it will
contain:
```
module AbstractValue = PulseAbstractValue
module Attribute = PulseAttribute
module Attributes = PulseAttribute.Attributes
module CallEvent = PulseCallEvent
module Diagnostic = PulseDiagnostic
module Invalidation = PulseInvalidation
module Trace = PulseTrace
module ValueHistory = PulseValueHistory
```
This "interface" module can be opened in other pulse modules freely.
Reviewed By: ezgicicek
Differential Revision: D17955104
fbshipit-source-id: 13d3aa2b5
Summary:
Domain for thread-type. The main goals are
- Track code paths that are explicitly on UI thread (via annotations, or assertions).
- Maintain UI-thread-ness through the call stack (if a callee is on UI thread then the
trace any call site must be on the UI thread too).
- If we are not on the UI thread we assume we are on a background thread.
- Traces with "UI-thread" status cannot interleave but all other combinations can.
- We do not track other annotations (eg WorkerThread or AnyThread) as they can be
erroneously applied -- other checkers should catch those errors (annotation reachability).
- Top is AnyThread, and is used as the initial state for analysis.
Interestingly, by choosing the right strategy for choosing initial state and applying callee summaries gets rid of some false negatives in the tests even though we have not introduced any path sensitivity yet.
Reviewed By: jvillard
Differential Revision: D17929390
fbshipit-source-id: d72871034
Summary: For Objective-C methods we match the mangled names (the field is name in the profiler samples).
Reviewed By: skcho
Differential Revision: D17952552
fbshipit-source-id: 308d415f6
Summary:
bigmacro_bender
There are 3 ways pulse tracks history. This is at least one too many. So
far, we have:
1. "histories": a humble list of "events" like "assigned here", "returned from call", ...
2. "interproc actions": a structured nesting of calls with a final "action", eg "f calls g calls h which does blah"
3. "traces", which combine one history with one interproc action
This diff gets rid of interproc actions and makes histories include
"nested" callee histories too. This allows pulse to track and display
how a value got assigned across function calls.
Traces are now more powerful and interleave histories and interproc
actions. This allows pulse to track how a value is fed into an action,
for instance performed in callee, which itself creates some more
(potentially now interprocedural) history before going to the next step
of the action (either another call or the action itself).
This gives much better traces, and some examples are added to showcase
this.
There are a lot of changes when applying summaries to keep track of
histories more accurately than was done before, but also a few
simplifications that give additional evidence that this is the right
concept.
Reviewed By: skcho
Differential Revision: D17908942
fbshipit-source-id: 3b62eaf78
Summary:
Java method annotations are ambiguous in that there is no difference between
annotating the return value of a method, and annotating the method itself.
The disambiguation is done entirely based on the meaning of the annotation.
Here, while `UiThread`/`MainThread` are genuine method/class annotations
and not return annotations, the reverse is true for `ForUiThread`/`ForNonUiThread`.
This means that these latter annotations do not determine the thread status of
the method they are attached to.
Here we fix that misunderstanding.
Reviewed By: jvillard
Differential Revision: D17960994
fbshipit-source-id: 5aecfb124
Summary:
- Adds ATD file to parse the clang profiler samples
- Procnames don't help us here because we want to use mangled names, not the version of names that Infer needs, so passing in the RangeMap also ClangProc that just include the names and mangled names.
- First matching of c functions to have something in place to add a test, matching further method kinds to be done in next diff.
Reviewed By: skcho
Differential Revision: D17877071
fbshipit-source-id: b31d651a7
Summary:
I dunno, seemed wrong before. About to introduce another attribute with
similar arguments so making them consistent in advance.
Reviewed By: skcho
Differential Revision: D17930349
fbshipit-source-id: 944b58bac
Summary:
- add the variable being declared so we can report it back in the trace in addition to its location
- distinguish between local vars and formals
Reviewed By: skcho
Differential Revision: D17930348
fbshipit-source-id: a5b863e64
Summary:
- Putting test determinator in own directory
- Putting Java procname creation stuff in its own module
Reviewed By: skcho
Differential Revision: D17929885
fbshipit-source-id: 4f2578566
Summary:
Eventually thread status will be stored inside every critical pair so as to allow path sensitivity. That means that the status can no longer be a whole trace, as this will quickly become intractable, because each domain element would have to maintain its own trace as well as its own thread-status trace.
This is not great, as we lose information here, but I don't see any other way around it that is not super complicated/costly (sharing will be limited when moving from callee to caller).
Other diffs up the stack will clean up infrastructure no longer used meaningfully (ie models and domains).
Reviewed By: mityal
Differential Revision: D17908908
fbshipit-source-id: 3bf353e33
Summary:
This diff adopts an array length evaluation function that is conservative. It is useful when our
domain cannot express length result precisely.
For example, suppose there is an array pointer `arr_locs` that may point to two arrays `a` and `b`,
and their lengths are `a.length` and `b.length` (symbols), respectively. Using the usual
evaluation, our current domain cannot express `a.length join b.length` (join of two symbolic
values), so it returns top.
In this case, we can use the conservative function intead. It evaluates the length as `[0,
a.length.ub + b.length.ub]`, since we know every array length is positive. The result is not
precise, but better than top.
Reviewed By: ezgicicek
Differential Revision: D17908859
fbshipit-source-id: 7c0b1591b
Summary:
This will avoid printing stuff like
"0$?%__sil_tmpSIL_materialize_temp__n$2 declared" to the poor
unsuspecting user. The non-verbose stuff is used only by pulse so far as
far I can tell so hopefully this doesn't break anything.
Reviewed By: ezgicicek
Differential Revision: D17908943
fbshipit-source-id: 8ef4f1a8f
Summary:
Instead of a string argument named `~str` pass `Formal | Global` and let
`add_to_errlog` figure out how to print it.
Reviewed By: ezgicicek
Differential Revision: D17907657
fbshipit-source-id: ed09aab72
Summary:
When we make the decision to go into a branch "v = N" where some
abstract value is compared to a constant, remember the corresponding
equality. This allows to prune simple infeasible paths
intra-procedurally.
Further work is needed to make this useful interprocedurally, for
instance either or both of these ideas could be explored:
- abduce v=N in the precondition and do not apply summaries when the
equalities in the pre are not satisfied
- prune post-conditions that lead to unsat states where a value has to
be equal to several different constants
Reviewed By: skcho
Differential Revision: D17906166
fbshipit-source-id: 5cc84abc2
Summary:
When we know "x = 3" and we have a condition "x != 3" we know we can
prune the corresponding path.
Reviewed By: skcho
Differential Revision: D17665472
fbshipit-source-id: 988958ea6
Summary:
First step in having a value domain: record concrete values. We record
them as equalities to abstract values using a new attribute `Constant`.
In a way, attributes are already our "pure" part in the "formulas" that
are pulse abstract domains, so this is reminiscent of existing
separation logic implementations. Trying to add values directly in the
"heap" part proved very cumbersome whereas this approach is very simple,
allowing us to ignore values most of the time except when we actually
care.
Reviewed By: skcho
Differential Revision: D17665473
fbshipit-source-id: b8033ad9c
Summary:
It's annoying to see `sexp_list` everywhere instead of `list`, eg in
merlin.
See also D17907938.
Reviewed By: ngorogiannis
Differential Revision: D17927994
fbshipit-source-id: 84599e8bc
Summary:
Let's add basic Java support to impurity checker. Since impurity checker relies on pulse, we need to add Java with Pulse callback as well. Pulse doesn't officially support Java yet, but we can enable it for impurity checker for now.
Many Java primitives/operations are not yet modeled (such as creation of new objects, support for collections etc.). Still, it is good to run impurity checker on the existing tests of the purity checker. Also, it is nice to see that we can identify most of the impure functions correctly in the purity dir. There are a lot of FNs though.
Reviewed By: skcho
Differential Revision: D17906237
fbshipit-source-id: 15308d285
Summary:
By some unfortunate logic, OCaml often decides to use
`sexp_list`/`sexp_option` instead of just `list`/`option`. Sometimes
these get copy/pasted in interface files.
It would be good to tell OCaml not to do that in the first place but in
the meantime: this diff.
Reviewed By: ngorogiannis
Differential Revision: D17907938
fbshipit-source-id: 7546834a2
Summary:
This diff introduces inequality for the iterator alias target, as we
did for the size target before.
Reviewed By: ezgicicek
Differential Revision: D17879208
fbshipit-source-id: cc2f6a723
Summary: If we have no pulse summary (most likely caused by pulse finding a legit issue with the code), let's consider the function as impure.
Reviewed By: jvillard
Differential Revision: D17906016
fbshipit-source-id: 671d3e0ba
Summary:
Describe what the --report-*-* options actually do instead of their
outdated documentation from the time where this was
`--checkers-blacklist-regex`, `--infer-blacklist-regex` and the like.
Reviewed By: dulmarod, mityal
Differential Revision: D17906015
fbshipit-source-id: 204349e9e
Summary:
This diff revises the semantics of hasNext model to add the lengths of
arrays, rather than join them to top.
Reviewed By: ezgicicek
Differential Revision: D17882388
fbshipit-source-id: f5edaedb3
Summary: Not used feature and with no obvious roadmap anymore.
Reviewed By: jberdine, jvillard
Differential Revision: D17855860
fbshipit-source-id: fd75b9d62
Summary: This makes more explicit what we are talking about here. Also, in extending test determinator to clang, the name is incorrect, but the set is generic procnames which is fine to use for clang, just the name is wrong.
Reviewed By: ngorogiannis
Differential Revision: D17855338
fbshipit-source-id: e93bae083
Summary: This diff models the cost of `ImmutableSet.chooseTableSize(setSize)` as `O(log setSize)` and `construct(n, ...)` as `O(n)`.
Reviewed By: ezgicicek
Differential Revision: D17829850
fbshipit-source-id: 0ee318cc3
Summary:
This diff fixes a data race in ProcessPool: out channel flush was
outside of the critical section.
Reviewed By: ezgicicek
Differential Revision: D17853991
fbshipit-source-id: ac0fd2a69
Summary:
[androidx.collection.SimpleArrayMap](https://developer.android.com/reference/androidx/collection/SimpleArrayMap.html) also has `keySet` and `entrySet` methods which make them eligible for inefficient keyset checker. Let's add it.
Title
Reviewed By: skcho
Differential Revision: D17831594
fbshipit-source-id: 32e831e18
Summary:
Starvation analysis keeps a trace documenting why a method is seen as on the UI thread (many reasons possible, often confusing). This was a call-stack plus string, for keeping the explanation of why the last callee is on the UI thread. This is bad, because it takes too much memory/storage (each string is custom-made to the classes/method involved), and is effectively untyped.
Switch to a proper type for explaining this, so the cost is just a few pointers plus shared procnames/types, and then convert to string only when reporting. This will also allow to push the UI trace inside each element of the starvation domain, so as to allow path sensitivity etc, without blowing up summary size.
Reviewed By: ezgicicek, artempyanykh
Differential Revision: D17810007
fbshipit-source-id: cdd743975
Summary:
This diff proceeds work for consolidating nullsafe logic and making it
type-agnostic
Reviewed By: artempyanykh
Differential Revision: D17808485
fbshipit-source-id: 85356c625
Summary:
The current usage has several issues reducing code maintainability and
readability:
1. Null_field_access was misleading: it was used for checking accesing
to arrays as well!
2. But actually, when checking access to array via `length`, we sometimes
pretended it is a field access (hence very tricky code in rendering the
error).
3. "Call receiver consistency" is unclear name, was not obvious that it is all about
calling a method in an object.
Let's also consolidate code.
Reviewed By: artempyanykh
Differential Revision: D17789618
fbshipit-source-id: 9b0f58c9c
Summary:
This proceeds work on abstracting out operations requiring raw
nullability operations from infer core code. This will simplify coming
introducing of intermediate nullability levels
Reviewed By: artempyanykh
Differential Revision: D17789612
fbshipit-source-id: 9a2bea2ed
Summary:
In previous refactoring stages, we operated on AnnotatedNullability
(nullability of a field or method signature together with its origin),
and InferredNullability (nullability of a value in typestate together
with its origin).
Now it is time to extract common Nullability as a type system concept,
together with `<:` and `join` functionality. This was sketched in
`NullsafeRules`, so this diff consolidates this as well.
In follow up diffs, we will reduce/get rid of direct usages of things like
`InferredNullability.is_nullable`. This will simplify introducing
intermediate Nullability types.
Reviewed By: artempyanykh
Differential Revision: D17789599
fbshipit-source-id: f1b9d2dd0
Summary: StarvationModels depended on StarvationDomain which is the wrong way around, and forbids using *Models from *Domain.
Reviewed By: mityal
Differential Revision: D17809431
fbshipit-source-id: 5aa369e7c
Summary: The type hierarchy was traversed multiple times when searching for annotations: once for methods/overrides annotated and once for superclasses. This can be done in one pass.
Reviewed By: dulmarod
Differential Revision: D17787172
fbshipit-source-id: 248dd4c27
Summary: Before, we didn't track litho framework callees on client code which was wrong. Now, we replace this with the following: If the callee is `build()` itself or doesn't contain a `build()` in its summary, then we want to track it in the domain. The former makes sense since we always want to track `build()` methods. The latter also makes sense since such a method could be a setter for a prop (as in the case of `prop1` in `buildPropLithoOK` which we were missing before due to the imprecise heuristic that prevented picking up callees in litho).
Reviewed By: ngorogiannis
Differential Revision: D17810704
fbshipit-source-id: 87d88e921
Summary:
Improve the trace by incorporating the callees and their locations in the call chain (i.e. chain of methods starting from `build()` call)
- extend the domain to contain the callee location
- replace the test results with the new traces
This makes our job much easier to debug FPs in a big codebase.
Reviewed By: skcho
Differential Revision: D17788996
fbshipit-source-id: 31938b5fe
Summary: `litho` checker contained two checkers: required-props and graphQL field accesses. Although they use the same domain, their reporting conditions and analysis details are different. However, they were bundled into the same analysis by adding disjunctions to `exec_instr` to handle both cases. Let's separate them into two different checkers, keeping a modular transfer function and analyzer that is reused by these two checkers.
Reviewed By: skcho
Differential Revision: D17788834
fbshipit-source-id: 47d77063b
Summary: The regexp would match `ge` and is also unnecessary and was compiled on every call. Save some cpu cycles while fixing it.
Reviewed By: jvillard
Differential Revision: D17789586
fbshipit-source-id: a3f6612c6
Summary:
At some point it was thought that we can assume that any annotation starting with "On" means the method is on the UI thread.
That's too imprecise and has led to false positives and negatives. Restrict to a well-known safe set.
Reviewed By: ezgicicek
Differential Revision: D17769376
fbshipit-source-id: 0f8fee059
Summary:
Currently, lock state is a map from locks to stacks of lock acquisitions.
Since we now have the separate acquisitions component, we no longer need to
remember the stack of acquisitions for a lock. Instead we only need a lock count,
thus reducing the memory footprint.
At the same time, change acquire and release so that they make one tree operation per
component (map/acquisition) as opposed to two (search/update) operations.
Reviewed By: ezgicicek
Differential Revision: D17736727
fbshipit-source-id: 7579eb61e
Summary:
If we acquire n nested locks we end up with n critical pairs, and for each pair the held-locks component goes up linearly, thus total space/time is O(n^2).
If the sets of held-locks are constructed with maximal sharing (intuitively, if they derive from the same set by additions) then the total space/time is O(n logn).
To do this, we must avoid constructing a new set every time we ask what the currently held locks are. Here we maintain (care is needed in the presence of recursive locks) that set across lock acquisitons and removals.
Reviewed By: skcho
Differential Revision: D17736252
fbshipit-source-id: 0f9b292c4
Summary:
This diff tries to narrowing the fixpoint of outermost loops, so that over-approximated widened values do not flow to the following code.
Problem: There are two phases for finding a fixpoint, widening and narrowing. First, it finds a fixpoint with widening, in function level. After that, it finds a fixpoint with narrowing. A problem is that sometimes an overly-approximated, imprecise, values by widening are flowed to the following loops. They are hard to narrow in the narrowing phase because there is a cycle preventing it.
To mitigate the problem, it tries to do narrowing, in loop level, right after it found a fixpoint of a loop. Thus, it narrows before the widened values are flowed to the following loops. In order to guarantee the termination of the analysis, this eager narrowing is applied only to the outermost loops.
Reviewed By: ezgicicek
Differential Revision: D17740265
fbshipit-source-id: e2d454036
Summary:
This diff extends the alias domain to analyze loop with list comprehensions form in Java precisely.
```
list2 = new List();
for (Element e : list1) {
list2.add(e);
}
```
1. `IteratorOffset` is a relation between a iterator offset and a length of another array. For example, in the above example, after n-times of iterations, the offset of the iterator (if it exists) and the length of `list2` are the same as `n`.
2. `IteratorHasNext` is a relation between iterator and its `hasNext` result.
3. At the conditional nodes, it prunes the alias list length of `list2` by that of `list1`.
* if `hasNext(list1's iterator)` is true, `list2`'s length is pruned by `< list1's length`
* if `hasNext(list1's iterator)` is false, `list2`'s length is pruned by `= list1's length`
Reviewed By: ezgicicek
Differential Revision: D17667128
fbshipit-source-id: 41fb23a45
Summary: This diff fixes a potential race in writing to pipe channel.
Reviewed By: ngorogiannis
Differential Revision: D17710123
fbshipit-source-id: 477492750