Summary:
This diff avoids that null-retuned path's abstract value ruins that of non-null-returned path.
What this diff does is: when joining two abstract states, one is null-return-path and the other is
non-null-return-path (`return obj;`), it keeps the method calls of `obj` from the
non-null-return-path.
While this design is unsound, I think it should work in practice.
Reviewed By: ezgicicek
Differential Revision: D19348313
fbshipit-source-id: cf5d0f3ff
Summary: The type-name definition for Java can be potentially improved (eg increase sharing, or comparison speed, much like `QualifiedCppName`) by switching away from `Mangled.t` which is essentially a string. First step is to abstract the type.
Reviewed By: jberdine
Differential Revision: D19087508
fbshipit-source-id: 91a81f63b
Summary: This diff use actuall call path in the cost results instead of `class name + method name`.
Reviewed By: ngorogiannis
Differential Revision: D19194969
fbshipit-source-id: b72018586
Summary:
In the previous code, it removed non-build-called method calls. For example, it was like
```
{non-build-called: {prop1, prop3}}
{build-called: {}}
b.build();
{non-build-called: {}}
{build-called: {prop1, prop3}}
```
However, this behavior introduced a false positive when there is multiple builders that point to the
same abstract object and `build` is called one by one.
This diff changes the semantics to keep the method calls of non-build-called at `build` calls.
Reviewed By: ezgicicek
Differential Revision: D19144525
fbshipit-source-id: e2ace127f
Summary: It is confusing to report missing props at the the beginning of the method (especially when there are many components created or when the method has many lines). Let's report them at create methods to better contextualize. Also, make the missing prop bold.
Reviewed By: skcho
Differential Revision: D19141135
fbshipit-source-id: a23d2e7c9
Summary:
We do not have the create method in the trace which makes it difficult to understand
- inter-procedural issues where create and prop setting are in different methods
- there are multiple create-build chains in a method
Let's add the create to the beginning of the trace. Moreover, let's simplify the prop printing to make traces easier to understand.
Reviewed By: ngorogiannis
Differential Revision: D19020213
fbshipit-source-id: 7f8a5d4ec
Summary: The new domain is much better than the old one. Let's kill the old one (along with old litho tests) and simplify things.
Reviewed By: skcho
Differential Revision: D18959627
fbshipit-source-id: df77ae20e
Summary:
- Remove `to_flat_string` as there is `get_field_name` that unambiguously does the same thing.
- Make `pp` print only the field in all languages.
- Fix `to_full_string` so that it has unified behaviour across java/clang and so that it doesn't print `class Foo.x`, but rather `Foo.x`.
Reviewed By: ezgicicek
Differential Revision: D18963033
fbshipit-source-id: e2c803c7d
Summary:
Remove Clang and Java submodules of Typ.Fieldname. They are unnecessary and they reflect a fake dichotomy: there is only one fieldname type. To distinguish between fields of Java classes and other C constructs, there is a helper function provided, but the idea is simple: obtain the class type the field belongs to, and check if it's a Java class.
This diff still preserves behaviour, but removes as many functions as possible from the interface, to leave a small surface.
Reviewed By: mityal
Differential Revision: D18962423
fbshipit-source-id: ffe6933ee
Summary:
In order to handle the example added:
changed domain of `MethodCalled`
from `CreatedLocation -> (IsBuildCalled X IsChecked X Set(MethodCall))`
to `(CreatedLocation X IsBuildCalled) -> (IsChecked X Set(MethodCall))`
This avoids joining of two method calls where one is build-called and the other is not, e.g.,
```
if(b) {
o.build();
} else {
// no build call
}
```
changed domain of `NewDomain`
from `Created X MethodCalled`
to `(Created X MethodCalled) X (Created X MethodCalled)`
One is for no returned memory and the other is returned memory. This keeps precision some join
points of branches, e.g.,
```
if(b) {
return;
} else {
// no return
}
```
Reviewed By: ezgicicek
Differential Revision: D18909768
fbshipit-source-id: c39d1a1ef
Summary: It is not used anywhere and there are no plans to revive it. Kill it!
Reviewed By: skcho
Differential Revision: D18934719
fbshipit-source-id: b9b069b96
Summary:
The `Typ.FIeldname` module has many issues. Among those:
- It has 5 different string/printing functions and most of them do radically different things in Java and in Clang.
- There is no type safety: creating a Clang field and calling a Java function on it will lead to a crash (`rindex_exn` etc, there are usually no dots in Clang fields).
- It uses a single string for Java fields, containing the package, the class and the field, e.g., `java.lang.Object.field`. This is wasteful, because
- there is no sharing of strings for packages/classes, and,
- string operations need to be performed every time we need the field or the class or the package alone.
This diff preserves the behaviour of the module's interface, so the API problems remain.
However, by using a saner representation for Java fields we can get small performance and large memory gains (the type environment in Java is much smaller, about 30-40%).
In addition, many functions on clang fields would previously do string manipulations (look for `.` and split on it) before returning the final field unchanged -- now they use the type of the field for that.
Reviewed By: jvillard
Differential Revision: D18908864
fbshipit-source-id: a72d847cc
Summary: The formals of callees will be used in the following diff.
Reviewed By: ezgicicek
Differential Revision: D18878044
fbshipit-source-id: 6c01af826
Summary:
This reverts commit 4fd6165d190bab32544f9f040b777565432c15b2.
We don't need to check for reporting each node anymore. It suffices to just check per function.
Reviewed By: skcho
Differential Revision: D18883833
fbshipit-source-id: 2591b3af3
Summary:
Making `MethodCalled` an inverted map from created location to method calls results in not being able to track a builder that is created in two different branches of a conditional with different types. Instead, we can make `MethodCalled` simply a map and also change `Created` to be a map from access paths to a set of created locations.
To deal with the case of setting a prop only in one branch, we need to ensure that whenever we call a create method, we add a binding to `MethodCalled` with an empty list of methods so that its intersection with a non-empty one is empty.
Reviewed By: skcho
Differential Revision: D18883097
fbshipit-source-id: b3464ca20
Summary: As long as the types match, it should be possible to call build on two components that are created at different locations.
Reviewed By: skcho
Differential Revision: D18881740
fbshipit-source-id: 356f9e168
Summary: The map from `CreatedLocation` to `MethodCalls` already takes care of the association from create methods to their set props. `MethodCall` comparison should be oblivious the the receiver, otherwise, we risk mistakenly considering two props set at different locations as different.
Reviewed By: skcho
Differential Revision: D18829388
fbshipit-source-id: b5a0d628d
Summary: This diff check and report on every nodes. Problem of the previous design is that it has to report alarms only with the abstract memory of the exit node. However, the new abstract value becomes imprecise at every join points on the path to the exit node, since it is using inverted map, i.e., under-approximation on collecting called methods. As a solution, this diff report on every nodes where `.build` is called with the abstract memory at that node.
Reviewed By: ezgicicek
Differential Revision: D18809449
fbshipit-source-id: 4fd6165d1
Summary: Current method call comparison is too strong. As exemplified with the new test, one can also set the required prop by calling a version which contains the suffixes. The domain should take care of such cases now.
Reviewed By: skcho
Differential Revision: D18808869
fbshipit-source-id: 9f7672e75
Summary: This enables us to quickly pick up the Component typ when checking which required props to check.
Reviewed By: skcho
Differential Revision: D18807925
fbshipit-source-id: 47e407394
Summary: This diff checks litho condition on the new abstract value. This is triggered with `--new-litho-domain`, but it is intra-procedural as of now.
Reviewed By: ezgicicek
Differential Revision: D18783203
fbshipit-source-id: 98570104e
Summary: This diff adds a map from an object to a set of methods that has been called.
Reviewed By: ezgicicek
Differential Revision: D18781461
fbshipit-source-id: 47c10adaa
Summary: This adds a map from access expressions to locations where specific object is created by `create` method call.
Reviewed By: ezgicicek
Differential Revision: D18779985
fbshipit-source-id: f327450cd
Summary: This diff stack is to revise overall domain and semantics of litho checker. Some of the following diffs will not have visible changes until we replace the current checker with the new one.
Reviewed By: ezgicicek
Differential Revision: D18779592
fbshipit-source-id: d210d9bd1
Summary:
This can happen for a number of reasons that are not errors (mostly the exit node being unreachable) so isn't actionable, and definitely not worth showing the user. It could be a debug message but I don't think that's even worth it. Other checkers don't warn in similar circumstances.
With OCaml 4.08 we started actually seeing these error messages (differences in flushing behaviour?) so their annoying nature was revealed.
Reviewed By: ngorogiannis
Differential Revision: D18808460
fbshipit-source-id: a47a1dcb4
Summary: Raises an error when weakSelf is used multiple times in a block. The issue is that there could be unexpected behaviour. Even if `weakSelf` is not nil in the first use, it could be nil in the following uses since the object that `weakSelf` points to could be freed anytime.
Reviewed By: skcho
Differential Revision: D18765493
fbshipit-source-id: 37fc652e3
Summary:
This diff enables parsing and auto-formatting documentation
comments (aka docstrings).
I have looked at this entire diff and manually made some changes to
improve the formatting. In some cases it looked like it would take too
much time, or benefit from someone more familiar with the code doing
it, and I instead disabled auto-formatting docstrings in those files.
Also, there are some source files where the docstrings are invalid,
and some where the structure detected by the parser appears not to
match what was intended. Auto-formatting has been disabled for these
files.
Reviewed By: ezgicicek
Differential Revision: D18755888
fbshipit-source-id: 68d72465d
Summary:
This diff fixes the model of substring.
Problem: The cost model of the substring function was to return `size of string - start index` as a
cost. However, sometimes this was a negative number, because of state abstractions on paths, array
elements, call contexts, etc, which caused an exception inadvertently.
This diff changes the model to return just `size of string`, when it cannot say `size of string` is
bigger than `start index`.
Reviewed By: ezgicicek
Differential Revision: D18707954
fbshipit-source-id: 63f27e461
Summary: Preperation diff to use `ProcnameDispatcher` for Pulse: it changes function arguments, i.e. `ProcnameDispatcher.Call.FuncArg`, to a record in order to track the value of arguments. To do that, it changes `ProcnameDispatcher.Call` into a functor so that we can parametrize over the type of the value without making changes upwards.
Reviewed By: jvillard
Differential Revision: D18590224
fbshipit-source-id: 6a13fbc1a
Summary: Another dead flag that one could mistakenly think is accurate.
Reviewed By: dulmarod
Differential Revision: D18573925
fbshipit-source-id: 129a9cff5
Summary: Adding a trace that includes when strongSelf was assigned to, and when it's used without a check.
Reviewed By: skcho
Differential Revision: D18506113
fbshipit-source-id: 778c4b086
Summary:
The variable strongSelf, because it is equal to a weak captured pointer, needs to be checked for null before being used, otherwise it could be null by the time the block is executed.
Added this check to the SelfInBlock checker, and removed it from biabduction.
We want to migrate all the objc checks from biabduction, so it will be easier to change and faster and more reliable. Moreover, this check is more general, it will flag any use of unchecked strongSelf, not just a dereference.
Reviewed By: skcho
Differential Revision: D18403849
fbshipit-source-id: a9cf5d80b
Summary: Due to the weakness of the analysis which can't detect side-effecting prop setting (e.g. as in `builder.prop1(..)`), we currently have many broken chains that do do have any `create` method in their prefixes. Let's not report on these broken chains.
Reviewed By: skcho
Differential Revision: D18503523
fbshipit-source-id: 7506e34b7
Summary:
We were not tracking static `create ` and `build` methods intra-procedurally in the analysis and we just looked up their summaries. This resulted in not having static `create` methods in the trace. Let's fix that by pushing static check into `satisfy_heuristic`.
This results in a few FPs, but such cases are not very common in the codebase.
Reviewed By: skcho
Differential Revision: D18451955
fbshipit-source-id: d4ccf46f5
Summary:
We consider Java collections to be like c++ std::vectors and add models for
- `Collections.get(..)`
- `__cast`
Reviewed By: skcho
Differential Revision: D18449607
fbshipit-source-id: 448206c84
Summary:
Let's introduce a set of new cost analysis issue types that are raised when the function is statically determined to run on the UI thread. For this, we rely on the existing `runs_on_ui_thread` check that is developed for RacerD. We also update the cost summary and `jsonbug.cost_item` to include whether a method is on the ui thread so that we don't repeatedly compute this at diff time for complexity increase issues.
Note that `*_UI_THREAD` cost issues are assumed to be more strict than `*_COLD_START` reports at the moment. Next, we can also consider adding a new issue type that combines both such as `*_UI_THREAD_AND_COLD_START` (i.e. for methods that are both on cold start and run on ui thread).
Reviewed By: ngorogiannis
Differential Revision: D18428408
fbshipit-source-id: f18805716
Summary:
Now that we have two similar functions, it becomes confusing, because `Pp.to_string` and `Pp.string_of_pp` can seem to do the same stuff, while in reality they do the opposite.
Well, it is still bit confusing, because the proper names would be
`Pp.pp_of_to_string` and `Pp.to_string_of_pp`, but I think this high
level order names are not necessary given that in most cases they will
be used as concrete functions.
I think `Pp.of_string` captures such usages better than `to_string` used to do: you need to pp stuff,
but you have a string (or, technically, a function that returns a string), so you pretty print OF that string, aren't you?
Reviewed By: jvillard
Differential Revision: D18245876
fbshipit-source-id: fd4b6ab68
Summary:
The zero cost of node does not make sense especially when the abstract memory is non-bottom. This
resulted in unreasonable zero cost results sometimes, e.g. when the checker could not find
appropriate control varaibles having interval values of iteration. This diff fixes this, so sets
the minimum basic cost as 1, if the abstract memory at the node is non-bottom.
Reviewed By: ezgicicek
Differential Revision: D18199291
fbshipit-source-id: b215d10e5
Summary:
This simplifies the code overall. It also makes accessing the action of
a "trace" (which is now stored alongside it instead of deep inside it)
constant time instead of linear in the number of nested calls.
Reviewed By: skcho
Differential Revision: D18206460
fbshipit-source-id: 9546ff36f
Summary: The way `<=` is used in `AbstractDomain` prevents infix use and forces bracketing it everywhere. Replace with simple `leq`.
Reviewed By: jvillard
Differential Revision: D18201854
fbshipit-source-id: 8175224e4
Summary:
Previously, we considered a function which modifies its parameters to be impure even though it might not be modifying the underlying value. This resulted in FPs like the following program in Java:
```
void fresh_pure(int[] a) {
a = new int[1];
}
```
Similarly, in C++, we considered the following program as impure because it was writing to `s`:
```
Simple* reassign_pure(Simple* s) {
s = new Simple{2};
return s;
}
```
This diff fixes that issue by starting the check for address equivalnce in pre-post not directly from the addresses of the stack variables, but from the addresses pointed to by these stack variables. That means, we only consider things to be impure if the actual values pointed by the parameters change.
Reviewed By: skcho
Differential Revision: D18113846
fbshipit-source-id: 3d7c712f3
Summary: We stop tracking at builder boundaries. Let's tract create methods as well so that trace is more informative.
Reviewed By: skcho
Differential Revision: D18038637
fbshipit-source-id: a99b6431f
Summary:
This is the first take on strict mode semantics.
The main invariant of strict mode is the following:
If the function passes `NullsafeStrict` check and its return value is
NOT annotated as Nullable, then the function does not indeed return
nulls, subject to unsoundness issues (which should either be fixed, or
should rarely happen in practice).
This invariant helps the caller in two ways:
1. Dangerous usages of non strict functions are visible, so the caller is enforced to check them (via assertions or conditions), or strictify them.
2. When the function is strict, the caller does not need to worry about
being defensive.
Biggest known issues so far:
1. Condition redundant and over-annotated warnings don't fully
respect strict mode, and this leads to stupid false positives. (There is
so much more stupid false positives in condition redundant anyway, so
not particularly a big deal for now).
2. Error reporting is not specific to mode. (E.g. we don't distinct real nullables and non-trusted non-nulls, which can be misleading). To be
improved as a follow up.
Reviewed By: artempyanykh
Differential Revision: D17978166
fbshipit-source-id: d6146ad71
Summary:
Another poorman's library, this time about Pulse Domains. Also renames
`PulseDomain` to `PulseBaseDomain`.
Reviewed By: ezgicicek
Differential Revision: D17955287
fbshipit-source-id: 9c947cf98
Summary:
See explanations in D17955104.
This renames `AbstractAddress` to `AbstractValue` since they are not
necessarily addresses.
Reviewed By: ezgicicek
Differential Revision: D17955290
fbshipit-source-id: 8bb4c61f2
Summary:
See explanations in D17955104. I put Attributes inside PulseAttribute
instead of creating a new file to avoid exposing more internals about
ranks.
Reviewed By: ezgicicek
Differential Revision: D17955284
fbshipit-source-id: a8719a58f
Summary:
bigmacro_bender
There are 3 ways pulse tracks history. This is at least one too many. So
far, we have:
1. "histories": a humble list of "events" like "assigned here", "returned from call", ...
2. "interproc actions": a structured nesting of calls with a final "action", eg "f calls g calls h which does blah"
3. "traces", which combine one history with one interproc action
This diff gets rid of interproc actions and makes histories include
"nested" callee histories too. This allows pulse to track and display
how a value got assigned across function calls.
Traces are now more powerful and interleave histories and interproc
actions. This allows pulse to track how a value is fed into an action,
for instance performed in callee, which itself creates some more
(potentially now interprocedural) history before going to the next step
of the action (either another call or the action itself).
This gives much better traces, and some examples are added to showcase
this.
There are a lot of changes when applying summaries to keep track of
histories more accurately than was done before, but also a few
simplifications that give additional evidence that this is the right
concept.
Reviewed By: skcho
Differential Revision: D17908942
fbshipit-source-id: 3b62eaf78
Summary:
Java method annotations are ambiguous in that there is no difference between
annotating the return value of a method, and annotating the method itself.
The disambiguation is done entirely based on the meaning of the annotation.
Here, while `UiThread`/`MainThread` are genuine method/class annotations
and not return annotations, the reverse is true for `ForUiThread`/`ForNonUiThread`.
This means that these latter annotations do not determine the thread status of
the method they are attached to.
Here we fix that misunderstanding.
Reviewed By: jvillard
Differential Revision: D17960994
fbshipit-source-id: 5aecfb124
Summary:
Instead of a string argument named `~str` pass `Formal | Global` and let
`add_to_errlog` figure out how to print it.
Reviewed By: ezgicicek
Differential Revision: D17907657
fbshipit-source-id: ed09aab72
Summary:
Let's add basic Java support to impurity checker. Since impurity checker relies on pulse, we need to add Java with Pulse callback as well. Pulse doesn't officially support Java yet, but we can enable it for impurity checker for now.
Many Java primitives/operations are not yet modeled (such as creation of new objects, support for collections etc.). Still, it is good to run impurity checker on the existing tests of the purity checker. Also, it is nice to see that we can identify most of the impure functions correctly in the purity dir. There are a lot of FNs though.
Reviewed By: skcho
Differential Revision: D17906237
fbshipit-source-id: 15308d285
Summary:
By some unfortunate logic, OCaml often decides to use
`sexp_list`/`sexp_option` instead of just `list`/`option`. Sometimes
these get copy/pasted in interface files.
It would be good to tell OCaml not to do that in the first place but in
the meantime: this diff.
Reviewed By: ngorogiannis
Differential Revision: D17907938
fbshipit-source-id: 7546834a2
Summary: If we have no pulse summary (most likely caused by pulse finding a legit issue with the code), let's consider the function as impure.
Reviewed By: jvillard
Differential Revision: D17906016
fbshipit-source-id: 671d3e0ba
Summary: This diff models the cost of `ImmutableSet.chooseTableSize(setSize)` as `O(log setSize)` and `construct(n, ...)` as `O(n)`.
Reviewed By: ezgicicek
Differential Revision: D17829850
fbshipit-source-id: 0ee318cc3
Summary:
[androidx.collection.SimpleArrayMap](https://developer.android.com/reference/androidx/collection/SimpleArrayMap.html) also has `keySet` and `entrySet` methods which make them eligible for inefficient keyset checker. Let's add it.
Title
Reviewed By: skcho
Differential Revision: D17831594
fbshipit-source-id: 32e831e18
Summary: Before, we didn't track litho framework callees on client code which was wrong. Now, we replace this with the following: If the callee is `build()` itself or doesn't contain a `build()` in its summary, then we want to track it in the domain. The former makes sense since we always want to track `build()` methods. The latter also makes sense since such a method could be a setter for a prop (as in the case of `prop1` in `buildPropLithoOK` which we were missing before due to the imprecise heuristic that prevented picking up callees in litho).
Reviewed By: ngorogiannis
Differential Revision: D17810704
fbshipit-source-id: 87d88e921
Summary:
Improve the trace by incorporating the callees and their locations in the call chain (i.e. chain of methods starting from `build()` call)
- extend the domain to contain the callee location
- replace the test results with the new traces
This makes our job much easier to debug FPs in a big codebase.
Reviewed By: skcho
Differential Revision: D17788996
fbshipit-source-id: 31938b5fe
Summary: `litho` checker contained two checkers: required-props and graphQL field accesses. Although they use the same domain, their reporting conditions and analysis details are different. However, they were bundled into the same analysis by adding disjunctions to `exec_instr` to handle both cases. Let's separate them into two different checkers, keeping a modular transfer function and analyzer that is reused by these two checkers.
Reviewed By: skcho
Differential Revision: D17788834
fbshipit-source-id: 47d77063b
Summary:
At some point it was thought that we can assume that any annotation starting with "On" means the method is on the UI thread.
That's too imprecise and has led to false positives and negatives. Restrict to a well-known safe set.
Reviewed By: ezgicicek
Differential Revision: D17769376
fbshipit-source-id: 0f8fee059
Summary:
The documentation and uses of filtering disagree. One typical usage is deduplication.
Split that where obvious, add comments where not obvious, and leave alone when obviously unrelated to deduplication.
Reviewed By: mityal
Differential Revision: D17715329
fbshipit-source-id: ec757927b
Summary:
This diff generates a symbolic value when a function returns only
exceptions. Previously, the exception expression is evaluated to top,
thus it was propagated to other functions, which made those costs as
top. For preventing that situation, this diff changed:
* exception expressions are evaluated to bottom, and
* if callee's return value is bottom, it generates a symbolic value
for it.
Reviewed By: ezgicicek
Differential Revision: D17500386
fbshipit-source-id: 0fdcc710d
Summary:
This diff is to refactoring some stuffs for the following diff.
* revised pretty print of the alias domain
* moved `eval_array_locs_length` to `BufferOverrunSemantics`.
Reviewed By: jvillard
Differential Revision: D17667123
fbshipit-source-id: c95611df5
Summary: `Prop(varArg = myProp) List <?> myPropList` can also be set via `myPropList()` or `myProp()`. Add support for picking up the `varArg` and checking this form of required props.
Reviewed By: ngorogiannis
Differential Revision: D17571997
fbshipit-source-id: 7956cb972
Summary: The analysis is not intra-procedural, hence we don't really read the payload. Let's remove it.
Reviewed By: ngorogiannis
Differential Revision: D17603911
fbshipit-source-id: c92b5c602
Summary:
This diff adds an eval mode for the substitutions of the cost results, in order to avoid precision
loss by joining two symbols.
The usual join of two different symbolic values, `s1` and `s2`, becomes top due to the limitation of
our domain. On the other hand, in the new eval mode, it returns an upperbound `s1+s2`, because the
cost values only care about the upperbounds.
Reviewed By: ezgicicek
Differential Revision: D17573400
fbshipit-source-id: 2c84743d5
Summary: When we have an annotation like `Prop(varArg = X)` or ` ThreadSafe(enableChecks = true)`, we were not able to pick up the names of the parameters like `varArg` or `enableChecks`. This diff fixes that.
Reviewed By: skcho, ngorogiannis
Differential Revision: D17571377
fbshipit-source-id: 5293b5810
Summary:
Instead of polluting the signature of trace endpoints, have
the call printer be a module argument to the functors
producing trace elements.
Reviewed By: skcho
Differential Revision: D17550111
fbshipit-source-id: ab5af94c6
Summary: Our annotation parameter parsing is too primitive to identify `resType` and before we only assumed that all Prop's can be set by any of the two suffixes: `Attr` and `Res`. After talking to Litho team, there is 3 more additions to these suffixes: `Dip`, `Sip`, and `Px`.
Reviewed By: ngorogiannis
Differential Revision: D17528482
fbshipit-source-id: 8d7f49130
Summary: Before, we were mistakenly checking any annotation that ends with Prop such as TreeProp. This was wrong. Instead, we should only check Prop as adviced by the Litho team.
Reviewed By: ngorogiannis
Differential Revision: D17527769
fbshipit-source-id: b753dd87a
Summary:
Introduce a new experimental checker (`--impurity`) that detects
impurity information, tracking which parameters and global variables
of a function are modified. The checker relies on Pulse to detect how
the state changes: it traverses the pre and post pairs starting from
the parameter/global variable and finds where the pre and post heaps
diverge. At diversion points, we expect to see WrittenTo/Invalid attributes
containing a trace of how the address was modified. We use these to
construct the trace of impurity.
This checker is a complement to the purity checker that exists mainly
for Java (and used for cost and loop-hoisting analyses). The aim of
this new experimental checker is to rely on Pulse's precise
memory treatment and come up with a more precise im(purity)
analysis. To distinguish the two checkers, we introduce a new issue
type `IMPURE_FUNCTION` that reports when a function is impure, rather
than when it is pure (as in the purity checker).
TODO:
- improve the analysis to rely on impurity information of external
library calls. Currently, all library calls are assumed to be nops,
hence pure.
- de-entangle Pulse reporting from analysis.
Reviewed By: skcho
Differential Revision: D17051567
fbshipit-source-id: 5e10afb4f
Summary: It prints debug information when top values is generated.
Reviewed By: ngorogiannis
Differential Revision: D17285448
fbshipit-source-id: 0621fd36d
Summary:
This check was an incomplete attempt to make nullsafe check nutritious
annotations for fields that get modified.
This was never fully productionized, and this check is turned off by
default.
In near future, we don't anticipate supporting this feature, so let's
remove it to simplify the code.
Reviewed By: artempyanykh
Differential Revision: D17282015
fbshipit-source-id: d63a2f1f7
Summary: This diff adds models of Java String. In order to keep the precision of cost checker, I fixed cost models for String in this diff too.
Reviewed By: ngorogiannis
Differential Revision: D17203309
fbshipit-source-id: 8cc2814fc
Summary:
This diff makes the checkers, except biabduction, to use `typ` instead
of `root_typ` of `Load`/`Store` statemetns.
Reviewed By: dulmarod
Differential Revision: D17203105
fbshipit-source-id: 8be9b5158
Summary:
It uses inline record for Sil.Load and Sil.Store for preparing the
following extention.
Reviewed By: dulmarod
Differential Revision: D17161288
fbshipit-source-id: 637ea7bfa
Summary:
`Present` annotation was an experiment made many years ago that never
got into real usage. The idea was to annotate Optional<> types with
Present, which means that it is safe to call get().
We don't plan to support `Present` annotation for optional types in the
near future.
Support of `Present` annotation requires extra levels of abstraction
that make the changing the behavior and introducing new features harder.
A lot of checks for nullability are written in generic way so they also
check for presense.
Getting rid of that will allow us to simplify our
work for introducing new semantics for nullsafe.
Reviewed By: ngorogiannis
Differential Revision: D17153432
fbshipit-source-id: c5ea9bdf1
Summary:
`from_string` is too benign in constrast with what this method is really
doing (and oh my what it is really doing).
There are a lot of potential follow ups to clean this up even more, but
this is beyond the scope of this diff
Reviewed By: jvillard
Differential Revision: D17070826
fbshipit-source-id: 3d190039e
Summary:
This is more powerful than `"symbols"` for more advanced use-cases. Keep
`"symbols"` unchanged to make migrating easier.
Differential Revision: D16985756
fbshipit-source-id: dfbb09393
Summary:
Change the logic of the annotation reachability checker in the following
ways:
1. Sanitizers take priority over sinks, i.e. a procedure that is both a
sink and a sanitizer is not a sink. This changes the existing tests
that seemed to assume the opposite. However I think that way is more
useful and goes better with the fact that sanitizers are specified as
"overrides".
2. When applying a summary, check again that we are not in a sanitizer
for the corresponding sink.
Without (2) this there was a subtle bug when several rules were
specified. For example, if `sink_wrapper()` wraps `sink()` for a rule
`R` then the summary of `sink_wrapper()` will be: `R-sink : call to sink()`.
Then, suppose `sanitizer()` calls `sink_wrapper()` and `sanitizer()` is
a sanitizer for `R` but not for another rule `R'`. The previous code
would add the call to `sink()` to the summary of `sanitizer()` because
it's not a sanitizer for `R'`, even though `sink()` is not a sink for
`R'`!
The current code will re-apply the rules correctly so that sinks are
matched only against the right sanitizers.
Reviewed By: skcho
Differential Revision: D16895577
fbshipit-source-id: 266cc4940
Summary:
- run the tests! they weren't hooked up to the main Makefile :/
- add some html debug messages
- formatting
Reviewed By: skcho
Differential Revision: D16895578
fbshipit-source-id: e96d737cc
Summary: In order to test changes to bigO notation, let's record them in test results.
Reviewed By: skcho
Differential Revision: D16763972
fbshipit-source-id: c1376909b
Summary:
Summary.ml defines both a bunch of types and how to use them and a
mechanism to save and store summaries on disk while maintaining a
complex in-memory cache of what's on disk. Make the distinction clear.
Reviewed By: ngorogiannis
Differential Revision: D16358869
fbshipit-source-id: 9d4c6cb77
Summary:
The domain supported path sensitivity wrt to a specific boolean guard `Branch.unlikely`. This isn't used in actual code, so remove it.
Also
- add an .mli to the domain;
- unabbreviate domain name to match analyser name;
- use Payload.read instead of calling Ondemand directly;
- adjust tests.
Reviewed By: mbouaziz
Differential Revision: D16203953
fbshipit-source-id: 743aa4400
Summary:
Treat `MainThread` and `WorkerThread` annotations.
Fix wrong test (`AnyThread` cannot call a UI-only method, because it can be called by ANY thread ;) ) See https://developer.android.com/reference/android/support/annotation/AnyThread
Clean up the code a bit.
Reviewed By: jvillard
Differential Revision: D16183798
fbshipit-source-id: 6b7e3b27e
Summary:
The fields `tenv` and `integer_type_widths` can be obtained from the `exe_env` field of `proc_callback_args`
This commit removes the redundant fields
Reviewed By: ngorogiannis
Differential Revision: D16149520
fbshipit-source-id: d37526fd4
Summary:
Supply the caller `Summary.t` to `Ondemand.analyze_proc_name` and `Ondemand.analyze_proc_desc` instead of the caller `Procdesc.t`
This change will enable a later commit to record the procedures that are called by a procedure in its summary
Reviewed By: ngorogiannis
Differential Revision: D16148677
fbshipit-source-id: cf353e89a
Summary:
Cluster checkers call `SummaryPayload.read` but set the `caller_summary` to correspond to the same summary as gives the `callee_pname`
This change introduces a new method `read_toplevel_procedure` that does not require a `caller_summary`, to be used by the cluster checkers
Reviewed By: ngorogiannis
Differential Revision: D16131660
fbshipit-source-id: 12caa1000
Summary: There were FNs caused by only looking for the immediate predecessors when we were checking the pattern. This diff heuristically chases 4 more predecessors to reduce the number of FNs.
Reviewed By: ngorogiannis
Differential Revision: D16149983
fbshipit-source-id: f65c57a0a
Summary: Adding typechecks to prevent potential FPs like the added test
Reviewed By: ngorogiannis
Differential Revision: D16149511
fbshipit-source-id: 6d3fe0ad4
Summary:
Change the datatype `ProcData` to include a field of type `Summary.t` instead of a field of type `Procdesc.t`
This will enable a later commit to supply a summary to `Ondemand.analyze_proc_desc` and `Ondemand.analyze_proc_name`
Reviewed By: ngorogiannis
Differential Revision: D16121405
fbshipit-source-id: 342374121
Summary:
The record `proc_callback_args` (defined in `callbacks.ml`) contains the fields `proc_desc` and `summary`.
The field `proc_desc` is redundant because it can be obtained from `summary`.
This diff removes `proc_desc` and uses the summary to obtain it where needed.
Reviewed By: ngorogiannis
Differential Revision: D16090783
fbshipit-source-id: 5632d1f4a
Summary:
Replaced by pulse. `--ownership` is now a deprecated form of `--pulse`.
The ownership checker is starting to give wrong answers due to changes in the
clang frontend, so it's better to remove it in favour of pulse.
there_goes_my_hero
Reviewed By: ngorogiannis
Differential Revision: D16107650
fbshipit-source-id: bb2446a19
Summary:
- Add allocation costs to `costs-report.json` and enable diffing over allocation costs.
- Also, let's be more consistent and modular in naming our cost issues.
- introduce a generic issue type `X_TIME_COMPLEXITY_INCREASE` where `X` can be one of the cost kinds. If the function is on the cold start, issue can have the `COLD_START` suffix. Similarly for infinite/zero/expensive calls.
- Change `PERFORMANCE_VARIATION` -> `EXECUTION_TIME_COMPLEXITY_INCREASE`
- Add new issue type for `ALLOCATION_COMPLEXITY_INCREASE_COLD_START` which will be enabled by default
- Refactor cost issues to be more modular and succinct. This also makes addition of a new cost kind very easy by adding the kind into the `enabled_cost_kinds` list in `CostKind.ml`
Reviewed By: mbouaziz
Differential Revision: D15822681
fbshipit-source-id: cf89ece59
Summary:
I realized that there was a discrepancy in the # of instructions between whether we run a single analysis or multiple analyses at the same time. It turns out that in biabduction, bufferoverrun and other HIL analyses we did Preanalysis step (which adds scope instructions and invokes liveness etc.) but not in others. This discrepancy results in inconsistent analysis results (e.g. in the new inefficient-keyset-iterator) that rely on instructions. We should be consistent. Hence, we now invoke Preanalysis in the frontend and remove all other uses in the rest of the checkers.
Consequently, I had to update the inefficient-keyset-checker to take the CFG resulting from Preanalysis with extra scoping instructions.
Reviewed By: mbouaziz, ngorogiannis, jvillard
Differential Revision: D15803492
fbshipit-source-id: 4e21eb610
Summary:
This is a simple checker that identifies inefficient uses of `keySet` iterator where (not only the key but also) the value is accessed via `get(key)`. It is more efficient to use `entrySet` iterator which already returns both key-value pairs. This optimization would get rid of many extra lookups which can be expensive.
We simply traverse the CFG starting from the loop head upwards and pick up the map that is iterated over. Then, we check in the loop nodes if there is a call to `get(...)` over this map. If, so we report.
Reviewed By: ngorogiannis
Differential Revision: D15737779
fbshipit-source-id: 702465b4e
Summary:
- Rename `invariantModels` to `purityModels`
- Track which arguments are modified in purity models. Before we were invalidating all arguments of impure modeled functions. Instead, now we only invalidate modified args given in the model. This should ideally result in more precision in the analysis.
- Add some more purity models for :`cast`, `new`, `new_array` and `Math.random`
Reviewed By: mbouaziz
Differential Revision: D15535332
fbshipit-source-id: 5395800d9
Summary:
- Makes sure that `start_session` and `finish_session` are well parenthesized
- Avoids a try finally when debug is disabled
Reviewed By: ngorogiannis
Differential Revision: D15371841
fbshipit-source-id: 340203edb
Summary:
Instead of emitting an ad-hoc builtin on variable declaration emit a new
metadata instruction. This allows us to remove the code matching on that
ad-hoc builtin that had to be inserted in several checkers.
Inferbo & pulse used that information meaningfully and had to undergo
some minor changes to cope with the new metada instruction.
Reviewed By: ezgicicek
Differential Revision: D14833100
fbshipit-source-id: 9b3009d22
Summary:
Bundle all non-semantic-bearing instructions into a `Metadata _`
instruction in SIL.
- On a documentation level this makes clearer the distinction between
instructions that encode the semantics of the program and those that are
just hints for the various backend analysis.
- This makes it easier to add more of these auxiliary instructions in
the future. For example, the next diff introduces a new `Skip` auxiliary
instruction to replace the hacky `ExitScope([], Location.dummy)`.
- It also makes it easier to surface all current and future such
auxiliary instructions to HIL as the datatype for these syntactic hints
can be shared between SIL and HIL. This diff brings `Nullify` and
`Abstract` to HIL for free.
Reviewed By: ngorogiannis
Differential Revision: D14827674
fbshipit-source-id: f68fe2110
Summary: It's all grown up now and taking quite some space in src/checkers/.
Reviewed By: skcho
Differential Revision: D14568273
fbshipit-source-id: b843c031e
Summary:
Context: "quandary" traces optimise for space by only storing a call site (plus analysis element) in a summary, as opposed to a list of call sites plus the element (i.e., a trace). When forming a report, the trace is expanded to a full one by reading the summary of the called function, and then matching up the current element with one from the summary, iterating until the trace cannot be expanded any more. In the best case, this can give a quadratic saving, as a real trace gets longer the higher one goes in the call stack, and therefore the total cost of saving that trace in each summary is quadratic in the length of the trace. Quandary traces give a linear cost.
HOWEVER, these have been a source of many subtle bugs.
1. The trace expansion strategy is very arbitrary and cannot distinguish between expanded traces that are invalid (i.e., end with a call and not an originating point, such as a field access in RacerD). Plus the strategy does not explore all expansions, just the left-most one, meaning the left most may be invalid in the above sense, but another (not left-most) isn't even though it's not discovered by the expansion. This is fixable with major surgery.
2. All real traces that lead to the same endpoint are conflated -- this is to save space because there may be exponentially many such traces. That's OK, but these traces may have different locking contexts -- one may take the lock along the way, and another may not. The expansion cannot make sure that if we are reporting a trace we have recorded as taking the lock, will actually do so. This has resulted in very confusing race reports that are superficially false positives (even though they point to the existence of a real race).
3. Expansion completely breaks down in the java/buck integration when the trace goes through f -> g -> h and f,g,h are all in distinct buck targets F,G,H and F does not depend directly on H. In that case, the summary of h is simply not available when reporting/expanding in f, so the expanded trace comes out as truncated and invalid. These are filtered out, but the filtering is buggy and kills real races too.
This diff completely replaces quandary traces in RacerD with plain explicit traces.
- This will incur the quadratic space/time cost previously saved. See test plan: there is indeed a 30% increase in summary size, but there is no slowdown. In fact, on openssl there is a 10-20% perf increase.
- For each endpoint, up to a single trace is used, as before, so no exponential explosion. However, because there is no such thing as expansion, we cannot get it wrong and change the locking context of a trace.
- This diff is emulating the previous reporting format as much as possible to allow good signal from the CI. Further diffs up this stack will remove quandary-trace specific things, and simplify further the code.
- 2 is not fully addressed -- it will require pushing the `AccessSnapshot` structure inside `TraceElem`. Further diffs.
Reviewed By: jberdine
Differential Revision: D14405600
fbshipit-source-id: d239117aa
Summary:
Before: the abstract state represents heap addresses as a single map
from addresses to edges + attributes.
After: the heap is made of 2 maps: one mapping addresses to edges, and
one mapping an address to its attributes.
It turns out that edges and attributes are often not updated at the same
time, so keeping them in the same map was causing pressure on the OCaml
gc.
Reviewed By: mbouaziz
Differential Revision: D14147991
fbshipit-source-id: 6713eeb3c
Summary:
This is basically unused except for debugging and is going to cause
issues later.
Reviewed By: mbouaziz
Differential Revision: D14258490
fbshipit-source-id: b2800990e
Summary:
When joining two lists of disjuncts we try to ensure there isn't a state
that under-approximates another already in the list. This helps reduce
the number of disjuncts that are generated by conditionals and loops.
Before we would always just add more disjuncts unless they were
physically equal but now we do a subgraph computation to assess
under-approximation.
We only do this half-heartedly for now however, only taking into
consideration the "new" disjuncts vs the "old" ones. It probably makes
sense to do a full quadratic search to minimise the number of disjuncts
from time to time but this isn't done here.
Reviewed By: mbouaziz
Differential Revision: D14258482
fbshipit-source-id: c2dad4889
Summary:
This removes the "abstract addresses" that used to be stored in the `Closure` attribute of pulse abstract addresses. There used to be a list of values recorded for each closure, each one representing one captured value. Instead these values are now recorded as fake edges in the memory graph.
Having addresses appear in attributes causes issues when trying to establish graph isomorphism between two memory states. Avoid it by rewriting the closures mechanism to encode captured addresses as fake edges in memory. This way captured addresses are automatically treated right by the graph algorithms (in the next diffs).
Reviewed By: mbouaziz
Differential Revision: D14323044
fbshipit-source-id: 413b4d989
Summary:
The disjunctive domain shouldn't really be a set in the first place as
comparing abstract states for equality is expensive to do naively
(walking the whole maps representing the abstract heap). Moreover in
practice these sets have a small max size (currently 50 for pulse, the
only client), so switching them to plain lists makes sense.
Reviewed By: mbouaziz
Differential Revision: D14258489
fbshipit-source-id: c512169eb
Summary:
It's useful to keep the size of states down, especially when humans are
trying to read it. It will also help keep the size of summaries down in
the inter-procedural pulse.
Reviewed By: mbouaziz
Differential Revision: D14258486
fbshipit-source-id: 45ebcac67