Summary:
There are 3 ways pulse tracks history. This is at least one too many. So
far, we have:
1. "histories": a humble list of "events" like "assigned here", "returned from call", ...
2. "interproc actions": a structured nesting of calls with a final "action", eg "f calls g calls h which does blah"
3. "traces", which combine one history with one interproc action
This diff gets rid of interproc actions and makes histories include
"nested" callee histories too. This allows pulse to track and display
how a value got assigned across function calls.
Traces are now more powerful and interleave histories and interproc
actions. This allows pulse to track how a value is fed into an action,
for instance one performed in a callee, which itself creates some more
(potentially now interprocedural) history before going to the next step
of the action (either another call or the action itself).
This gives much better traces, and some examples are added to showcase
this.
There are a lot of changes when applying summaries to keep track of
histories more accurately than was done before, but also a few
simplifications that give additional evidence that this is the right
concept.
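
As a rough illustration of histories that include nested callee histories, here is a minimal Python sketch. The `Event` and `Call` classes, their fields, and the `render` helper are hypothetical names chosen for this example; they are not pulse's actual types.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Event:
    """A plain history event, e.g. "assigned here" (hypothetical shape)."""
    description: str
    line: int


@dataclass
class Call:
    """A call event that carries the callee's own history (hypothetical shape)."""
    callee: str
    line: int
    in_call: List["HistoryItem"] = field(default_factory=list)


HistoryItem = Union[Event, Call]


def render(history: List[HistoryItem], depth: int = 0) -> List[str]:
    """Flatten a nested history into indented trace lines."""
    lines = []
    indent = "  " * depth
    for item in history:
        if isinstance(item, Call):
            lines.append(f"{indent}line {item.line}: in call to {item.callee}")
            # Recurse into the callee's history, one level deeper.
            lines.extend(render(item.in_call, depth + 1))
        else:
            lines.append(f"{indent}line {item.line}: {item.description}")
    return lines


history = [
    Event("variable declared", 3),
    Call("g", 4, [
        Event("assigned here", 10),
        Call("h", 11, [Event("returned from call", 20)]),
    ]),
]
trace = render(history)
```

Printing `"\n".join(trace)` shows the events of `g` and `h` indented under the calls that produced them, which is the kind of cross-call display the summary describes.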
Reviewed By: skcho
Differential Revision: D17908942
fbshipit-source-id: 3b62eaf78
Summary:
Java method annotations are ambiguous in that there is no difference between
annotating the return value of a method, and annotating the method itself.
The disambiguation is done entirely based on the meaning of the annotation.
Here, while `UiThread`/`MainThread` are genuine method/class annotations
and not return annotations, the reverse is true for `ForUiThread`/`ForNonUiThread`.
This means that these latter annotations do not determine the thread status of
the method they are attached to.
Here we fix that misunderstanding.
Reviewed By: jvillard
Differential Revision: D17960994
fbshipit-source-id: 5aecfb124
Summary:
Instead of a string argument named `~str` pass `Formal | Global` and let
`add_to_errlog` figure out how to print it.
Reviewed By: ezgicicek
Differential Revision: D17907657
fbshipit-source-id: ed09aab72
Summary:
Let's add basic Java support to impurity checker. Since impurity checker relies on pulse, we need to add Java with Pulse callback as well. Pulse doesn't officially support Java yet, but we can enable it for impurity checker for now.
Many Java primitives/operations are not yet modeled (such as creation of new objects, support for collections etc.). Still, it is good to run impurity checker on the existing tests of the purity checker. Also, it is nice to see that we can identify most of the impure functions correctly in the purity dir. There are a lot of FNs though.
Reviewed By: skcho
Differential Revision: D17906237
fbshipit-source-id: 15308d285
Summary:
By some unfortunate logic, OCaml often decides to use
`sexp_list`/`sexp_option` instead of just `list`/`option`. Sometimes
these get copy/pasted in interface files.
It would be good to tell OCaml not to do that in the first place but in
the meantime: this diff.
Reviewed By: ngorogiannis
Differential Revision: D17907938
fbshipit-source-id: 7546834a2
Summary: If we have no pulse summary (most likely caused by pulse finding a legit issue with the code), let's consider the function as impure.
Reviewed By: jvillard
Differential Revision: D17906016
fbshipit-source-id: 671d3e0ba
Summary: This diff models the cost of `ImmutableSet.chooseTableSize(setSize)` as `O(log setSize)` and `construct(n, ...)` as `O(n)`.
Reviewed By: ezgicicek
Differential Revision: D17829850
fbshipit-source-id: 0ee318cc3
Summary:
[androidx.collection.SimpleArrayMap](https://developer.android.com/reference/androidx/collection/SimpleArrayMap.html) also has `keySet` and `entrySet` methods, which make it eligible for the inefficient keyset checker. Let's add it.
Reviewed By: skcho
Differential Revision: D17831594
fbshipit-source-id: 32e831e18
Summary: Before, we didn't track litho framework callees on client code which was wrong. Now, we replace this with the following: If the callee is `build()` itself or doesn't contain a `build()` in its summary, then we want to track it in the domain. The former makes sense since we always want to track `build()` methods. The latter also makes sense since such a method could be a setter for a prop (as in the case of `prop1` in `buildPropLithoOK` which we were missing before due to the imprecise heuristic that prevented picking up callees in litho).
Reviewed By: ngorogiannis
Differential Revision: D17810704
fbshipit-source-id: 87d88e921
Summary:
Improve the trace by incorporating the callees and their locations in the call chain (i.e. chain of methods starting from `build()` call)
- extend the domain to contain the callee location
- replace the test results with the new traces
This makes it much easier to debug FPs in a big codebase.
Reviewed By: skcho
Differential Revision: D17788996
fbshipit-source-id: 31938b5fe
Summary: `litho` checker contained two checkers: required-props and graphQL field accesses. Although they use the same domain, their reporting conditions and analysis details are different. However, they were bundled into the same analysis by adding disjunctions to `exec_instr` to handle both cases. Let's separate them into two different checkers, keeping a modular transfer function and analyzer that is reused by these two checkers.
Reviewed By: skcho
Differential Revision: D17788834
fbshipit-source-id: 47d77063b
Summary:
At some point it was thought that we can assume that any annotation starting with "On" means the method is on the UI thread.
That's too imprecise and has led to false positives and negatives. Restrict to a well-known safe set.
Reviewed By: ezgicicek
Differential Revision: D17769376
fbshipit-source-id: 0f8fee059
Summary:
The documentation and uses of filtering disagree. One typical usage is deduplication.
Split that where obvious, add comments where not obvious, and leave alone when obviously unrelated to deduplication.
Reviewed By: mityal
Differential Revision: D17715329
fbshipit-source-id: ec757927b
Summary:
This diff generates a symbolic value when a function returns only
exceptions. Previously, the exception expression was evaluated to top,
which then propagated to other functions and made their costs top as
well. To prevent that, this diff makes two changes:
* exception expressions are evaluated to bottom, and
* if a callee's return value is bottom, a symbolic value is generated
for it.
Reviewed By: ezgicicek
Differential Revision: D17500386
fbshipit-source-id: 0fdcc710d
Summary:
This diff refactors a few things in preparation for the following diff:
* revised pretty print of the alias domain
* moved `eval_array_locs_length` to `BufferOverrunSemantics`.
Reviewed By: jvillard
Differential Revision: D17667123
fbshipit-source-id: c95611df5
Summary: `Prop(varArg = myProp) List <?> myPropList` can also be set via `myPropList()` or `myProp()`. Add support for picking up the `varArg` and checking this form of required props.
Reviewed By: ngorogiannis
Differential Revision: D17571997
fbshipit-source-id: 7956cb972
Summary: The analysis is not intra-procedural, hence we don't really read the payload. Let's remove it.
Reviewed By: ngorogiannis
Differential Revision: D17603911
fbshipit-source-id: c92b5c602
Summary:
This diff adds an eval mode for the substitutions of the cost results, in order to avoid precision
loss by joining two symbols.
The usual join of two different symbolic values, `s1` and `s2`, becomes top due to the limitation of
our domain. On the other hand, in the new eval mode, it returns an upperbound `s1+s2`, because the
cost values only care about the upperbounds.
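
A minimal sketch of the two join behaviours described above. The `TOP` sentinel, the use of strings for symbols, and the tuple encoding of `s1+s2` are assumptions made for this example only, and the upper-bound argument assumes symbols stand for non-negative quantities.

```python
TOP = "top"  # stand-in for the unbounded value (assumed encoding)


def join(a, b):
    """Usual join: equal values are kept, different symbols collapse to top."""
    return a if a == b else TOP


def join_upper_bound(a, b):
    """Eval-mode join: return a symbolic upper bound instead of top."""
    if a == TOP or b == TOP:
        return TOP
    if a == b:
        return a
    # For non-negative symbols, s1 + s2 >= max(s1, s2), so the sum is a
    # sound (if imprecise) upper bound of both sides.
    return ("+", a, b)


usual = join("s1", "s2")
bounded = join_upper_bound("s1", "s2")
```

Here `usual` is `TOP` while `bounded` keeps both symbols, so a cost computed from it stays symbolic.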
Reviewed By: ezgicicek
Differential Revision: D17573400
fbshipit-source-id: 2c84743d5
Summary: When we have an annotation like `Prop(varArg = X)` or `ThreadSafe(enableChecks = true)`, we were not able to pick up the names of the parameters like `varArg` or `enableChecks`. This diff fixes that.
Reviewed By: skcho, ngorogiannis
Differential Revision: D17571377
fbshipit-source-id: 5293b5810
Summary:
Instead of polluting the signature of trace endpoints, have
the call printer be a module argument to the functors
producing trace elements.
Reviewed By: skcho
Differential Revision: D17550111
fbshipit-source-id: ab5af94c6
Summary: Our annotation parameter parsing is too primitive to identify `resType`, so until now we assumed that every Prop can be set through either of two suffixes: `Attr` and `Res`. After talking to the Litho team, there are three more suffixes to add: `Dip`, `Sip`, and `Px`.
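
A small sketch of how the accepted setter names for a prop could be enumerated from these suffixes. The helper name `setter_names` and the rule "bare prop name, or prop name plus suffix" are assumptions for illustration; the checker's real matching logic may differ.

```python
# Suffixes named in the summary above.
PROP_SETTER_SUFFIXES = ["Attr", "Res", "Dip", "Sip", "Px"]


def setter_names(prop):
    """Hypothetical helper: names that are assumed to set the given prop."""
    return [prop] + [prop + suffix for suffix in PROP_SETTER_SUFFIXES]


names = setter_names("textSize")
```

Under these assumptions, a builder call such as `textSizeDip(...)` would count as setting the `textSize` prop.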
Reviewed By: ngorogiannis
Differential Revision: D17528482
fbshipit-source-id: 8d7f49130
Summary: Before, we were mistakenly checking any annotation that ends with Prop, such as TreeProp. This was wrong. Instead, we should only check Prop, as advised by the Litho team.
Reviewed By: ngorogiannis
Differential Revision: D17527769
fbshipit-source-id: b753dd87a
Summary:
Introduce a new experimental checker (`--impurity`) that detects
impurity information, tracking which parameters and global variables
of a function are modified. The checker relies on Pulse to detect how
the state changes: it traverses the pre and post pairs starting from
the parameter/global variable and finds where the pre and post heaps
diverge. At divergence points, we expect to see WrittenTo/Invalid attributes
containing a trace of how the address was modified. We use these to
construct the trace of impurity.
This checker is a complement to the purity checker that exists mainly
for Java (and is used for cost and loop-hoisting analyses). The aim of
this new experimental checker is to rely on Pulse's precise
memory treatment and come up with a more precise (im)purity
analysis. To distinguish the two checkers, we introduce a new issue
type `IMPURE_FUNCTION` that reports when a function is impure, rather
than when it is pure (as in the purity checker).
TODO:
- improve the analysis to rely on impurity information of external
library calls. Currently, all library calls are assumed to be nops,
hence pure.
- de-entangle Pulse reporting from analysis.
Reviewed By: skcho
Differential Revision: D17051567
fbshipit-source-id: 5e10afb4f
Summary: This diff prints debug information when a top value is generated.
Reviewed By: ngorogiannis
Differential Revision: D17285448
fbshipit-source-id: 0621fd36d
Summary:
This check was an incomplete attempt to make nullsafe check nutritious
annotations for fields that get modified.
This was never fully productionized, and this check is turned off by
default.
In near future, we don't anticipate supporting this feature, so let's
remove it to simplify the code.
Reviewed By: artempyanykh
Differential Revision: D17282015
fbshipit-source-id: d63a2f1f7
Summary: This diff adds models of Java String. In order to keep the precision of cost checker, I fixed cost models for String in this diff too.
Reviewed By: ngorogiannis
Differential Revision: D17203309
fbshipit-source-id: 8cc2814fc
Summary:
This diff makes the checkers, except biabduction, use `typ` instead
of `root_typ` of `Load`/`Store` statements.
Reviewed By: dulmarod
Differential Revision: D17203105
fbshipit-source-id: 8be9b5158
Summary:
This diff uses inline records for `Sil.Load` and `Sil.Store` in preparation for the
following extension.
Reviewed By: dulmarod
Differential Revision: D17161288
fbshipit-source-id: 637ea7bfa
Summary:
`Present` annotation was an experiment made many years ago that never
got into real usage. The idea was to annotate Optional<> types with
Present, which means that it is safe to call get().
We don't plan to support `Present` annotation for optional types in the
near future.
Supporting the `Present` annotation requires extra levels of abstraction
that make changing the behavior and introducing new features harder.
A lot of nullability checks are written in a generic way so that they also
check for presence.
Getting rid of that will allow us to simplify our
work for introducing new semantics for nullsafe.
Reviewed By: ngorogiannis
Differential Revision: D17153432
fbshipit-source-id: c5ea9bdf1
Summary:
`from_string` is too benign in contrast with what this method is really
doing (and oh my what it is really doing).
There are a lot of potential follow ups to clean this up even more, but
this is beyond the scope of this diff
Reviewed By: jvillard
Differential Revision: D17070826
fbshipit-source-id: 3d190039e
Summary:
This is more powerful than `"symbols"` for more advanced use-cases. Keep
`"symbols"` unchanged to make migrating easier.
Differential Revision: D16985756
fbshipit-source-id: dfbb09393
Summary:
Change the logic of the annotation reachability checker in the following
ways:
1. Sanitizers take priority over sinks, i.e. a procedure that is both a
sink and a sanitizer is not a sink. This changes the existing tests
that seemed to assume the opposite. However I think that way is more
useful and goes better with the fact that sanitizers are specified as
"overrides".
2. When applying a summary, check again that we are not in a sanitizer
for the corresponding sink.
Without (2) there was a subtle bug when several rules were
specified. For example, if `sink_wrapper()` wraps `sink()` for a rule
`R` then the summary of `sink_wrapper()` will be: `R-sink : call to sink()`.
Then, suppose `sanitizer()` calls `sink_wrapper()` and `sanitizer()` is
a sanitizer for `R` but not for another rule `R'`. The previous code
would add the call to `sink()` to the summary of `sanitizer()` because
it's not a sanitizer for `R'`, even though `sink()` is not a sink for
`R'`!
The current code will re-apply the rules correctly so that sinks are
matched only against the right sanitizers.
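
The two rules can be sketched as follows. The data layout (a dict of rules with `sinks` and `sanitizers` sets, and summaries mapping a rule name to recorded sink calls) and the function names are assumptions for this example, not the checker's actual representation.

```python
def is_sink(rule, proc):
    """Rule 1: a procedure that is both a sink and a sanitizer is not a sink."""
    return proc in rule["sinks"] and proc not in rule["sanitizers"]


def apply_callee_summary(rules, caller, callee_summary):
    """Rule 2: re-check sanitizers per rule when importing a callee summary."""
    inherited = {}
    for rule_name, sink_calls in callee_summary.items():
        # Each recorded sink call is matched only against the sanitizers
        # of the rule it belongs to.
        if caller in rules[rule_name]["sanitizers"]:
            continue
        inherited[rule_name] = set(sink_calls)
    return inherited


rules = {
    "R": {"sinks": {"sink"}, "sanitizers": {"sanitizer"}},
    "R'": {"sinks": {"other_sink"}, "sanitizers": set()},
}
sink_wrapper_summary = {"R": {"call to sink()"}}

in_sanitizer = apply_callee_summary(rules, "sanitizer", sink_wrapper_summary)
in_plain_caller = apply_callee_summary(rules, "plain_caller", sink_wrapper_summary)
```

With the scenario from the message, `sanitizer()` inherits nothing from `sink_wrapper()`, while an unrelated caller still inherits the `R` sink call.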
Reviewed By: skcho
Differential Revision: D16895577
fbshipit-source-id: 266cc4940
Summary:
- run the tests! they weren't hooked up to the main Makefile :/
- add some html debug messages
- formatting
Reviewed By: skcho
Differential Revision: D16895578
fbshipit-source-id: e96d737cc
Summary: In order to test changes to bigO notation, let's record them in test results.
Reviewed By: skcho
Differential Revision: D16763972
fbshipit-source-id: c1376909b
Summary:
Summary.ml defines both a bunch of types and how to use them and a
mechanism to save and store summaries on disk while maintaining a
complex in-memory cache of what's on disk. Make the distinction clear.
Reviewed By: ngorogiannis
Differential Revision: D16358869
fbshipit-source-id: 9d4c6cb77
Summary:
The domain supported path sensitivity with respect to a specific boolean guard, `Branch.unlikely`. This isn't used in actual code, so remove it.
Also
- add an .mli to the domain;
- unabbreviate domain name to match analyser name;
- use Payload.read instead of calling Ondemand directly;
- adjust tests.
Reviewed By: mbouaziz
Differential Revision: D16203953
fbshipit-source-id: 743aa4400
Summary:
Treat `MainThread` and `WorkerThread` annotations.
Fix wrong test (`AnyThread` cannot call a UI-only method, because it can be called by ANY thread ;) ) See https://developer.android.com/reference/android/support/annotation/AnyThread
Clean up the code a bit.
Reviewed By: jvillard
Differential Revision: D16183798
fbshipit-source-id: 6b7e3b27e
Summary:
The fields `tenv` and `integer_type_widths` can be obtained from the `exe_env` field of `proc_callback_args`.
This commit removes the redundant fields.
Reviewed By: ngorogiannis
Differential Revision: D16149520
fbshipit-source-id: d37526fd4
Summary:
Supply the caller `Summary.t` to `Ondemand.analyze_proc_name` and `Ondemand.analyze_proc_desc` instead of the caller `Procdesc.t`
This change will enable a later commit to record the procedures that are called by a procedure in its summary
Reviewed By: ngorogiannis
Differential Revision: D16148677
fbshipit-source-id: cf353e89a
Summary:
Cluster checkers call `SummaryPayload.read` but set the `caller_summary` to the very summary that the `callee_pname` comes from.
This change introduces a new method `read_toplevel_procedure` that does not require a `caller_summary`, to be used by the cluster checkers.
Reviewed By: ngorogiannis
Differential Revision: D16131660
fbshipit-source-id: 12caa1000
Summary: There were FNs caused by only looking for the immediate predecessors when we were checking the pattern. This diff heuristically chases 4 more predecessors to reduce the number of FNs.
Reviewed By: ngorogiannis
Differential Revision: D16149983
fbshipit-source-id: f65c57a0a
Summary: Adding typechecks to prevent potential FPs like the added test
Reviewed By: ngorogiannis
Differential Revision: D16149511
fbshipit-source-id: 6d3fe0ad4
Summary:
Change the datatype `ProcData` to include a field of type `Summary.t` instead of a field of type `Procdesc.t`
This will enable a later commit to supply a summary to `Ondemand.analyze_proc_desc` and `Ondemand.analyze_proc_name`
Reviewed By: ngorogiannis
Differential Revision: D16121405
fbshipit-source-id: 342374121
Summary:
The record `proc_callback_args` (defined in `callbacks.ml`) contains the fields `proc_desc` and `summary`.
The field `proc_desc` is redundant because it can be obtained from `summary`.
This diff removes `proc_desc` and uses the summary to obtain it where needed.
Reviewed By: ngorogiannis
Differential Revision: D16090783
fbshipit-source-id: 5632d1f4a
Summary:
Replaced by pulse. `--ownership` is now a deprecated form of `--pulse`.
The ownership checker is starting to give wrong answers due to changes in the
clang frontend, so it's better to remove it in favour of pulse.
Reviewed By: ngorogiannis
Differential Revision: D16107650
fbshipit-source-id: bb2446a19
Summary:
- Add allocation costs to `costs-report.json` and enable diffing over allocation costs.
- Also, let's be more consistent and modular in naming our cost issues.
- introduce a generic issue type `X_TIME_COMPLEXITY_INCREASE` where `X` can be one of the cost kinds. If the function is on the cold start, issue can have the `COLD_START` suffix. Similarly for infinite/zero/expensive calls.
- Change `PERFORMANCE_VARIATION` -> `EXECUTION_TIME_COMPLEXITY_INCREASE`
- Add new issue type for `ALLOCATION_COMPLEXITY_INCREASE_COLD_START` which will be enabled by default
- Refactor cost issues to be more modular and succinct. This also makes addition of a new cost kind very easy by adding the kind into the `enabled_cost_kinds` list in `CostKind.ml`
Reviewed By: mbouaziz
Differential Revision: D15822681
fbshipit-source-id: cf89ece59
Summary:
I realized that there was a discrepancy in the # of instructions between whether we run a single analysis or multiple analyses at the same time. It turns out that in biabduction, bufferoverrun and other HIL analyses we did Preanalysis step (which adds scope instructions and invokes liveness etc.) but not in others. This discrepancy results in inconsistent analysis results (e.g. in the new inefficient-keyset-iterator) that rely on instructions. We should be consistent. Hence, we now invoke Preanalysis in the frontend and remove all other uses in the rest of the checkers.
Consequently, I had to update the inefficient-keyset-checker to take the CFG resulting from Preanalysis with extra scoping instructions.
Reviewed By: mbouaziz, ngorogiannis, jvillard
Differential Revision: D15803492
fbshipit-source-id: 4e21eb610
Summary:
This is a simple checker that identifies inefficient uses of the `keySet` iterator where not only the key but also the value is accessed via `get(key)`. It is more efficient to use the `entrySet` iterator, which already returns key-value pairs. This optimization would get rid of many extra lookups, which can be expensive.
We simply traverse the CFG upwards starting from the loop head and pick up the map that is iterated over. Then, we check whether the loop nodes contain a call to `get(...)` on this map. If so, we report.
Reviewed By: ngorogiannis
Differential Revision: D15737779
fbshipit-source-id: 702465b4e
Summary:
- Rename `invariantModels` to `purityModels`
- Track which arguments are modified in purity models. Before we were invalidating all arguments of impure modeled functions. Instead, now we only invalidate modified args given in the model. This should ideally result in more precision in the analysis.
- Add some more purity models: `cast`, `new`, `new_array` and `Math.random`
Reviewed By: mbouaziz
Differential Revision: D15535332
fbshipit-source-id: 5395800d9
Summary:
- Makes sure that `start_session` and `finish_session` are well parenthesized
- Avoids a try finally when debug is disabled
Reviewed By: ngorogiannis
Differential Revision: D15371841
fbshipit-source-id: 340203edb
Summary:
Instead of emitting an ad-hoc builtin on variable declaration emit a new
metadata instruction. This allows us to remove the code matching on that
ad-hoc builtin that had to be inserted in several checkers.
Inferbo & pulse used that information meaningfully and had to undergo
some minor changes to cope with the new metadata instruction.
Reviewed By: ezgicicek
Differential Revision: D14833100
fbshipit-source-id: 9b3009d22
Summary:
Bundle all non-semantic-bearing instructions into a `Metadata _`
instruction in SIL.
- On a documentation level this makes clearer the distinction between
instructions that encode the semantics of the program and those that are
just hints for the various backend analysis.
- This makes it easier to add more of these auxiliary instructions in
the future. For example, the next diff introduces a new `Skip` auxiliary
instruction to replace the hacky `ExitScope([], Location.dummy)`.
- It also makes it easier to surface all current and future such
auxiliary instructions to HIL as the datatype for these syntactic hints
can be shared between SIL and HIL. This diff brings `Nullify` and
`Abstract` to HIL for free.
Reviewed By: ngorogiannis
Differential Revision: D14827674
fbshipit-source-id: f68fe2110
Summary: It's all grown up now and taking quite some space in src/checkers/.
Reviewed By: skcho
Differential Revision: D14568273
fbshipit-source-id: b843c031e
Summary:
Context: "quandary" traces optimise for space by only storing a call site (plus analysis element) in a summary, as opposed to a list of call sites plus the element (i.e., a trace). When forming a report, the trace is expanded to a full one by reading the summary of the called function, and then matching up the current element with one from the summary, iterating until the trace cannot be expanded any more. In the best case, this can give a quadratic saving, as a real trace gets longer the higher one goes in the call stack, and therefore the total cost of saving that trace in each summary is quadratic in the length of the trace. Quandary traces give a linear cost.
HOWEVER, these have been a source of many subtle bugs.
1. The trace expansion strategy is very arbitrary and cannot tell valid expanded traces apart from invalid ones (i.e., those that end with a call and not an originating point, such as a field access in RacerD). Moreover, the strategy does not explore all expansions, just the left-most one, so the left-most expansion may be invalid in the above sense while another, valid one is never discovered. This is fixable with major surgery.
2. All real traces that lead to the same endpoint are conflated -- this is to save space because there may be exponentially many such traces. That's OK, but these traces may have different locking contexts -- one may take the lock along the way, and another may not. The expansion cannot guarantee that a trace we have recorded as taking the lock will actually do so once expanded. This has resulted in very confusing race reports that are superficially false positives (even though they point to the existence of a real race).
3. Expansion completely breaks down in the java/buck integration when the trace goes through f -> g -> h and f,g,h are all in distinct buck targets F,G,H and F does not depend directly on H. In that case, the summary of h is simply not available when reporting/expanding in f, so the expanded trace comes out as truncated and invalid. These are filtered out, but the filtering is buggy and kills real races too.
This diff completely replaces quandary traces in RacerD with plain explicit traces.
- This will incur the quadratic space/time cost previously saved. See test plan: there is indeed a 30% increase in summary size, but there is no slowdown. In fact, on openssl there is a 10-20% perf increase.
- For each endpoint, up to a single trace is used, as before, so no exponential explosion. However, because there is no such thing as expansion, we cannot get it wrong and change the locking context of a trace.
- This diff is emulating the previous reporting format as much as possible to allow good signal from the CI. Further diffs up this stack will remove quandary-trace specific things, and simplify further the code.
- 2 is not fully addressed -- it will require pushing the `AccessSnapshot` structure inside `TraceElem`. Further diffs.
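
To make the linear versus quadratic storage argument from the context paragraph concrete, here is an idealized sketch for a single call chain. The function names are made up for this example, and the model counts trace elements only; it says nothing about real summary sizes.

```python
def explicit_trace_cost(chain_length):
    """Explicit traces: the summary at depth i stores a trace of i elements."""
    return sum(range(1, chain_length + 1))


def call_site_cost(chain_length):
    """Quandary-style traces: every summary stores a single call site."""
    return chain_length


n = 100
explicit = explicit_trace_cost(n)   # n * (n + 1) / 2 elements in total
compact = call_site_cost(n)         # n elements in total
```

For a chain of 100 calls this idealized model stores 5050 elements with explicit traces against 100 with call sites, which is the quadratic versus linear gap discussed above.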
Reviewed By: jberdine
Differential Revision: D14405600
fbshipit-source-id: d239117aa
Summary:
Before: the abstract state represents heap addresses as a single map
from addresses to edges + attributes.
After: the heap is made of 2 maps: one mapping addresses to edges, and
one mapping an address to its attributes.
It turns out that edges and attributes are often not updated at the same
time, so keeping them in the same map was causing pressure on the OCaml
gc.
Reviewed By: mbouaziz
Differential Revision: D14147991
fbshipit-source-id: 6713eeb3c
Summary:
This is basically unused except for debugging and is going to cause
issues later.
Reviewed By: mbouaziz
Differential Revision: D14258490
fbshipit-source-id: b2800990e
Summary:
When joining two lists of disjuncts we try to ensure there isn't a state
that under-approximates another already in the list. This helps reduce
the number of disjuncts that are generated by conditionals and loops.
Before we would always just add more disjuncts unless they were
physically equal but now we do a subgraph computation to assess
under-approximation.
We only do this half-heartedly for now however, only taking into
consideration the "new" disjuncts vs the "old" ones. It probably makes
sense to do a full quadratic search to minimise the number of disjuncts
from time to time but this isn't done here.
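
A minimal sketch of this "half-hearted" join. The `leq` parameter stands for the under-approximation test; modelling states as frozensets of edges and `leq` as the subset relation is an assumption for this example, not the subgraph computation itself.

```python
def join_disjuncts(old, new, leq):
    """Add each new disjunct unless some old disjunct already covers it."""
    result = list(old)
    for state in new:
        # Only compare against the old disjuncts, not against the other
        # new ones, so redundancy among new disjuncts is not detected.
        if any(leq(state, other) for other in old):
            continue
        result.append(state)
    return result


def subset_leq(a, b):
    """Assumed ordering: a is covered by b if all of a's edges are in b."""
    return a <= b


old = [frozenset({"x->a", "y->b"})]
new = [frozenset({"x->a"}), frozenset({"z->c"})]
joined = join_disjuncts(old, new, subset_leq)

# Two identical new disjuncts are both kept, since new is not compared to new.
duplicates = join_disjuncts([], [frozenset({"x->a"}), frozenset({"x->a"})], subset_leq)
```

The second call shows the limitation mentioned in the message: a full quadratic search would be needed to remove redundancy among the new disjuncts themselves.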
Reviewed By: mbouaziz
Differential Revision: D14258482
fbshipit-source-id: c2dad4889
Summary:
This removes the "abstract addresses" that used to be stored in the `Closure` attribute of pulse abstract addresses. There used to be a list of values recorded for each closure, each one representing one captured value. Instead these values are now recorded as fake edges in the memory graph.
Having addresses appear in attributes causes issues when trying to establish graph isomorphism between two memory states. Avoid it by rewriting the closures mechanism to encode captured addresses as fake edges in memory. This way captured addresses are automatically treated right by the graph algorithms (in the next diffs).
Reviewed By: mbouaziz
Differential Revision: D14323044
fbshipit-source-id: 413b4d989
Summary:
The disjunctive domain shouldn't really be a set in the first place as
comparing abstract states for equality is expensive to do naively
(walking the whole maps representing the abstract heap). Moreover in
practice these sets have a small max size (currently 50 for pulse, the
only client), so switching them to plain lists makes sense.
Reviewed By: mbouaziz
Differential Revision: D14258489
fbshipit-source-id: c512169eb
Summary:
It's useful to keep the size of states down, especially when humans are
trying to read it. It will also help keep the size of summaries down in
the inter-procedural pulse.
Reviewed By: mbouaziz
Differential Revision: D14258486
fbshipit-source-id: 45ebcac67
Summary:
- docstrings
- mli
- split `get_control_maps`: `get_loop_head_to_source_nodes` is used both by Cost and Hoisting. If using both analyzers, it is called twice whereas it could be shared (which is done later in the stack of diffs).
Reviewed By: ezgicicek
Differential Revision: D14258372
fbshipit-source-id: 29addddb7
Summary:
Since traces are attached to symbols, currently it will make no difference.
Calling `subst` on `Top` or on a constant is constant-time.
But I need this to record `Call` trace elements for `Top`.
Reviewed By: ezgicicek
Differential Revision: D14249265
fbshipit-source-id: d3aa4ac9e
Summary: We don't want to use Cost analysis results when `Config.hoisting_report_only_expensive` is false
Reviewed By: ezgicicek
Differential Revision: D14124555
fbshipit-source-id: e809bb80a
Summary: Record where each symbol in a polynomial is coming from: either a loop, function call or a modeled call.
Reviewed By: mbouaziz
Differential Revision: D14047420
fbshipit-source-id: 56d0bd926
Summary:
- Decouple analysis/reporting a little bit
- Avoids carrying the summary while computing stuff
Depends on D14028249
Reviewed By: ngorogiannis
Differential Revision: D14028673
fbshipit-source-id: 18e7298f8
Summary:
`AnalyzerNodesBasicCost` is just mapping instructions to abstract costs, it doesn't need to use AI.
Also it was keeping a map (node -> cost) for each node, this is completely removed.
Depends on D14028171
Reviewed By: ddino
Differential Revision: D14028249
fbshipit-source-id: 63f39261a
Summary:
- There is no need to use AI to compute a dot product: let's just fold over all nodes, but still do it in order (using the WTO) to report at the right place
- The previous version was computing a dot product on nodes for each node, which was quadratic, the new version is linear
- Report only once, the first time the threshold is reached (if in a loop, report at the loop head)
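
The linear fold can be sketched like this. Plain integers stand in for the symbolic costs and execution counts, and the names `basic_cost`, `exec_count`, and `loop_head_of` are hypothetical; the node order is assumed to come from the WTO.

```python
def total_cost_and_report(nodes_in_order, basic_cost, exec_count, threshold,
                          loop_head_of=None):
    """Fold once over the nodes; remember where the threshold is first reached."""
    loop_head_of = loop_head_of or {}
    total = 0
    report_at = None
    for node in nodes_in_order:
        # One term of the dot product per node: linear in the number of nodes.
        total += basic_cost[node] * exec_count[node]
        if report_at is None and total >= threshold:
            # Report only once; inside a loop, report at the loop head.
            report_at = loop_head_of.get(node, node)
    return total, report_at


nodes = ["entry", "loop_head", "body", "exit"]
basic_cost = {"entry": 1, "loop_head": 1, "body": 3, "exit": 1}
exec_count = {"entry": 1, "loop_head": 11, "body": 10, "exit": 1}
loop_head_of = {"loop_head": "loop_head", "body": "loop_head"}

total, report_at = total_cost_and_report(
    nodes, basic_cost, exec_count, threshold=20, loop_head_of=loop_head_of)
```

In this example the running total crosses the threshold while visiting `body`, and the report is attached to `loop_head`.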
Reviewed By: ddino
Differential Revision: D14028171
fbshipit-source-id: b4a840c6e
Summary:
Add an option to specify some classes that we really want to warn about
with the liveness checker, even when they appear used because of the
implicit destructor call inserted by the compiler.
Reviewed By: mbouaziz
Differential Revision: D13991129
fbshipit-source-id: 7fafdba84
Summary: When a `VarDecl` has the attribute `unused` then do not assign its initialisation result to the corresponding variable.
Reviewed By: ngorogiannis
Differential Revision: D13974497
fbshipit-source-id: 28029f995
Summary:
This will make it possible to get the numerical results for Cost, Hoisting, Purity without the Inferbo issues.
For now, I still forced Inferbo issues for Cost and Purity to avoid lots of changes in tests, that will go away soon.
Reviewed By: ezgicicek, skcho
Differential Revision: D13826741
fbshipit-source-id: 796d1a50d
Summary:
This will allow disjunctive analyzers to return sets of states as a
result instead of always returning one state. More precisely, this will
be needed for pulse when it becomes inter-procedural, if we take
summaries of functions to be disjunctive too (like, e.g., biabduction
does with several specs per function).
Reviewed By: mbouaziz
Differential Revision: D13537601
fbshipit-source-id: f54caf802
Summary:
Split into:
- `PulseDiagnostic`, formerly `PulseDomain.Diagnostic`
- `PulseOperations`, formerly `PulseDomain.Operations`
This breaks down the now quite large and complex PulseDomain.ml into
more manageable pieces. More importantly, it will allow us to build a
bigger pulse domain later, where elements of the domain are pairs of the
base domain that include a biabductive "footprint".
What's not as nice is that more of the interface of `PulseDomain` is
exposed, in particular `PulseDomain.Memory` and `PulseDomain.Stack`.
We'll have to be careful not to break abstraction barriers and prefer
`PulseOperations` to `PulseDomain` outside of the domain implementation.
OCaml forces us to do that because of the multi-file approach. It could
be solved by introducing pulse domains as a library but who has time for
that...
Sending early because rebasing that diff is painful.
Reviewed By: ngorogiannis
Differential Revision: D13537602
fbshipit-source-id: d211d6e84
Summary:
Record per-location traces. Actually, that doesn't quite make sense as a
location can be accessed in many ways, so associate a trace to each
*edge* in the memory graph. For instance, when doing `x->f = *y`, we
want to take the history of the `<val of y> --*--> ..` edge, add "assigned
at location blah" to it and store this extended history to the edge
`<val of x> --f--> ..`.
Use this machinery to print nicer traces in `infer explore` and better
error messages too (include the last assignment, like biabduction
messages).
Reviewed By: da319
Differential Revision: D13518668
fbshipit-source-id: 0a62fb55f
Summary: In the `operator=` case that assigns from a temporary, we want to assign the temporary object, not its address (as a comment already says).
Reviewed By: jvillard
Differential Revision: D13518496
fbshipit-source-id: 72bd23623
Summary:
When a C++ temporary goes out of scope, tag its address in the heap with
a new attribute `AddressOfCppTemporary` so that we can later check that
we don't return it.
Reviewed By: da319
Differential Revision: D13466898
fbshipit-source-id: 8808338b4
Summary:
When assigning to the special `return` variable, check that the result is
not the address of a local variable, otherwise report.
Reviewed By: ngorogiannis
Differential Revision: D13466896
fbshipit-source-id: 465da7f13
Summary:
It's ok to take an address of a field / array access of an invalid object.
This diff calculates the innermost dereference for an access expression starting with `&` and does not report on the dereference even if the address is invalid.
Reviewed By: jvillard
Differential Revision: D13450758
fbshipit-source-id: 18c038701
Summary:
When we create a Dereference edge, we also create a TakeAddress back edge. This causes false positives for stack variables: when we write to a stack variable and then take its address, the resulting address is the one from the back edge of the written value. See example `push_back_value_ok`. To solve this issue, this diff changes the stack to be a map from the addresses of variables rather than from the variables themselves.
We still have the issue for fields, see example `FP_push_back_value_field_ok`. To solve this, we probably need to remove back edges.
Reviewed By: jvillard
Differential Revision: D13432415
fbshipit-source-id: 9254a1a6d
Summary: Mostly a revert of D13190876 once the disjunctive domain is in place.
Reviewed By: da319
Differential Revision: D13432488
fbshipit-source-id: f1e98ef0d
Summary:
Change join/widen policies to more interesting ones and play around to
find a good tradeoff.
Reviewed By: mbouaziz
Differential Revision: D13432492
fbshipit-source-id: 2c3e498dd
Summary:
Introduce machinery to do disjunctive HIL domains and use it for pulse,
but only in a mode that preserves the existing behaviour.
The disjunctive domain is a functor that turns any (HIL for now)
transfer function module into one operating on sets of elements of the
original domain. The behaviour of joins (and widenings, which are equal
to joins) can be chosen when instantiating the functor among 3
behaviours:
- `` `JoinAfter n`: when the set of disjuncts gets bigger than `n` the
underlying domain's join is called to collapse them into one state
- `` `UnderApproximateAfter n`: when the set of disjuncts gets bigger
than `n` then just stop adding new states to it, drop any further states
on the floor. This corresponds to an under-approximation/bounded
approach.
- `` `NeverJoin`
The widening is always of the form ``
`UnderApproximateAfterNumIterations max_iter` for now since the only
user is pulse and I'm not sure what else would be useful.
Picking `` `JoinAfter 0` gives the same results as the non-disjunctive
domain since the underlying `join` will always be called. Make pulse use
this mode for now, and tune it in a next diff.
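For intuition only, the three join behaviours can be sketched outside of OCaml. The snippet below is not the functor from this diff: `State`, `join_states`, `JoinPolicy` and `join_disjuncts` are hypothetical names, the base domain is assumed to be a set of ints joined by union, and `NeverJoin` is assumed to keep every disjunct.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical base domain: a set of ints whose join is set union.
using State = std::set<int>;

State join_states(const State& a, const State& b) {
  State out = a;
  out.insert(b.begin(), b.end());
  return out;
}

enum class JoinPolicy { JoinAfter, UnderApproximateAfter, NeverJoin };

// Join two collections of disjuncts under the chosen policy and bound `n`.
std::vector<State> join_disjuncts(std::vector<State> lhs,
                                  const std::vector<State>& rhs,
                                  JoinPolicy policy, std::size_t n) {
  for (const State& s : rhs) {
    // Disjuncts behave like a set: skip states we already have.
    if (std::find(lhs.begin(), lhs.end(), s) != lhs.end()) continue;
    if (policy == JoinPolicy::UnderApproximateAfter && lhs.size() >= n) {
      break;  // bound reached: drop any further states on the floor
    }
    lhs.push_back(s);
  }
  if (policy == JoinPolicy::JoinAfter && lhs.size() > n) {
    // Too many disjuncts: collapse them with the underlying join.
    State collapsed;
    for (const State& s : lhs) collapsed = join_states(collapsed, s);
    return {collapsed};
  }
  return lhs;
}
```

In this sketch, `JoinAfter` with `n = 0` collapses any non-empty result into a single state, which mirrors the claim above that it behaves like the non-disjunctive domain.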
Reviewed By: mbouaziz
Differential Revision: D13431375
fbshipit-source-id: b93aa50e7
Summary:
This will be useful to make the analysis more precise. In particular, it
allows a disjunctive version of pulse to deal with deleting vector
elements in a loop: without this, deleting an array element in one
iteration will make the analysis think that the next array element is
invalid too since they are all the same. By keeping track of the index, we
can detect when we are sure that two elements are the same and only
report in that case.
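A guess at the kind of loop this is about (hypothetical code, not taken from the tests): every iteration deletes a different element, so a domain that cannot tell `nodes[i]` from `nodes[i + 1]` would see the second `delete` as acting on an already invalid address.

```cpp
#include <cstddef>
#include <vector>

struct Node {
  int value;
};

// Deletes each element exactly once and returns how many were freed.
// `Node` and `delete_all` are illustrative names only.
int delete_all(std::vector<Node*>& nodes) {
  int freed = 0;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    delete nodes[i];     // invalidates the object behind index i only
    nodes[i] = nullptr;  // index i + 1 is untouched and still valid
    ++freed;
  }
  return freed;
}
```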
Reviewed By: ngorogiannis
Differential Revision: D13431374
fbshipit-source-id: dae82deeb
Summary:
A lot of functors that take a `Make{SIL,HIL}` can take a `{SIL,HIL}`
directly instead. This makes my head hurt a bit less.
Reviewed By: mbouaziz
Differential Revision: D13416967
fbshipit-source-id: eb0b33bc4
Summary:
When a lambda gets created, record the abstract addresses it captures, then
complain if we see some of them be invalidated before it is called.
Add a notion of "allocator" for reporting better messages. The messages are
still a bit sucky, will need to improve them more generally at some point.
```
jul lambda ~ infer 1 infer -g --pulse-only -- clang -std=c++11 -c infer/tests/codetoanalyze/cpp/pulse/closures.cpp
Logs in /home/jul/infer.fb/infer-out/logs
Capturing in make/cc mode...
Found 1 source file to analyze in /home/jul/infer.fb/infer-out
Found 2 issues
infer/tests/codetoanalyze/cpp/pulse/closures.cpp:21: error: USE_AFTER_DESTRUCTOR
`&(f)` accesses address `s` captured by `&(f)` as `s` invalidated by destructor call `S_~S(s)` at line 20, column 3 past its lifetime (debug: 5).
19. f = [&s] { return s.f; };
20. } // destructor for s called here
21. > return f(); // s used here
22. }
23.
infer/tests/codetoanalyze/cpp/pulse/closures.cpp:30: error: USE_AFTER_DESTRUCTOR
`&(f)` accesses address `s` captured by `&(f)` as `s` invalidated by destructor call `S_~S(s)` at line 29, column 3 past its lifetime (debug: 8).
28. f = [&] { return s.f; };
29. }
30. > return f();
31. }
32.
Summary of the reports
USE_AFTER_DESTRUCTOR: 2
```
Reviewed By: da319
Differential Revision: D13400074
fbshipit-source-id: 3c68ff4ea
Summary: Model more `std::vector` functions that can potentially invalidate references to vector's elements (https://en.cppreference.com/w/cpp/container/vector).
Reviewed By: mbouaziz
Differential Revision: D13399161
fbshipit-source-id: 95cf2cae6
Summary:
Some code calls `this->~Obj()` then proceeds to use fields in the current
object, which previously we would report as invalid uses. Assume people know
what they are doing and ignore destructor calls to `this`.
Reviewed By: mbouaziz
Differential Revision: D13401145
fbshipit-source-id: f6b0fb6ec
Summary:
`AccessExpression.t` represents array accesses as `ArrayOffset of t * Typ.t * t
list`, i.e. the index is represented by a list of access expressions. This is
not precise enough when indices cannot be represented as such. In fact, in
general any `HilExp.t` can be an array index but this type was an approximation
that was good enough for existing checkers based on HIL.
This diff changes the type of access expressions to be parametric in the type
of array offsets, and uses this to record `HilExp.t` into them when translating
from SIL to HIL.
To accommodate the option of not caring about array offsets
(`include_array_indexes=false`), the type of array offsets is an option type.
Reviewed By: mbouaziz
Differential Revision: D13360944
fbshipit-source-id: b01442459
Summary:
`AccessExpression.t` and `HilExp.t` are about to become mutually
recursive, this will help distinguish the actual changes from the moving
of code around.
This deletes the file left around in the previous commit to preserve
callers of `AccessExpression`.
Reviewed By: mbouaziz
Differential Revision: D13377645
fbshipit-source-id: 71338d1f3
Summary:
`AccessExpression.t` and `HilExp.t` are about to become mutually
recursive, this will help distinguish the actual changes from the moving
of code around.
Reviewed By: mbouaziz
Differential Revision: D13377644
fbshipit-source-id: 9d6f290b6
Summary:
This will be useful for proper support of array indexes in pulse and in
HIL in general.
Reviewed By: mbouaziz
Differential Revision: D13377642
fbshipit-source-id: e431121fb
Summary: Similarly to `std::vector::push_back`, `std::vector::reserve` can invalidate the references to elements if the new size is bigger than the existing one. More info on `std::vector::reserve`: https://en.cppreference.com/w/cpp/container/vector/reserve
Reviewed By: jvillard
Differential Revision: D13340324
fbshipit-source-id: bf99b6923
Summary: Moving all the files related to nullable type checking under the same directory. The goal is to merge everything into the same backend based on the AI framework and access expressions.
Reviewed By: ngorogiannis
Differential Revision: D13350880
fbshipit-source-id: 8ab3cf81b
Summary: Instead of a variable having the value of a single location on the stack, we now allow variables to have multiple locations. Consequently, we also allow a memory location to point to a set of locations in the heap. We enforce a limit on the maximum number of locations in a set (currently 5).
Reviewed By: jvillard
Differential Revision: D13190876
fbshipit-source-id: 5cb5ba9a6
Summary: It is not used yet and still manages to cause false positives.
Reviewed By: mbouaziz
Differential Revision: D13102948
fbshipit-source-id: 2122666c2
Summary:
It's useful for checkers to know when variables go out of scope to
perform garbage collection in their domains, especially for complex
domains with non-trivial joins. This makes the analyses more precise at
little cost.
This could have been added as a custom function call to a builtin, but I
decided against it because this instruction doesn't have the semantics
of any function call. It's better for each checker to explicitly not
deal with the custom instruction instead.
Reviewed By: jberdine
Differential Revision: D13102951
fbshipit-source-id: 33be22fab
Summary:
Before, the liveness pre-analysis would place extra instructions in the
CFG for either:
1. marking an `Ident.t` as dead, or
2. marking a `Pvar.t` as `= 0`
But we have no way of marking pvars dead without setting them to 0. This
is bad because setting pvars to 0 is not possible everywhere they are
dead. Indeed, we only do it when we haven't seen their address being
taken anyway. This prevents the following situation, recorded in our tests:
```
int address_taken() {
int** x;
int* y;
int i = 7;
y = &i;
x = &y;
// if we don't reason about taken addresses while adding nullify instructions,
// we'll add
// `nullify(y)` here and report a false NPE on the next line
return **x;
}
```
So we want to mark pvars as dead without nullifying them. This diff
extends the `Remove_temps` SIL instruction to accept pvars as well, and
so renames it to `ExitScope`.
Reviewed By: da319
Differential Revision: D13102953
fbshipit-source-id: aa7f03a52
Summary: This diff passes the integer type widths as a parameter to the evaluation functions. The parameter will be used for casting in the following diff.
Reviewed By: mbouaziz
Differential Revision: D12920581
fbshipit-source-id: 48bbc802b
Summary:
This enables the translation of cast expressions. As of now, it
translates only casts of pointers to integer types, in order to
avoid too much change, which might mess up the checkers.
Reviewed By: jvillard
Differential Revision: D12920568
fbshipit-source-id: a5489df24
Summary:
It turns out keeping attributes (such as invalidation facts) separate
from the memory is a bad idea and leads to loss of precision and false
positives, as seen in the new test (which previously generated a
report).
Allow me to illustrate on this example, which is a stylised version of
the issue in the added test: previously we'd have:
```
state1 = { x = 1; invalids={} }
state2 = { x = 2; invalids ={1} }
join(state1, state2) = { x = {1, 2}; invalids={{1, 2}} }
```
So even though none of the states said that `x` pointed to an invalid
location, the join state says it does because `1` and `2` have been
glommed together. The fact `x=1` from `state1` and the fact "1 is
invalid" from `state2` conspire together and `x` is now invalid even
though it shouldn't.
Instead, if we record attributes as part of the memory we get that `x`
is still valid after the join:
```
state1 = { x = (1, {}) }
state2 = { x = (2, {}) }
join(state1, state2) = { x = ({1, 2}, {}) }
```
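The same two joins, mimicked as a rough sketch. `OldState`, `Cell` and both functions are hypothetical, and joining attributes by union is an assumption made for the sketch, not something this diff specifies.

```cpp
#include <set>
#include <string>

using Address = int;

// Old layout (hypothetical): invalidation facts live next to the memory.
struct OldState {
  std::set<Address> x;         // addresses `x` may point to
  std::set<Address> invalids;  // addresses known to be invalid
};

// The join gloms the addresses of `x` together; the merged class counts as
// invalid as soon as either side said one of its members was invalid.
bool old_join_reports_x_invalid(const OldState& a, const OldState& b) {
  std::set<Address> merged = a.x;
  merged.insert(b.x.begin(), b.x.end());
  for (Address addr : merged) {
    if (a.invalids.count(addr) || b.invalids.count(addr)) return true;
  }
  return false;
}

// New layout (hypothetical): attributes travel with the value of `x`.
struct Cell {
  std::set<Address> addrs;
  std::set<std::string> attrs;
};

Cell new_join(const Cell& a, const Cell& b) {
  Cell out = a;
  out.addrs.insert(b.addrs.begin(), b.addrs.end());
  out.attrs.insert(b.attrs.begin(), b.attrs.end());  // assumed: union
  return out;
}
```

With `state1` and `state2` as above, the old-style join flags `x` while the new-style join leaves its attribute set empty.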
Reviewed By: mbouaziz
Differential Revision: D12958130
fbshipit-source-id: 53dc81cc7
Summary:
I hear that this scheduler is better. I want the best scheduler
possible. Also pulse's join is a bit complex so it might matter one day.
whydididothis
Reviewed By: mbouaziz
Differential Revision: D12958131
fbshipit-source-id: 3bd77ccba
Summary: For a general case of `operator=` we want to create a fresh location for the first parameter as `operator=` behaves as copy assignment.
Reviewed By: jvillard
Differential Revision: D12940635
fbshipit-source-id: 89c6e530d
Summary:
Whenever `vec.reserve(n)` is called, remember that the vector is
"reserved". When doing `vec.push_back(x)` on a reserved vector, assume
enough size has been reserved in advance and do not invalidate the
underlying array.
This gets rid of false positives.
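A minimal example of the pattern this stops reporting (hypothetical function name): the standard guarantees no reallocation while the size stays within the reserved capacity, so the reference taken before the second `push_back` remains valid.

```cpp
#include <vector>

// Reads through a reference taken before a push_back on a reserved vector.
int read_after_push_back() {
  std::vector<int> vec;
  vec.reserve(2);
  vec.push_back(1);
  int& first = vec[0];
  vec.push_back(2);  // size 2 <= capacity: no reallocation, `first` is valid
  return first;
}
```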
Reviewed By: mbouaziz
Differential Revision: D12939837
fbshipit-source-id: ce6354fc5
Summary:
Instead of keeping at most one invalidation fact for each address, keep
a set of them and call them "attributes". Keeping a set of invalidation
facts is redundant since we always only want the smallest one, but
makes the implementation simpler, especially once we add more kinds of
attributes (used for modelling, see next diffs).
Reviewed By: mbouaziz
Differential Revision: D12939839
fbshipit-source-id: 4a54c2132
Summary:
Copied from the ownership checker logic: return the initial value of the
domain as the return value. This can probably be improved.
Reviewed By: mbouaziz
Differential Revision: D12888102
fbshipit-source-id: 9e2dac7fc
Summary:
When initialising a variable via semi-exotic means, the frontend loses
the information that the variable was initialised. For instance, it
translates:
```
struct Foo { int i; };
...
Foo s = {42};
```
as:
```
s.i := 42
```
This can be confusing for backends that need to know that `s` actually
got initialised, eg pulse.
The solution implemented here is to insert a dummy call to
`__variable_initialization`:
```
__variable_initialization(&s);
s.i := 42;
```
Then checkers can recognise that this builtin function does what its
name says.
Reviewed By: mbouaziz
Differential Revision: D12887122
fbshipit-source-id: 6e7214438
Summary:
Now that arrays are dealt with separately (see previous diff), we can
turn the join back into an over-approximation as far as invalid
locations are concerned.
Reviewed By: skcho
Differential Revision: D12881989
fbshipit-source-id: fd85e49c0
Summary:
Arrays are the main source of false positives that prevent us from
having a better (less under-approximate) join in general. The next diff
improves join and I split this off to make it easier to review.
Reviewed By: mbouaziz
Differential Revision: D12881986
fbshipit-source-id: 5f52dea27
Summary:
This prevents the join from wrongly assuming that we haven't seen a
variable on one side of the join.
Reviewed By: skcho
Differential Revision: D12881987
fbshipit-source-id: 42a776adb
Summary:
Smaller numbers are easier to read and abstract addresses should never
be shared across functions anyway.
Reviewed By: da319
Differential Revision: D12881988
fbshipit-source-id: f9bcfa343
Summary:
The upcoming ocamlformat has the ability to parse and format
docstrings. This requires that the docstrings conform to the ocamldoc
spec a bit more strongly. If a docstring does not parse, it is left
alone, but if it is morally ill-formed but parses by chance, it can be
reformatted incorrectly. This patch fixes the existing instances of
this problem.
Reviewed By: mbouaziz
Differential Revision: D12911937
fbshipit-source-id: 1c2eb590b
Summary: For `operator=(lhs, rhs)` we want to model it as an assignment if rhs is a materialized temporary created in the constructor.
Reviewed By: jvillard
Differential Revision: D10462510
fbshipit-source-id: 998341e69
Summary: Do not create a new location for placement new argument if it already exists.
Reviewed By: jvillard
Differential Revision: D12839942
fbshipit-source-id: 758b67a82
Summary:
Get rid of `USE_AFTER_LIFETIME`. This could be useful to deploy pulse
alongside the ownership checker too.
Reviewed By: da319
Differential Revision: D12857477
fbshipit-source-id: 8e2a2a37c
Summary: Make the whole type private, introduce constructors for each variant, and deal with the consequences.
Reviewed By: da319
Differential Revision: D12825810
fbshipit-source-id: a01922812
Summary:
Keep `USE_AFTER_LIFETIME` for unclassified errors (for now it contains
vector invalidation too because I can't think of a good name for
them, and maybe it makes sense to wait until we have more types of them
to decide on a name).
Reviewed By: da319
Differential Revision: D12825060
fbshipit-source-id: bd75ef698
Summary:
Getting this right will be long and complex so for now the easiest is to
underreport and only consider as invalid the addresses we know to be invalid on
both sides of a join. In fact the condition for an address to be invalid after
a join is more complex than this: it is invalid only if *all* the addresses in
its equivalence class as discovered by the join are invalid.
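The under-reporting rule amounts to a set intersection. A minimal sketch, assuming invalid addresses are kept as plain sets of ints (`Address` and `join_invalid` are hypothetical names, and the equivalence classes mentioned above are ignored here):

```cpp
#include <algorithm>
#include <iterator>
#include <set>

using Address = int;

// Keep only the addresses that are known to be invalid on both sides.
std::set<Address> join_invalid(const std::set<Address>& lhs,
                               const std::set<Address>& rhs) {
  std::set<Address> both;
  std::set_intersection(lhs.begin(), lhs.end(), rhs.begin(), rhs.end(),
                        std::inserter(both, both.begin()));
  return both;
}
```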
Reviewed By: skcho
Differential Revision: D12823925
fbshipit-source-id: 2ca109356