Summary: Add cost model for most common `NSString` functions in cost analysis
Reviewed By: skcho
Differential Revision: D22433005
fbshipit-source-id: 2f57bbda9
Summary: This diff prints where the cost becomes top by calling `html_debug_new_node_session`. This will print them in the start node of the procedure in html. There are already printing functions in `get_instr_node_cost_record`.
Reviewed By: ezgicicek
Differential Revision: D22547578
fbshipit-source-id: 257e957c0
Summary: `addAll` adds elements one by one and hence takes linear time. We didn't have a model for this and considered it O(1).
Reviewed By: skcho
Differential Revision: D22375157
fbshipit-source-id: 65b82bfae
Summary:
Following from previous diff.
**Idea** - 80% of functions with Top cost are caused by calling top-costed callees, i.e. callee's Top cost is simply propagated to its transitive callers, so the aim is to investigate such root callees along with the number of their transitive callers.
Consider the following code
```
void bar1() {
// top cost function
}
void bar2() {
// another top cost function
}
void baz(){
// baz have top cost because of bar
bar1();
}
void foo() {
// goo have top cost because of baz
baz();
bar2()
}
```
Clearly, the root cause of the foo being top cost is `bar1` and `bar2`.
1. When we are analyzing `baz`, we know that it calls `bar1`, which is top cost, so we record that `baz = { T, bar1 } `.
2. Now, say we are analyzing foo.
When we analyze the call to `baz`, we found out that the top cost of `baz` is caused by `bar1`, so we record `foo = { T, bar1 }`.
When we analyze the call to `bar2`, we know that `bar2` is top cost, but since at this stage we only want to deal with the first top cost function we met, so we ignore it.
Since we are keeping track of top cost function by examining the `Call` instruction, we would expect to see two log of `bar1` in the result. The test plan confirms it.
Reviewed By: ezgicicek
Differential Revision: D22231457
fbshipit-source-id: 45d48e4a7
Summary:
Extend BasicCost to BasicCostWithReason which contains a record of the form
```{cost: BasicCost.t; proc_name_list: Procname.t list}```
This is done so that we can keep track of top cost function.
So the idea is that 80% of functions with Top cost are caused by calling top-costed callees, i.e. callee's Top cost is simply propagated to its transitive callers, so the aim is to investigate such root callees along with the number of their transitive callers.
Therefore, we create an extension that match `cost` to the root cause function.
This diff only handles the extension. Details about how we update the root cause function is in the next diff.
Reviewed By: skcho
Differential Revision: D22158717
fbshipit-source-id: 6498d904f
Summary: Log unmodeled function in cost analysis and send result to scuba.
Reviewed By: ezgicicek
Differential Revision: D22158510
fbshipit-source-id: c6eade67e
Summary: Cost analysis has an additional mode that checks whether the function occurs on a UI (main) thread. If so, it warns the user. This check is only supported for Java and C++ but not for ObjC, so the diff suppresses this check for ObjC, and set ```is_on_ui_thread``` to ```false``` by default.
Reviewed By: ezgicicek
Differential Revision: D21952470
fbshipit-source-id: 838dd5639
Summary:
See previous diff: issues are always reported with the same severity so
recognise that and just use their default severity in "modern" checkers.
Reviewed By: ngorogiannis
Differential Revision: D21904591
fbshipit-source-id: fb5387e35
Summary:
We stopped relying on an arbitrary threshold. Hence,
- we don't need all the machinery for reporting at a specific node for a threshold
- we can remove
- the issue type `EXPENSIVE_EXECUTION_TIME`
- the config option `--use-cost-threshold`.
Reviewed By: skcho
Differential Revision: D21859815
fbshipit-source-id: b73a2372d
Summary: We do not use an arbitrary threshold to test cost results anymore but instead rely on `cost-issues` which do not have any trace attached. This diff adds traces to `costs-report.json` so that we can test cost issues with traces.
Reviewed By: skcho
Differential Revision: D21858846
fbshipit-source-id: e73321a92
Summary:
The model returns an array the length of which is the same to that of enum entries.
It takes the length of enum entries from the summary of `Enum.values` because it is not written in `tenv`. In order to do that, the model semantics should be able to request the summary of the function with `get_summary`, so I extended `model_env` to include the functionality.
Reviewed By: ezgicicek
Differential Revision: D21843319
fbshipit-source-id: d6f10eb91
Summary:
In inferbo's domain, `Loc.t` and `Symb.SymbolPath.partial` are defined with the same *field abstraction*. The depth of appended fields were limited in both of them exactly in the same way, e.g. `x.*.field`. Problem is that the implementation related
to the field abstraction are duplicated in their code, which had been synchronized manually. This diff avoids the duplication by adding a `BufferOverrunField.t`.
Reviewed By: ezgicicek
Differential Revision: D21743728
fbshipit-source-id: 4b01d027c
Summary:
Add an extra argument everywhere we report about the identity of the
checker doing the reporting. This isn't type safe in any way, i.e. a
checker can masquerade as another. But, hopefully it's enough to ensure
checker writers (and diff reviewers) have a chance to reflect on what
issue type they are reporting.
Reviewed By: ngorogiannis
Differential Revision: D21638823
fbshipit-source-id: b4a4b0c0a
Summary:
We stopped relying on an external perf data file to determine which functions are on the cold start. Let's remove this issue now.
NB: Keeping the `--perf-profiler-data-file` as deprecated to prevent issues on the CI and prod.
Reviewed By: skcho
Differential Revision: D21594150
fbshipit-source-id: faa58782d
Summary:
Needed to move a bunch of files around to make this happen. Notably,
moving "preanal.ml" outside of checkers/ into backend/ since it needs to
modify the proc desc in the summary. Also hoisting goes to cost/.
Reviewed By: skcho
Differential Revision: D21407069
fbshipit-source-id: ebb9b78ec
Summary:
Changing inferbo required changing all the analyses that depend on it
too. This introduces a new feature of the the new framework: the ability
for the checkers to read other analyses' payloads in a more functional
way.
Reviewed By: ezgicicek, skcho
Differential Revision: D21401819
fbshipit-source-id: f9b99e344
Summary:
It suppresses cost reports of access methods, anonymous class methods, and auto-generated methods.
For those methods, this diff blocks reporting both expensive
issue (`EXPENSIVE_COLD_START/_UI_THREAD`) and complexity increase issue (`COMPLEXITY_INCREASE`).
Reviewed By: jvillard
Differential Revision: D21303068
fbshipit-source-id: 1c9533956
Summary:
Looks like this one was already causing circular dependency issues!
Saves threading the callback through *many* functions.
Reviewed By: ngorogiannis
Differential Revision: D21261636
fbshipit-source-id: 2b23aa07d
Summary:
This is a step in disentangling the various analyses: that file used to
make every checker on biabduction because of a few of its functions that
use biabduction datatypes.
Split reporting.ml into:
- Reporting.ml: the functions all checkers need to report errors. This
is put in absint/ with the other files that are needed by all
checkers.
- SummaryReporting.ml: functions that need to depend on Summary.ml
(useful for later). This is put in backend/ where Summary.ml lives.
- BiabductionReporting.ml: for the biabduction analysis
The rest of the changes are renames to use the appropriate module
amongst the above.
Reviewed By: ngorogiannis
Differential Revision: D21257468
fbshipit-source-id: fa28cefbc
Summary: This diff suppresses cost issues on lambda and auto-generated procedures, since they were too noisy.
Reviewed By: ezgicicek
Differential Revision: D21153619
fbshipit-source-id: 65ad6dcc3
Summary:
This information can be useful for tooling responsible for further
processing (e.g. metric calculation and logging)
Reviewed By: artempyanykh
Differential Revision: D20914583
fbshipit-source-id: 61804d88f
Summary:
Currenlty the cost issue is printed at the first node of a function, which is usually the first
statment of the function. This may give a wrong impression that the cost of the statement is
changed.
This diff re-locate where to print issues with heuristics. Going backward from the first node
lines, it looks up a line satisfying,
1. A line should start with <fname> or should include " <fname>".
2. The <fname> found in 1 should be followed by a space, '<', '(', or end of line.
Reviewed By: jvillard
Differential Revision: D20766876
fbshipit-source-id: b4fee3180
Summary:
Fix all the docstrings that `odoc` or `ocamlformat` is not happy about.
Delete all `[@@ocamlformat "parse-docstring = false"]` pragmas as a
result.
Reviewed By: jberdine, ngorogiannis
Differential Revision: D20798913
fbshipit-source-id: 728d9e45c
Summary:
To find a method in non-abstract sub-classes, this diff applies the
same heuristics of inferbo.
* If the class is an interface: Find its unique sub-class and apply the heuristics recursively.
* If the class is an abstract class: Find/use its own summary if possible. If not found, find
one (arbitrary but deterministic) summary from its sub-classes.
* Otherwise: Find its own summary.
Reviewed By: ezgicicek
Differential Revision: D20647101
fbshipit-source-id: 2f8f3ff81
Summary: Add a flag `is-inclusive-cost` (`true` by default) which computes inclusive cost for each function. Setting the flag to `false` computes exclusive cost of the function where the cost of the callees are assumed to be `0`.
Reviewed By: skcho
Differential Revision: D20558275
fbshipit-source-id: 6b5798916
Summary: Type is not enough to say a function call of `Provider.get` is expensive or not.
Reviewed By: jvillard
Differential Revision: D20366206
fbshipit-source-id: 83d3e8741
Summary:
This diff uses a type parameter of `Provider.get` to decide whether assigning expensive cost to the
function call or not. For example, if the type is small one like `Provider<Integer>`, it be
evaluated to have a unit cost, otherwise a linear cost.
To get the return type of `Provider.get`, I added a simple analyzer that collects "casted" types
backwards. In Sil, while the function call statement loses the return type, e.g,
```
n$5=_fun_Object Provider.get()(n$3:javax.inject.Provider*);
```
the `n$5`'s value is usually casted to a specific type at some point later.
```
*&$irvar0:java.lang.Object*=n$5
n$8=*&$irvar0:java.lang.Object*
n$9=_fun___cast(n$8:java.lang.Object*,sizeof(t=java.lang.Integer;sub_t=( sub )(cast)):void)
```
So, the analyzer starts from the cast statements backward, collecting the types to cast for each
variables.
Reviewed By: ezgicicek
Differential Revision: D20345268
fbshipit-source-id: 704b42ec1
Summary:
This diff renames `ZERO_XXX` issues to more appropriately named and descriptive
`XXX_UNREACHABLE_AT_EXIT` and replaces bottom with
unreachable in cost kinds and issues.
Reviewed By: skcho
Differential Revision: D20140301
fbshipit-source-id: eb6076b30
Summary:
The issue type `ZERO_EXECUTION_TIME` actually corresponds to bottom state but has been mistakenly used to mean
- unreachable nodes (program never reaching exit state)
- having zero cost (e.g. for allocations).
Note that, for execution costs, the latter doesn't make sense since we always incur a unit cost for the start node. Hence, a function with empty body will have unit cost. For allocations or IO however, we only incur costs for specific primitives, so a function with no allocations/IO could have a zero cost. However, there is no point reporting functions with zero cost as a specific issue type. Instead, what we want to track is the former, i.e. functions whose cost becomes 0 due to program never reaching exit state.
This diff aims to split these cases into two by only reporting on the latter and adds traces to bottom/unreachable cost by creating a special category in polynomials.
Next diff will rename `ZERO_XXX` to `XXX_UNREACHABLE_AT_EXIT`.
Reviewed By: skcho
Differential Revision: D20005774
fbshipit-source-id: 46b9abd5a
Summary:
> We don't report when the cost is Top as it corresponds to subsequent 'don't know's. Instead, we
> report Top cost only at the top level per function
The previous code just ignored top costed nodes, so it was able to report a non-top cost that was
from another node. For example,
```
void foo() {
linear-cost();
top-cost();
}
```
It reported inconsistent reports: `EXPENSIVE_EXECUTION_TIME` with a linear cost and
`INFINITE_EXECUTION_TIME` at the same time.
This diff fixes it not to report `EXPENSIVE_EXECUTION_TIME` when there is a node with the top cost.
Reviewed By: ezgicicek
Differential Revision: D20139408
fbshipit-source-id: 9fedd4aec
Summary:
In the previous report, it reported the first cost of node that exceeds a threshold. However, this
may hide a bigger cost of node that appears later. This diff changes this to report the biggest
cost of node among the costs exceeding the threshold.
Reviewed By: ezgicicek
Differential Revision: D20116162
fbshipit-source-id: 06199fb46
Summary:
This ignores the error memory status (e.g. when condition expression is evaluated to bottom), in
order to keep analyze following code.
```
if ( e ) // e is evaluated to bottom due to a problem of Inferbo {
... // code here was not analyzed before
}
```
Reviewed By: ezgicicek
Differential Revision: D20067434
fbshipit-source-id: a1713722c
Summary:
Use a record of package, class name to store (qualified) Java class names. This saves the round trip of concatenating then splitting again, etc, as well as saves some memory in the type environment as now the package paths can be shared across classes of the same package (about 10% in tests).
Also remove some unfortunate APIs.
Reviewed By: jvillard
Differential Revision: D19969325
fbshipit-source-id: f7b7f5a55
Summary: Rather than recomputing the `proc_name`, let's pass it around.
Reviewed By: skcho
Differential Revision: D19951461
fbshipit-source-id: 90b57dcc7
Summary:
In Inferbo, the bottom memory is introduced when a node is unreachable by pruning, i.e.
`[[e]] <= [0,0]` on `prune(e)`. This diff distinguishes whether `[[e]]` is `[0,0]` (unreachable)
or bottom (it could not evaluate `e` by some unknown reasons).
Reviewed By: ezgicicek
Differential Revision: D19902046
fbshipit-source-id: 7706017d6
Summary:
Prevent returning a negative cost bound when calling `substring(begin_index, end_index)` when either is possible
- `begin_index < 0`
- `begin_index > end_index`
Instead, return unit cost since such cases either throw `IndexOutOfBoundsException ` at runtime or correspond to having two symbolic bounds that cannot be semantically compared.
Reviewed By: dulmarod
Differential Revision: D19619410
fbshipit-source-id: cf5e8cb7b
Summary: D19496263 had refactored cost to its own subdirectory. In this process, it introduced a perf degradation by bundling of collection and computation of constraints which relies on imperative union find. This diff fixes that by separating them.
Reviewed By: ngorogiannis
Differential Revision: D19598610
fbshipit-source-id: 0e466522d
Summary: `cost.ml` is huge. Let's split it to its logical parts (basically creating new files for the modules that were already in `cost.ml`) and move all cost related files into `\cost` directory. While we are at it, let's add `mli` files too.
Reviewed By: jvillard
Differential Revision: D19496263
fbshipit-source-id: 45096db4c