Summary:
Now that HIL doesn't help us anymore we need to reconstruct its mapping
"SIL logical var -> program access path". We already have everything we
need in pulse: it suffices to walk the current memory graph starting
from program variables until we find the value of the temporary we are
interested in.
This diff also builds some type machinery to make sure all accesses are
explained.
Reviewed By: mbouaziz
Differential Revision: D15824959
fbshipit-source-id: 722c81b39
Summary:
It turns out HIL gets in the way of a precise heap analysis. For
instance, instead of:
```
n$0 = *&x.f
_ = delete(&x)
*&y = n$0
```
HIL tries hard to forget about intermediate variables and shows instead
```
_ = delete(&x)
*&y = *&x.f
```
Oops, that's a use-after-delete, whereas the original code was safe.
While it's easy to write SIL programs that are completely unsound for
HIL, they are not generated very often from the frontends. In fact, the
problem became apparent only when making the clang frontend translate
C++ temporaries destructors, which produces the situation above
routinely.
This diff makes the minimal amount of change to make Pulse build and
produce equivalent results (minus HIL bugs) starting from SIL instead of
HIL. The reporting sucks for now because we need to translate SIL
temporaries back into program access paths. This is done in the next
diff.
Reviewed By: mbouaziz
Differential Revision: D15824961
fbshipit-source-id: 8e4e2a3ed
Summary:
Just moving code around.
This is needed later to make some types in `PulseTrace` depend on
a new that I'll have to define in `PulseDomain`.
Also, this gives better names all around I think
Reviewed By: mbouaziz
Differential Revision: D15881281
fbshipit-source-id: e86c1472e
Summary:
Just moving code around.
This is needed later to make some types in `PulseInvalidation` depend on
a new type that I'll have to define in `PulseDomain`.
Reviewed By: mbouaziz
Differential Revision: D15824962
fbshipit-source-id: 86cba2bfb
Summary:
Make it possible to re-use the graph visitor to compute all sorts of
things with a flexible API where you can pass a function that folds over
all addresses reachable from certain stack variables (specified with a
filter) and gets passed the access path that leads to each address.
This is used in later commits.
Reviewed By: mbouaziz
Differential Revision: D15824960
fbshipit-source-id: c424a71cb
Summary: Preanalysis is performed at the frontend now. Hence, we don't need to repeatedly check/set when/if it is performed.
Reviewed By: mbouaziz
Differential Revision: D15863175
fbshipit-source-id: f9c6b7ae1
Summary:
One "interesting" feature of the approach of merging the captured targets in Java, is that we union their type environments, as opposed to store partial tenvs together with each source file, which is the case for Clang.
This means
- the final global type environment is potentially huge because it contains all the types in all targets.
- all analysis workers start by loading that tenv in memory, meaning we consume `|size of tenv| x #cpus` memory, which can tip the balance towards OOMs
This diff attempts to economise on global tenv size. This is done by increasing sharing which is then preserved by marshalling. It's done in a brute force way, with hashtables for each struct component, and is not fully effective due to the recursion amongst types and types names, as well types appearing inside other constructs such as procnames.
This is done when calling `Tenv.store` so that
- the computation can be parallelised somewhat (capture is parallel, merging is not)
- buck caching will benefit from smaller tenvs.
This saves about 24% of total memory devoted to the type environment.
Reviewed By: mbouaziz
Differential Revision: D15840054
fbshipit-source-id: 6f03be1a4
Summary:
- Add allocation costs to `costs-report.json` and enable diffing over allocation costs.
- Also, let's be more consistent and modular in naming our cost issues.
- introduce a generic issue type `X_TIME_COMPLEXITY_INCREASE` where `X` can be one of the cost kinds. If the function is on the cold start, issue can have the `COLD_START` suffix. Similarly for infinite/zero/expensive calls.
- Change `PERFORMANCE_VARIATION` -> `EXECUTION_TIME_COMPLEXITY_INCREASE`
- Add new issue type for `ALLOCATION_COMPLEXITY_INCREASE_COLD_START` which will be enabled by default
- Refactor cost issues to be more modular and succinct. This also makes addition of a new cost kind very easy by adding the kind into the `enabled_cost_kinds` list in `CostKind.ml`
Reviewed By: mbouaziz
Differential Revision: D15822681
fbshipit-source-id: cf89ece59
Summary:
This one isn't caught because we don't destruct temporaries that are
bound to a const reference. According to the C++ standard these should
get destroyed when the const reference gets destroyed but instead we
just don't destroy them for now.
Reviewed By: mbouaziz
Differential Revision: D15760209
fbshipit-source-id: 32c935ec0
Summary:
In a next diff temporaries will get destructed at the end of their
lifetimes and that naive model would be causing false positives.
The flipside is that we lose all reports on closures for now, will need
to model them separately later.
Reviewed By: mbouaziz
Differential Revision: D15695943
fbshipit-source-id: c2c482c02
Summary:
Needed for next diff: we'll need to do 2 passes on the AST to collect
the temporaries to destroy at the end of an `ExprWithCleanups`, but the
SIL names of these temporaries are generated freshly on the fly so they
would get different names if we do it naively.
This adds a hashmap to the translation context so the temporary
corresponding to a given `MaterializeTemporyExpr` is only generated once
and then reused.
Reviewed By: mbouaziz
Differential Revision: D15674212
fbshipit-source-id: 0e16062d9
Summary:
This started as an attempt to understand how to modify the frontend to
inject destructors for C++ temporaries (see next diffs).
This diff rewrites the existing logic for computing the list of
variables that should be destroyed at the end of each statement, either
because it's the end of their syntactic scope or because control flow
branches outside of their syntactic scope.
The frontend translates a function from the last instructions to the
first, but scope computation needs to be done in the other direction, so
it's done in a separate pass *before* the main translation happens. That
first pass creates a map from statements in the AST to the list of
variables that should be destroyed at the end of these statements. This
is still the case now.
Before, that map would be computed in a bit of a weird way: scopes are
naturally a stack but instead of that the structure maintained was a
flat list + a counter to know where the current scope ended in that
list.
In this diff, redo the computation maintaining a stack of scopes
instead, which is a bit cleaner. Also treat more instructions as
introducing a new scope, eg if, for, ...
Reviewed By: mbouaziz
Differential Revision: D15674208
fbshipit-source-id: c92429e82
Summary:
Somewhat trivial: add a string to "Destruction" nodes to indicate why
they were created. Rename the main `instruction_aux` function into
`instruction_translate` (see next diff for why).
Reviewed By: mbouaziz
Differential Revision: D15674211
fbshipit-source-id: 8a7eda72c
Summary:
I rewrote the test so it doesn't need any C++ headers so that:
- it's easier to see what's going on
- it's easier to debug: the whole AST is now somewhat readable vs before
the headers made it impossibly long
Reviewed By: ezgicicek
Differential Revision: D15674213
fbshipit-source-id: d98941983
Summary: This allows to match `foo<int_&>` and many other horrible names.
Reviewed By: mbouaziz
Differential Revision: D15825403
fbshipit-source-id: c892033aa
Summary:
I realized that there was a discrepancy in the # of instructions between whether we run a single analysis or multiple analyses at the same time. It turns out that in biabduction, bufferoverrun and other HIL analyses we did Preanalysis step (which adds scope instructions and invokes liveness etc.) but not in others. This discrepancy results in inconsistent analysis results (e.g. in the new inefficient-keyset-iterator) that rely on instructions. We should be consistent. Hence, we now invoke Preanalysis in the frontend and remove all other uses in the rest of the checkers.
Consequently, I had to update the inefficient-keyset-checker to take the CFG resulting from Preanalysis with extra scoping instructions.
Reviewed By: mbouaziz, ngorogiannis, jvillard
Differential Revision: D15803492
fbshipit-source-id: 4e21eb610
Summary:
This is a simple checker that identifies inefficient uses of `keySet` iterator where (not only the key but also) the value is accessed via `get(key)`. It is more efficient to use `entrySet` iterator which already returns both key-value pairs. This optimization would get rid of many extra lookups which can be expensive.
We simply traverse the CFG starting from the loop head upwards and pick up the map that is iterated over. Then, we check in the loop nodes if there is a call to `get(...)` over this map. If, so we report.
Reviewed By: ngorogiannis
Differential Revision: D15737779
fbshipit-source-id: 702465b4e
Summary:
Move genrule capture integration logic from shell to OCaml.
Also, stop relying on side-effects of buck compilation for constructing the infer-deps.txt file used for merging. Now this is obtained by passing `--show-output` to buck, which spits out the `buck-out` output paths to the targets we asked to build.
Reviewed By: ezgicicek
Differential Revision: D15715608
fbshipit-source-id: 8fa896ba6
Summary:
The synthetic methods from `topl.Property` are now nonempty: they
simulate a nondeterministic automaton.
Reviewed By: jvillard
Differential Revision: D15668471
fbshipit-source-id: 050408283
Summary:
Instrument SIL according to TOPL properties. Roughly, the
instrumentation is a set of calls into procedures that simulate a
nondeterministic automaton. For now, those procedures are NOP dummies.
Reviewed By: jvillard
Differential Revision: D15063942
fbshipit-source-id: d22c2f6fa
Summary:
When multiple buck java tests use the same `buck-out` they sometimes fail. This isn't surprising, as they presumably clobber each other's output when running on the same files.
Since there is no reason to have this global, shared buck repo, create one for each test, inside the test directory. Also, clean up the Makefiles a bit -- they provide bogus compile targets, for example, and have mostly wrong source dependencies.
That done, remove the `testlock` crutch which enforces mutual exclusion between tests, from the buck/java tests.
I do not understand why the buck clang tests can share the global repo without failure, but there you go.
Reviewed By: jvillard
Differential Revision: D15579133
fbshipit-source-id: 7eff79173
Summary: Not sure how that happens but it does. Instead of crashing, log the error and continue.
Reviewed By: martintrojer
Differential Revision: D15660008
fbshipit-source-id: c87e724d4
Summary: The previous commit broke the `--foo arg` case because it matched `--foo` in the case looking for `--foo=`.
Reviewed By: mbouaziz
Differential Revision: D15670472
fbshipit-source-id: ab81c7357
Summary: There's currently no way to skip these when they are passed to clang.
Reviewed By: martintrojer
Differential Revision: D15669132
fbshipit-source-id: be97d2638
Summary:
It is unsafe to call protocol methods defined optional. Before calling them we should check it
the implementation exists by calling
`if ([object respondsToSelector:selector(...)]) ...`
Without the above check we get run time crashes.
Reviewed By: jvillard
Differential Revision: D15554951
fbshipit-source-id: f0560971b
Summary: In its new form it actually tests that infer takes the correct branch.
Reviewed By: mbouaziz
Differential Revision: D15494297
fbshipit-source-id: 7b9bb8f75
Summary:
- take advantage more structured attributes in the exported AST
- circumvent new format of `if` and `switch`
- a few new features/nodes but nothing major there
update-submodule: facebook-clang-plugins
Reviewed By: mbouaziz, martintrojer
Differential Revision: D15453572
fbshipit-source-id: c0c24345f
Summary:
Somehow clang now chooses slightly different arguments to pass to `ld`
in the invocation that `ndk-build` makes to link:
```
--- clang7 2019-05-28 07:47:19.214949009 -0700
+++ clang8 2019-05-28 07:46:55.095924374 -0700
@@ -1,6 +1,15 @@
"/opt/android_ndk/r15c/toolchains/aarch64-linux-android-4.9/prebuilt/linux-x86_64/lib/gcc/aarch64-linux-android/4.9.x/../../../../aarch64-linux-android/bin/ld"
"--sysroot=/opt/android_ndk/r15c/platforms/android-21/arch-arm64"
+"-EL"
"--fix-cortex-a53-843419"
+"-z"
+"now"
+"-z"
+"relro"
+"-z"
+"max-page-size=4096"
+"--hash-style=gnu"
+"--hash-style=both"
"--no-add-needed"
"--enable-new-dtags"
"--eh-frame-hdr"
@@ -32,7 +41,7 @@
"--fatal-warnings"
"-lc"
"-lm"
-"-lstdc++"
+"-lc++"
"-lm"
"-lgcc"
"-ldl"
```
In particular:
- `lc++` results in `libc++.so` not found from the toolchain
- the forced relocation `-z relro` fails with "/..//bin/ld: ./obj/local/arm64-v8a/objs/hello/__/hello.o: Relocations in generic ELF (EM: 183)" and other weirder errors
Somehow pretending the C++ compiler is `clang` instead of `clang++` stops the insanity.
Also add an Application.mk file to specify some sane defaults.
Also add `V=1` to the `ndk-build` invocation in our tests so that when it fails we have a bit more to work with.
Reviewed By: mbouaziz, martintrojer
Differential Revision: D15518447
fbshipit-source-id: 40203814b
Summary:
- Rename `invariantModels` to `purityModels`
- Track which arguments are modified in purity models. Before we were invalidating all arguments of impure modeled functions. Instead, now we only invalidate modified args given in the model. This should ideally result in more precision in the analysis.
- Add some more purity models for :`cast`, `new`, `new_array` and `Math.random`
Reviewed By: mbouaziz
Differential Revision: D15535332
fbshipit-source-id: 5395800d9
Summary:
That test wasn't hooked up to `make test` and so regressed at some
unknown time in the past. Just recording the new state of things for
now.
Reviewed By: ngorogiannis
Differential Revision: D15495234
fbshipit-source-id: 14fb112de
Summary:
Infer was complaining about a parameter not null checked, which was failing the
model compilation since no errors are allowed on models.
Reviewed By: ezgicicek
Differential Revision: D15453573
fbshipit-source-id: 3bd0df715
Summary:
`infer_events` table is a key-value storage that also have list of fields common for entire infer run.
Infer should be agnostic of many of such fields (e.g. diff number).
Hence we will pass such extra fields through CI.
Note that only "normals" (strings in scuba terminology) are currently supported.
Reason being: most of things that are technically ints (like IDs) should actually be normals (because average etc does not make sense for them; and group by, in contrast, does).
Reviewed By: jvillard
Differential Revision: D15376636
fbshipit-source-id: 729eaabfc
Summary: There can be A LOT of procedures -- currently we log two lines (started/done) for each one, when doing call graph scheduling. This leads to ridiculously long log files. Switch to only log these messages in the log file and only if we are verbose logging.
Reviewed By: jvillard
Differential Revision: D15413330
fbshipit-source-id: 6e26693e8
Summary:
Thanks to the newly added `StarField`, path length is better controlled before ondemand is used.
Hence there is no need to (unsoundly) canonicalize paths then anymore.
Reviewed By: ezgicicek
Differential Revision: D15409716
fbshipit-source-id: 9ea7b4717
Summary:
This messes with the deduplication heuristic when templated function
names show up in the error messages, since the heuristic demands that
the error messages are the same.
Reviewed By: mbouaziz
Differential Revision: D15374333
fbshipit-source-id: 70232d254
Summary:
Improve the error messages, change is more or less documented in the
code.
Reviewed By: mbouaziz
Differential Revision: D15374334
fbshipit-source-id: f1dd54180
Summary:
Some edge case involving casting field pointers to the structure type itself generated arbitrarily long paths when used in a loop.
Without changing the widening, this diff avoids repetitions of fields in paths by abstracting them with a star.
E.g. `x.a.b.c.b` will become `x.a.b.c*.b`, and so will `x.a.b.c.a.b`, `x.a.b.c.c.b`, or `x.a.b.c.b.b`.
Reviewed By: ngorogiannis
Differential Revision: D15352143
fbshipit-source-id: 5ea426c5e
Summary:
I was wondering what were the empty sessions and why inferbo was running twice.
Answer: the empty sessions were 'compute pre' and the second run of inferbo was the narrowing phase.
Reviewed By: ngorogiannis
Differential Revision: D15378138
fbshipit-source-id: 507a3df42
Summary:
This was hardcoded to `true` and its purpose is unclear to me. I kill
what confuses me.
Reviewed By: jeremydubreil
Differential Revision: D15294783
fbshipit-source-id: 3c1c469ee
Summary:
- Makes sure that `start_session` and `finish_session` are well parenthesized
- Avoids a try finally when debug is disabled
Reviewed By: ngorogiannis
Differential Revision: D15371841
fbshipit-source-id: 340203edb
Summary:
Before: the trace would explain how a value was invalidated and
accessed, but not how the value that was invalidated had been
constructed.
Now: `PulseTrace.t` records breadcrumbs of how the value was constructed
in addition to the interproc "action" trace leading to the invalidation
or access action.
Concretely:
```
void bad(X &x) {
X *y = x;
X *z = x;
delete y;
access(z);
}
```
will produce the trace:
Invalidation part:
y = x
delete y
Access part:
z = x
access(z)
access to z->f inside of access(z)
Before this diff the "Access part" would be missing the "z = x" part of
the trace, so it might be confusing why `z` has anything to do with `y`.
However, such "breadcrumbs" are not recorded in the inter-procedural
part, only the sequence of calls is. This is a trade-off for simplicity,
maybe it's enough for developers maybe it isn't, we'll find out later.
Reviewed By: jberdine
Differential Revision: D15354438
fbshipit-source-id: 8d0aed717
Summary:
In preparation for the next diff that re-uses `PulseTrace.t` for a type
that combines breadcrumbs + action.
No change intended.
Reviewed By: mbouaziz, jberdine
Differential Revision: D15354437
fbshipit-source-id: cbb8757b4
Summary:
Before: no links to procedure summary and nodes in header file debug html
Now: some or all of them if you are lucky enough
Reviewed By: jvillard
Differential Revision: D15279379
fbshipit-source-id: a145f9e66
Summary:
Before: they are written only when the file is fully analyzed.
Now: a first version is written as soon as the file gets analyzed so that we get links to nodes, the final version overwrites it
Reviewed By: jvillard
Differential Revision: D15279351
fbshipit-source-id: a3120aa31