Summary:
Reduces the size of the `tenv` by sharing values as most as possible, in an untyped - but supposedly safe - way, by using black magic on objects.
Can be reused for other things later.
Reviewed By: ngorogiannis
Differential Revision: D15855870
fbshipit-source-id: 169a4b86b
Summary:
Using `Marshal.to_string` to create SQLite values used in comparisons is brittle as there is no guarantee that it will return the same value for structurally equal values.
When adding sharing, this will definitely break.
From the SQLite queries I found, only `SourceFile` and `Procname` are used in comparisons.
I haven't tested performance.
It shouldn't change anything for `SourceFile` as there is no possible sharing.
It shouldn't change much for `Procname` as they are pretty small anyway.
Reviewed By: ngorogiannis
Differential Revision: D15923122
fbshipit-source-id: ce4af1fe3
Summary: Inject destructor calls to destroy a temporary when its lifetime ends.
Reviewed By: mbouaziz
Differential Revision: D15674209
fbshipit-source-id: 0f783a906
Summary:
Now that HIL doesn't help us anymore we need to reconstruct its mapping
"SIL logical var -> program access path". We already have everything we
need in pulse: it suffices to walk the current memory graph starting
from program variables until we find the value of the temporary we are
interested in.
This diff also builds some type machinery to make sure all accesses are
explained.
Reviewed By: mbouaziz
Differential Revision: D15824959
fbshipit-source-id: 722c81b39
Summary:
It turns out HIL gets in the way of a precise heap analysis. For
instance, instead of:
```
n$0 = *&x.f
_ = delete(&x)
*&y = n$0
```
HIL tries hard to forget about intermediate variables and shows instead
```
_ = delete(&x)
*&y = *&x.f
```
Oops, that's a use-after-delete, whereas the original code was safe.
While it's easy to write SIL programs that are completely unsound for
HIL, they are not generated very often from the frontends. In fact, the
problem became apparent only when making the clang frontend translate
C++ temporaries destructors, which produces the situation above
routinely.
This diff makes the minimal amount of change to make Pulse build and
produce equivalent results (minus HIL bugs) starting from SIL instead of
HIL. The reporting sucks for now because we need to translate SIL
temporaries back into program access paths. This is done in the next
diff.
Reviewed By: mbouaziz
Differential Revision: D15824961
fbshipit-source-id: 8e4e2a3ed
Summary: Preanalysis is performed at the frontend now. Hence, we don't need to repeatedly check/set when/if it is performed.
Reviewed By: mbouaziz
Differential Revision: D15863175
fbshipit-source-id: f9c6b7ae1
Summary:
One "interesting" feature of the approach of merging the captured targets in Java, is that we union their type environments, as opposed to store partial tenvs together with each source file, which is the case for Clang.
This means
- the final global type environment is potentially huge because it contains all the types in all targets.
- all analysis workers start by loading that tenv in memory, meaning we consume `|size of tenv| x #cpus` memory, which can tip the balance towards OOMs
This diff attempts to economise on global tenv size. This is done by increasing sharing which is then preserved by marshalling. It's done in a brute force way, with hashtables for each struct component, and is not fully effective due to the recursion amongst types and types names, as well types appearing inside other constructs such as procnames.
This is done when calling `Tenv.store` so that
- the computation can be parallelised somewhat (capture is parallel, merging is not)
- buck caching will benefit from smaller tenvs.
This saves about 24% of total memory devoted to the type environment.
Reviewed By: mbouaziz
Differential Revision: D15840054
fbshipit-source-id: 6f03be1a4
Summary:
This started as an attempt to understand how to modify the frontend to
inject destructors for C++ temporaries (see next diffs).
This diff rewrites the existing logic for computing the list of
variables that should be destroyed at the end of each statement, either
because it's the end of their syntactic scope or because control flow
branches outside of their syntactic scope.
The frontend translates a function from the last instructions to the
first, but scope computation needs to be done in the other direction, so
it's done in a separate pass *before* the main translation happens. That
first pass creates a map from statements in the AST to the list of
variables that should be destroyed at the end of these statements. This
is still the case now.
Before, that map would be computed in a bit of a weird way: scopes are
naturally a stack but instead of that the structure maintained was a
flat list + a counter to know where the current scope ended in that
list.
In this diff, redo the computation maintaining a stack of scopes
instead, which is a bit cleaner. Also treat more instructions as
introducing a new scope, eg if, for, ...
Reviewed By: mbouaziz
Differential Revision: D15674208
fbshipit-source-id: c92429e82
Summary:
Somewhat trivial: add a string to "Destruction" nodes to indicate why
they were created. Rename the main `instruction_aux` function into
`instruction_translate` (see next diff for why).
Reviewed By: mbouaziz
Differential Revision: D15674211
fbshipit-source-id: 8a7eda72c
Summary: This allows to match `foo<int_&>` and many other horrible names.
Reviewed By: mbouaziz
Differential Revision: D15825403
fbshipit-source-id: c892033aa
Summary:
This is a simple checker that identifies inefficient uses of `keySet` iterator where (not only the key but also) the value is accessed via `get(key)`. It is more efficient to use `entrySet` iterator which already returns both key-value pairs. This optimization would get rid of many extra lookups which can be expensive.
We simply traverse the CFG starting from the loop head upwards and pick up the map that is iterated over. Then, we check in the loop nodes if there is a call to `get(...)` over this map. If, so we report.
Reviewed By: ngorogiannis
Differential Revision: D15737779
fbshipit-source-id: 702465b4e
Summary:
The synthetic methods from `topl.Property` are now nonempty: they
simulate a nondeterministic automaton.
Reviewed By: jvillard
Differential Revision: D15668471
fbshipit-source-id: 050408283
Summary:
Instrument SIL according to TOPL properties. Roughly, the
instrumentation is a set of calls into procedures that simulate a
nondeterministic automaton. For now, those procedures are NOP dummies.
Reviewed By: jvillard
Differential Revision: D15063942
fbshipit-source-id: d22c2f6fa
Summary:
This messes with the deduplication heuristic when templated function
names show up in the error messages, since the heuristic demands that
the error messages are the same.
Reviewed By: mbouaziz
Differential Revision: D15374333
fbshipit-source-id: 70232d254
Summary:
Improve the error messages, change is more or less documented in the
code.
Reviewed By: mbouaziz
Differential Revision: D15374334
fbshipit-source-id: f1dd54180
Summary: No reason to use custom function name and not implement `Hashable`.
Reviewed By: mbouaziz
Differential Revision: D15097603
fbshipit-source-id: 7303fc15e
Summary:
TOPL properties are essentially automata, which will be modeled as a set
of procedures. The code-to-analyze makes calls into these procedures,
thereby driving the automaton. In this commit, these calls do not do
anything. The point is to prepare the hook-up mechanism.
Reviewed By: jvillard
Differential Revision: D14819650
fbshipit-source-id: d95ecdb3d
Summary:
A long-standing easter egg from infer error messages is the "object
`null` could be null and is dereferenced at line ...". I tried to fix
this but the part that generates the first "null" in the message and the
part that generates the second one are very far apart and it's hard to
see how to make the second part aware of the first in a clean way.
Instead, hack around it by detecting if the string representing the
value is literally `null` and in that case chop `could be null ` from
the error messages...
Reviewed By: jeremydubreil
Differential Revision: D14972324
fbshipit-source-id: ccc48ce6b
Summary:
We get messages like " object returned by `getArguments()` at line 101."
instead of " object returned by `getArguments()` could be null and is
dereferenced at line 101.". Tracking it down, it happens for
nullable-looking values, but I don't know why.
It seems that something regressed but I couldn't track it down.
So, just generate the error message in the same way as for non-nullable
objects in this case to fix the non-sensical message.
Reviewed By: jeremydubreil
Differential Revision: D14972325
fbshipit-source-id: 2a97501cc
Summary:
Feedback from peterogithub:
- mention which access path is being invalidated and accessed in the message
- mention the line at which it was invalidated (the line at which it's accessed is already the line at which we report)
- traces for stack variable/C++ temporary address escapes
- delete double implementation of the same functionality in
`PulseTrace`: `location_of_action_start` is the same as
`outer_location_of_action`...
Reviewed By: jberdine
Differential Revision: D14800294
fbshipit-source-id: 3d9ab9b3d
Summary:
The previous message formatting had regressed and produced non-sensical messages.
More importantly, remove template parameters from error messages to
trigger the heuristic in `InferPrint` that deduplicates errors that are
on the same line with the same error type and message. Without this we
get hundreds of reports that correspond to as many instantiations of the
same code.
Reviewed By: ngorogiannis
Differential Revision: D14747979
fbshipit-source-id: 3c4aad2b1
Summary:
Instead of emitting an ad-hoc builtin on variable declaration emit a new
metadata instruction. This allows us to remove the code matching on that
ad-hoc builtin that had to be inserted in several checkers.
Inferbo & pulse used that information meaningfully and had to undergo
some minor changes to cope with the new metada instruction.
Reviewed By: ezgicicek
Differential Revision: D14833100
fbshipit-source-id: 9b3009d22
Summary:
Bundle all non-semantic-bearing instructions into a `Metadata _`
instruction in SIL.
- On a documentation level this makes clearer the distinction between
instructions that encode the semantics of the program and those that are
just hints for the various backend analysis.
- This makes it easier to add more of these auxiliary instructions in
the future. For example, the next diff introduces a new `Skip` auxiliary
instruction to replace the hacky `ExitScope([], Location.dummy)`.
- It also makes it easier to surface all current and future such
auxiliary instructions to HIL as the datatype for these syntactic hints
can be shared between SIL and HIL. This diff brings `Nullify` and
`Abstract` to HIL for free.
Reviewed By: ngorogiannis
Differential Revision: D14827674
fbshipit-source-id: f68fe2110
Summary: In SIL, sometimes a return value is assigned to `__return_param`.
Reviewed By: ezgicicek, mbouaziz
Differential Revision: D14538590
fbshipit-source-id: dfbb74dc2
Summary:
This will be used in the future to determine what to do with destructors
in pulse.
Reviewed By: mbouaziz
Differential Revision: D14324759
fbshipit-source-id: bc3c34471
Summary:
This seems generally useful. Force people to do it in the future even if
they want to avoid having to update the frontend tests.
Reviewed By: mbouaziz
Differential Revision: D14324758
fbshipit-source-id: cdef3f72a
Summary:
You can only take the address of variables, field accesses, and array
accesses, the rest doesn't make sense.
Reviewed By: mbouaziz
Differential Revision: D14258484
fbshipit-source-id: 8ddcfe810
Summary:
Add an option to specify some classes that we really want to warn about
with the liveness checker, even when they appear used because of the
implicit destructor call inserted by the compiler.
Reviewed By: mbouaziz
Differential Revision: D13991129
fbshipit-source-id: 7fafdba84