Summary:
This diff adds semantics of Java function calls of enum `values` inside class initializers.
* Java class initializer function initializes a specific field `$VALUES`, which points to the list
of enum values.
* The `values` function of enum class returns the value of `$VALUES`.
The problem is when the `values` function is called inside the class initializer, for example:
```
enum Color {
RED,
GREEN,
BLUE;
static {
for (Color c : Color.values()) {}
}
}
```
This introduces a recursive dependency: the class initializer calls `Color.values` and the function
returns `Color.$VALUES` the value of which should be initialized in the class initializer.
To address the problem, this diff finds the value of `$VALUES` in its abstract memory when
`values` is called inside the class initializer.
Reviewed By: ezgicicek
Differential Revision: D18349281
fbshipit-source-id: 21766c20f
Summary: Follow ups will include error messaging that makes the choice clear
Reviewed By: artempyanykh
Differential Revision: D18347664
fbshipit-source-id: b6f005726
Summary:
Now that we have two similar functions, it becomes confusing, because `Pp.to_string` and `Pp.string_of_pp` can seem to do the same stuff, while in reality they do the opposite.
Well, it is still bit confusing, because the proper names would be
`Pp.pp_of_to_string` and `Pp.to_string_of_pp`, but I think this high
level order names are not necessary given that in most cases they will
be used as concrete functions.
I think `Pp.of_string` captures such usages better than `to_string` used to do: you need to pp stuff,
but you have a string (or, technically, a function that returns a string), so you pretty print OF that string, aren't you?
Reviewed By: jvillard
Differential Revision: D18245876
fbshipit-source-id: fd4b6ab68
Summary:
Steal a page from RacerD (and improve interface of) on using certain calls to assert
execution on a particular thread. Reduces FPs and FNs too.
Reviewed By: dulmarod
Differential Revision: D18199843
fbshipit-source-id: 5bdff0dfe
Summary:
This adds a more interesting value domain to pulse: concrete intervals.
There are still two main limitations:
1. arithmetic operations are all over-approximated: any assignment involving arithmetic operations is replaced by non-determinism
2. abstract values that are discovered to be equal are not merged into one
Reviewed By: skcho
Differential Revision: D18058972
fbshipit-source-id: 0492a590f
Summary: The way `<=` is used in `AbstractDomain` prevents infix use and forces bracketing it everywhere. Replace with simple `leq`.
Reviewed By: jvillard
Differential Revision: D18201854
fbshipit-source-id: 8175224e4
Summary: `Str.regexp_string` should be used to find a method name instead of `Str.regexp`.
Reviewed By: ezgicicek
Differential Revision: D18136598
fbshipit-source-id: c4b56dd64
Summary:
This will avoid printing stuff like
"0$?%__sil_tmpSIL_materialize_temp__n$2 declared" to the poor
unsuspecting user. The non-verbose stuff is used only by pulse so far as
far I can tell so hopefully this doesn't break anything.
Reviewed By: ezgicicek
Differential Revision: D17908943
fbshipit-source-id: 8ef4f1a8f
Summary: This diff introduces an inequality for the size alias targets, in order to get preciser array lengths after loops. The alias domain in inferbo was able to express strict equality between alias source and its targets, e.g. x=size(array). Now, for the size alias target, it can express less than or equal relations, e.g. x>=size(array).
Reviewed By: ezgicicek
Differential Revision: D17606222
fbshipit-source-id: 2557d3bd0
Summary: When we have an annotation like `Prop(varArg = X)` or ` ThreadSafe(enableChecks = true)`, we were not able to pick up the names of the parameters like `varArg` or `enableChecks`. This diff fixes that.
Reviewed By: skcho, ngorogiannis
Differential Revision: D17571377
fbshipit-source-id: 5293b5810
Summary:
As per previous diff, attempt to allocate fewer strings. This doesn't
seem to affect perf although allocating less might reduce memory
pressure.
Reviewed By: mityal
Differential Revision: D17423973
fbshipit-source-id: e2e37b071
Summary:
My spidey senses were tingling. Next diff uses the `pp` functions
everywhere it was kind of obvious how to change the code to do so. It
doesn't improve perf but is less clowny that way. It might lessen memory
pressure since allocating strings is expensive and this code was doing a
lot of it.
Reviewed By: ngorogiannis
Differential Revision: D17450324
fbshipit-source-id: 632cee584
Summary:
This diff makes the checkers, except biabduction, to use `typ` instead
of `root_typ` of `Load`/`Store` statemetns.
Reviewed By: dulmarod
Differential Revision: D17203105
fbshipit-source-id: 8be9b5158
Summary:
It adds typ field in Sil.Store. The field will be used by the analyzer in the following diffs.
Motivation: Interbo generates a symbolic value when evaluating expressions including parameter symbols. At that time, it is done with depending on their types, e.g., an integer, a pointer to struct or a pointer to array. Without the type, it is hard to generate a correct symbolic value that will be instantiated later in call sites. Thus, evaluating RHS of the store statement, the type of RHS is better to be given.
Reviewed By: dulmarod
Differential Revision: D17185346
fbshipit-source-id: f0945c40f
Summary: Use_after_free was used both for biabduction and pulse, and the biabduction version is blacklisted by default. As a result, the Pulse version was also disabled unintentionally. This changes the name of the old use_after_free so that now we can get use_after_free bugs whenever pulse is enabled.
Reviewed By: skcho
Differential Revision: D17182687
fbshipit-source-id: 539ca69de
Summary:
It adds `typ` field in Sil.Load. The field will be used by the analyzer in the following diffs.
Motivation: Interbo generates a symbolic value when evaluating expressions including parameter symbols. At that time, it is done with depending on their types, e.g., an integer, a pointer to struct or a pointer to array. Without the type, it is hard to generate a correct symbolic value that will be instantiated later in call sites. Thus, evaluating RHS of the load statement, the type of RHS is better to be given.
Reviewed By: jvillard
Differential Revision: D17163350
fbshipit-source-id: f7f0f1429
Summary:
It uses inline record for Sil.Load and Sil.Store for preparing the
following extention.
Reviewed By: dulmarod
Differential Revision: D17161288
fbshipit-source-id: 637ea7bfa
Summary: It prints non-verbose program variables in the report.
Reviewed By: ngorogiannis
Differential Revision: D17163943
fbshipit-source-id: c3f3c2887
Summary:
Implementation of write-serializer for Sqlite. Points of note:
- A Unix socket is used for communication. This avoids buffer-size limitations, as the objects we send for writing may exceed said limits.
- No daemon is used if running under buck or in genrule mode, as this usually means a single-threaded job capturing into the DB.
- When the daemon is running, read-only access is *not* enforced for other processes. This makes starting and stopping the daemon during Infer execution easier and more robust. In WAL mode this should not have any effect on performance.
- This version is not economical with connections, it uses one per query, todo.
Reviewed By: jvillard
Differential Revision: D17077183
fbshipit-source-id: fa9877d6c
Summary:
Write contention is becoming a problem in parallel capture (eg when make runs with high parallelism) or when analysis writes CFGs to the DB in parallel (eg when analysing blocks in ObC). This is believed to lead to BUSY errors in Sqlite.
This is step 1 of a process where all writes are cordoned-off in one module, and fixing the interface for that module.
Reviewed By: skcho
Differential Revision: D16985034
fbshipit-source-id: 3d7ce381b
Summary:
`from_string` is too benign in constrast with what this method is really
doing (and oh my what it is really doing).
There are a lot of potential follow ups to clean this up even more, but
this is beyond the scope of this diff
Reviewed By: jvillard
Differential Revision: D17070826
fbshipit-source-id: 3d190039e
Summary:
Access paths are too coarse to properly address C/C++ instructions, and lead to false positives and negatives. Begin the process of porting the underlying domains to access expressions, in a results-preserving way. This roughly consists in:
- Adding missing functions in `AccessExpression` to mirror those in `AccessPath`.
- Replacing `AccessExpression` for `AccessPath` and removing conversions from the former to the latter except in:
- Printing functions, to ensure formatting issues won't change tests/CI.
- Reporting/deduplication still happens through access path conversion, as we need an analogue of `ModuloThis` for `AccessExpression`.
- In selected places, ignore any access type not present in `AccessPath` (ie. dereference/take address of).
Reviewed By: jberdine
Differential Revision: D16856721
fbshipit-source-id: 5e3a88b75
Summary:
The models are only for biabduction so try to make that clearer in the
code and documentation.
Reviewed By: skcho
Differential Revision: D16603147
fbshipit-source-id: 4a2be53de
Summary: Sometimes programmers use integer underflow to get a maximum number of that type. This diff assumes that integer underflows from the syntactical form `(unsigned 0) - constant` is intended by the programmer, and suppresses the alarms of which.
Reviewed By: ezgicicek
Differential Revision: D16560639
fbshipit-source-id: 206f30dbc
Summary:
newer is better, right?
All the code changes in infer are because of core being bumped to v0.12.
Reviewed By: jberdine
Differential Revision: D16223183
fbshipit-source-id: f3c339966
Summary:
[apologies for the unreviewable diff...]
Get rid of HIL expressions in pulse. This finishes the HIL -> SIL
migration. The first step made pulse start from SIL instructions but
would translate most accesses to HIL to re-use most of the existing
pulse code. This diff gets rid of the intermediate translation of SIL
expressions to HIL expressions.
Big changes:
1. `PulseOperations` mostly rewritten, driven by using `Exp.t` instead of `HilExp.AccessExpression.t` for everything.
2. Stop trying to reverse-engineer what addresses mean in terms of
access paths from program variables. Rely on the trace pointing at
the right places in the code to be enough. This is because it wasn't
that useful (and could even be misleading when wrong) but could be
prohibitively expensive in degenerate cases (eg nodes with tens of
thousands of successive array accesses...)
3. `PulseAbductiveDomain.apply_post` now returns the computed return
value instead of recording it itself.
4. Change of vocabulary: `materialize` -> `eval`, `crumb` -> `event`
5. Function calls arguments are now evaluated prior to doing anything
else, which saves everything else from having to (remember to) do
that. In particular, this changes how models look quite a bit.
Reviewed By: mbouaziz
Differential Revision: D15986373
fbshipit-source-id: 1d79935de
Summary:
Reduces the size of the `tenv` by sharing values as most as possible, in an untyped - but supposedly safe - way, by using black magic on objects.
Can be reused for other things later.
Reviewed By: ngorogiannis
Differential Revision: D15855870
fbshipit-source-id: 169a4b86b
Summary:
Using `Marshal.to_string` to create SQLite values used in comparisons is brittle as there is no guarantee that it will return the same value for structurally equal values.
When adding sharing, this will definitely break.
From the SQLite queries I found, only `SourceFile` and `Procname` are used in comparisons.
I haven't tested performance.
It shouldn't change anything for `SourceFile` as there is no possible sharing.
It shouldn't change much for `Procname` as they are pretty small anyway.
Reviewed By: ngorogiannis
Differential Revision: D15923122
fbshipit-source-id: ce4af1fe3
Summary: Inject destructor calls to destroy a temporary when its lifetime ends.
Reviewed By: mbouaziz
Differential Revision: D15674209
fbshipit-source-id: 0f783a906
Summary:
Now that HIL doesn't help us anymore we need to reconstruct its mapping
"SIL logical var -> program access path". We already have everything we
need in pulse: it suffices to walk the current memory graph starting
from program variables until we find the value of the temporary we are
interested in.
This diff also builds some type machinery to make sure all accesses are
explained.
Reviewed By: mbouaziz
Differential Revision: D15824959
fbshipit-source-id: 722c81b39
Summary:
It turns out HIL gets in the way of a precise heap analysis. For
instance, instead of:
```
n$0 = *&x.f
_ = delete(&x)
*&y = n$0
```
HIL tries hard to forget about intermediate variables and shows instead
```
_ = delete(&x)
*&y = *&x.f
```
Oops, that's a use-after-delete, whereas the original code was safe.
While it's easy to write SIL programs that are completely unsound for
HIL, they are not generated very often from the frontends. In fact, the
problem became apparent only when making the clang frontend translate
C++ temporaries destructors, which produces the situation above
routinely.
This diff makes the minimal amount of change to make Pulse build and
produce equivalent results (minus HIL bugs) starting from SIL instead of
HIL. The reporting sucks for now because we need to translate SIL
temporaries back into program access paths. This is done in the next
diff.
Reviewed By: mbouaziz
Differential Revision: D15824961
fbshipit-source-id: 8e4e2a3ed
Summary: Preanalysis is performed at the frontend now. Hence, we don't need to repeatedly check/set when/if it is performed.
Reviewed By: mbouaziz
Differential Revision: D15863175
fbshipit-source-id: f9c6b7ae1
Summary:
One "interesting" feature of the approach of merging the captured targets in Java, is that we union their type environments, as opposed to store partial tenvs together with each source file, which is the case for Clang.
This means
- the final global type environment is potentially huge because it contains all the types in all targets.
- all analysis workers start by loading that tenv in memory, meaning we consume `|size of tenv| x #cpus` memory, which can tip the balance towards OOMs
This diff attempts to economise on global tenv size. This is done by increasing sharing which is then preserved by marshalling. It's done in a brute force way, with hashtables for each struct component, and is not fully effective due to the recursion amongst types and types names, as well types appearing inside other constructs such as procnames.
This is done when calling `Tenv.store` so that
- the computation can be parallelised somewhat (capture is parallel, merging is not)
- buck caching will benefit from smaller tenvs.
This saves about 24% of total memory devoted to the type environment.
Reviewed By: mbouaziz
Differential Revision: D15840054
fbshipit-source-id: 6f03be1a4
Summary:
This started as an attempt to understand how to modify the frontend to
inject destructors for C++ temporaries (see next diffs).
This diff rewrites the existing logic for computing the list of
variables that should be destroyed at the end of each statement, either
because it's the end of their syntactic scope or because control flow
branches outside of their syntactic scope.
The frontend translates a function from the last instructions to the
first, but scope computation needs to be done in the other direction, so
it's done in a separate pass *before* the main translation happens. That
first pass creates a map from statements in the AST to the list of
variables that should be destroyed at the end of these statements. This
is still the case now.
Before, that map would be computed in a bit of a weird way: scopes are
naturally a stack but instead of that the structure maintained was a
flat list + a counter to know where the current scope ended in that
list.
In this diff, redo the computation maintaining a stack of scopes
instead, which is a bit cleaner. Also treat more instructions as
introducing a new scope, eg if, for, ...
Reviewed By: mbouaziz
Differential Revision: D15674208
fbshipit-source-id: c92429e82
Summary:
Somewhat trivial: add a string to "Destruction" nodes to indicate why
they were created. Rename the main `instruction_aux` function into
`instruction_translate` (see next diff for why).
Reviewed By: mbouaziz
Differential Revision: D15674211
fbshipit-source-id: 8a7eda72c
Summary: This allows to match `foo<int_&>` and many other horrible names.
Reviewed By: mbouaziz
Differential Revision: D15825403
fbshipit-source-id: c892033aa
Summary:
This is a simple checker that identifies inefficient uses of `keySet` iterator where (not only the key but also) the value is accessed via `get(key)`. It is more efficient to use `entrySet` iterator which already returns both key-value pairs. This optimization would get rid of many extra lookups which can be expensive.
We simply traverse the CFG starting from the loop head upwards and pick up the map that is iterated over. Then, we check in the loop nodes if there is a call to `get(...)` over this map. If, so we report.
Reviewed By: ngorogiannis
Differential Revision: D15737779
fbshipit-source-id: 702465b4e
Summary:
The synthetic methods from `topl.Property` are now nonempty: they
simulate a nondeterministic automaton.
Reviewed By: jvillard
Differential Revision: D15668471
fbshipit-source-id: 050408283
Summary:
Instrument SIL according to TOPL properties. Roughly, the
instrumentation is a set of calls into procedures that simulate a
nondeterministic automaton. For now, those procedures are NOP dummies.
Reviewed By: jvillard
Differential Revision: D15063942
fbshipit-source-id: d22c2f6fa