Summary:
Expanding traces currently works in the following way:
Given a `TraceElem.Kind` `k` we want to report in `foo`, we look for a callee `C` of `foo` that has a `TraceElem.Kind` equal to `k` in its summary, grab the summary for `C`, then repeat until we bottom out.
This isn't very flexible: it insists on equality between `TraceElem.Kind`'s as the criteria for expanding a trace.
This diff introduces a new `matches` function for deciding when to expand a trace from a caller into a callee.
Clients that don't want strict equality can implement a fuzzier kind of equality inside this function.
I've gone ahead and done this for the trace elemes of thread-safety.
In the near future, equivalent access paths won't always compare equal from caller to callee, so we want to match their suffixes instead.
Reviewed By: jvillard
Differential Revision: D5914118
fbshipit-source-id: 233c603
Summary:
Use an SQLite database to store proc attributes, instead of files on disk.
Wrap SQLite operations in two layers:
1. `SqliteUtils` provides helper functions to make sure DB operations succeed
2. `KeyValue` provides a functor to expose a simple and type-safe key/value store backed by the SQLite DB.
Reviewed By: jberdine
Differential Revision: D5640053
fbshipit-source-id: 31050e5
Summary: It's useful to be able to configure both max depth and max width
Reviewed By: jeremydubreil
Differential Revision: D5801567
fbshipit-source-id: 0138cd7
Summary:
We supported globals as sources before, but we did so by allowing ClangTrace etc. to match against any access path in the footprint of the trace.
This is very powerful/flexible, but it's ultimately not a good idea because it leads to traces that are hard to read.
This is because a footprint source doesn't have any information about its provenance: we might know that the value came from a global, but we don't know where the read occurred.
The mechanism for handling procedure calls as sources already knows how to solve this problem.
This diff implements globals as sources as a special case of procedure call sources instead.
This will give us much nicer traces with full provenance of the read from the global.
Reviewed By: mbouaziz
Differential Revision: D5772299
fbshipit-source-id: 491ae81
Summary:
Calling `Exn.backtrace` doesn't give us the current backtrace, only the one of
the latest exception raised. Change the logic of toplevel exception catching to
print the right backtrace.
Also, do not use `Format` to print from `Config` as `Logging` may have messed
with it already.
Finally, throw in some colours!
Reviewed By: jberdine
Differential Revision: D5764825
fbshipit-source-id: cd51688
Summary:
- failwith police: no more `failwith`. Instead, use `Logging.die`.
- Introduce the `SimpleLogging` module for dying from modules where `Logging`
cannot be used (usually because that would create a cyclic dependency).
- always log backtraces, and show backtraces on the console except for usage errors
- Also point out in the log file where the toplevel executions of infer happen
Reviewed By: jeremydubreil
Differential Revision: D5726362
fbshipit-source-id: d7a01fc
Summary:
I'm working on parameterizing access trees with a config that will limit max depth + perhaps width if needed.
This is a stepping stone to enforcing max depth.
Reviewed By: jvillard
Differential Revision: D5693453
fbshipit-source-id: c15b0ee
Summary:
We now represent the footprint with an access trie, so this code is no longer required.
This lets us simplify things a bit
Reviewed By: jberdine
Differential Revision: D5664484
fbshipit-source-id: c35edf2
Summary:
In looking at summaries that Quandary took a long time to compute, one thing I notice frequently is redundancy in the footprint sources (e.g., I might see `Footprint(x), Footprint(x.f), Footprint(x*)`).
`sudo perf top` indicates that joining big sets of sources is a major performance bottleneck, and a large number of footprint sources is surely a big part of this (since we expect the number of non-footprint sources to be small).
This diff addresses the redundancy issue by using a more complex representation for a set of sources. The "known" sources are still in a set, but the footprint sources are now represented as a set of access paths (via an access trie).
The access path trie is a minimal representation of a set of access paths, so it would represent the example above as a simple `x*`.
This should make join/widen/<= faster and improve performance
Reviewed By: jberdine
Differential Revision: D5663980
fbshipit-source-id: 9fb66f8
Summary:
The previous widening operator added stars to the *end* of paths that existed in `next` but not `prev`. This is not enough to ensure termination in the case where the trie is growing both deeper and wider at the same time.
The newly added test demonstrates this issue. In the code, there's an ever-growing path of the form `tmp.prev.next.prev.next...` that wasn't summarized by the previous widening operator. The new widening is much more aggressive: it replaces *any* node present in `next` but not `prev` with a `*` (rather than trying to tack a star onto the end). This fixes the issue.
This issue was causing divergence on tricky doubly-linked list code in prod.
Reviewed By: jeremydubreil
Differential Revision: D5665719
fbshipit-source-id: 1310a92
Summary:
Instead of a whitelist and blacklist and default issue types and default
blacklist and filtering, consider a simpler semantics where
1. checkers can be individually turned on or off on the command line
2. most checkers are on by default
3. `--no-filtering` turns all issue types on, but they can then be turned off again by further arguments
This provides a more flexible CLI and is similar to other options in the infer
CLI, where "global" behaviour is generally avoided.
Dynamically created checkers (eg, AL linters) cause some complications in the
implementation but I think the semantics is still clear.
Also change the name of the option to mention "issue types" instead of
"checks", since the latter can be confused with "checkers".
Reviewed By: jberdine
Differential Revision: D5583238
fbshipit-source-id: 21de476
Summary:
Record the list of access paths (if any) used in the index expression for each array access.
This will make it possible to use array accesses as sinks in Quandary
Reviewed By: jeremydubreil
Differential Revision: D5531356
fbshipit-source-id: 8204909
Summary:
It's nice to have "raw" as the default kind of access path, since it's used much more often than the abstraction.
This is also a prereq for supporting index expressions in access paths, since we'll need mutual recursion between accesses and access paths.
Reviewed By: jeremydubreil
Differential Revision: D5529807
fbshipit-source-id: cb3f521
Summary: This seems more in line with the expectations of the JSON format.
Reviewed By: mbouaziz
Differential Revision: D5500939
fbshipit-source-id: 76dcc47
Summary: This gives us more expressive power when defining sources--we can use heuristics like "`foo(o)` only returns a source when `o` is not a constant".
Reviewed By: jvillard
Differential Revision: D5467935
fbshipit-source-id: f3d581d
Summary: Long lists will be broken into lines rather than stretch out on a single line.
Reviewed By: jvillard
Differential Revision: D5424323
fbshipit-source-id: 115f16a
Summary:
Conversion and reformat of infer source using ocamlformat
auto-formatting tool.
Current status:
- Because Reason does not handle docstrings, the output of the
conversion is not 'Warning 50'-clean, meaning that there are
docstrings with ambiguous placement. I'll need to manually fix
them just before landing.
Reviewed By: jvillard
Differential Revision: D5225546
fbshipit-source-id: 3bd2786
Summary:
Once the fixed/preexisting/introduced sets have been computed, they endure
further filtering which may decide that more of them are equal. These bugs just
get dropped on the floor. Put these into preexisting as well instead, at least
in the case of the "skip_duplicated_types_on_filenames" filter.
Reviewed By: martinoluca
Differential Revision: D5274248
fbshipit-source-id: 99b3f3d
Summary: This change introduces the a new argument that lets you restrict the results of a differential report to only certain files.
Reviewed By: mbouaziz
Differential Revision: D5236626
fbshipit-source-id: 52711e9
Summary:
First step toward addressing bad traces that happen in examples like
```
void sourceMethod() {
Obj source = (Obj) InferTaint.inferSecretSource();
callSameSink(null, source); // index: 1
}
void callSameSink(Obj o1, Obj o2) {
callMySink1(o1); // flows via o1 ~= index 0, don't expand
callMySink2(o2); // flows via o2 ~= index 1, can expand
}
void callMySink1(Obj o) {
... // maybe interesting something happens here that doesn't happen in callMySink2
InferTaint.inferSensitiveSink(o); // flows via o ~= index 0, can expand
}
void callMySink2(Obj o) {
InferTaint.inferSensitiveSink(o); // flows via o ~= index 0, can expand
}
```
The issue is that when we recreate a trace to the sink starting from `sourceMethod`, we don't know which of the calls to `callMySink` to expand/include in the trace.
If we expand the call to `callMySink(o1)`, we'll get a bogus trace.
In this example that's not such a big deal, but imagine the case where the first call to `callMySink` is a different function that transitively calls the sink through some long and confusing path.
Remembering the index at which taint flows into each sink will let us choose which sinks are safe to expand.
This diff just adds indexes to the API; it's not actually propagating the index info or using it during expansion yet.
Reviewed By: jeremydubreil
Differential Revision: D5170563
fbshipit-source-id: ba4b096
Summary: We were almost always using `~report_reachable:true`, and in the cases where we weren't it is fine to do so. In general, a sink could read any state from its parameters, so it makes sense to complain if anything reachable from them is tainted.
Reviewed By: mbouaziz
Differential Revision: D5169067
fbshipit-source-id: ea7d659
Summary: Have found this useful in Quandary for fbcode, where we want to do this for folly due to its use of assembly (details in comments).
Reviewed By: mbouaziz
Differential Revision: D5167564
fbshipit-source-id: bf6d7e0
Summary:
For now, we just support clearing the taint on a return value.
Ideally, we would associate a kind with the sanitizer and only clear taint that matches that kind.
However, it's fairly complicated to make that work properly with footprint sources.
I have some ideas about how to do it with passthroughs instead, but let's just do the simple thing for now.
Reviewed By: jeremydubreil
Differential Revision: D5141906
fbshipit-source-id: a5b8b5e
Summary:
An array has a static or dynamic length (number of elements), but it also has a
stride, determined by the type of the element: `sizeof(element_type)`. We don't
have a good `sizeof()` function available on SIL types, so record that stride
in the array type.
update-submodule: facebook-clang-plugins
Reviewed By: mbouaziz
Differential Revision: D4969697
fbshipit-source-id: 98e0670
Summary:
Glorious, glorious man pages.
This changes a bunch of things that were hard to break up from the diff, so the
resulting diff is big.
Use `Cmdliner.Manpage` to format man pages (and also format a bit ourselves
since it is so stubborn).
As a bonus, introduce the following subcommands:
```
infer run ... # same as default mode with -- before
infer capture ... # -a capture
infer compile ... # -a compile
infer analyze ... # InferAnalyze
infer report ... # InferPrint
infer diff ... # this one is not new
infer clang ... # InferClang, not that you should use it
```
The man pages can still be improved a lot. Notable missing sections:
`ENVIRONMENT`, stuff about .inferconfig, some example usage, `DESCRIPTION`, ...
Reviewed By: jberdine
Differential Revision: D4921083
fbshipit-source-id: 9602230
Summary: A lot of C++ library functions look like this, so it's important to have.
Reviewed By: mbouaziz
Differential Revision: D5026082
fbshipit-source-id: 6f421b6
Summary: Stops Quandary errors from getting dropped on the floor when it runs alongside the other checkers.
Reviewed By: jeremydubreil
Differential Revision: D5010801
fbshipit-source-id: 2847f61
Summary:
Ran the build with -w,-32 , delete code, repeat, until a fixpoint of no more warnings is reach.
Unfortunately we cannot fatal on w32 because ppx_compare can generate dead code (eg `compare_t` and only `compare` is used).
Reviewed By: mbouaziz
Differential Revision: D4945800
fbshipit-source-id: c95afb6
Summary:
Last step for converting thread-safety and quandary to HIL.
Push the logic for managing the id map and converting the instructions into a functor.
This way, client analyses can simply write HIL transfer functions and call the functor.
Reviewed By: jberdine
Differential Revision: D4989987
fbshipit-source-id: 485169e
Summary:
Went through the trouble of killing these warnings, make sure they do not creep
up again in the future.
Reviewed By: mbouaziz
Differential Revision: D4937493
fbshipit-source-id: 0e2eda4
Summary:
This is step further simplify the code to avoid cases where the summary of the procedure being analyzed can exist in two different versions:
# one version is the summary passed as parameter to every checker
# the other is a copy of the summary in the in-memory specs table
This diff implements:
# the analysis always run through the `Ondemand` module (was already the case before)
# the summary of the procedure being analyzed is created at the beginning of the on-demand analysis call
# all the checkers run in sequence, update their respective part of the payload and log errors to the error table
# the summary is store at the end of the on-demand analysis call
Reviewed By: sblackshear
Differential Revision: D4787414
fbshipit-source-id: 2d115c9
Summary:
Split Fieldname.t into `Java` and `Clang`. Each of them have different naming conventions and this way it's easier to differentiate between them.
Make `Java` variant store string instead of mangled since mangled part was always empty
Changes to `Clang` variant are coming in the next diff
Reviewed By: jeremydubreil
Differential Revision: D4746708
fbshipit-source-id: c5858a8
Summary: Add `QualifiedCppName.t` and some functions to manipulate it. More places will start using this type (such as `Procnames` or `Typ.Name`) in later diff
Reviewed By: jberdine
Differential Revision: D4738991
fbshipit-source-id: 8f20dd6
Summary: Fail early when there is no registered callbacks to run the analysis of a procedure on-demand
Reviewed By: sblackshear
Differential Revision: D4573726
fbshipit-source-id: a8ee74b
Summary:
Given two analysis results, it's now possible to compare them with the following command:
infer --diff --report-current reportA.json --report-previous reportB.json --file-renamings file_renamings.json
this command will then generate 3 files in `infer-out/differential/{introduced, fixed, preexisting}.json`, whose meaning is the following:
- `introduced.json` has all issues in `current` that are not in `previous`
- `fixed.json` has all issues in `previous` that are not in `current`
- `preexisting.json` has all issues that are in both `current` and `previous`
The json files generated can then be used to categorise results coming from incremental analyses of a codebase.
Reviewed By: jvillard
Differential Revision: D4482517
fbshipit-source-id: 1f7df3e
Summary:
Reimplement whitelists as a match against a single regexp. This allows one to
precompile the whitelist regexp to make fast check against a whitelist of fuzzy
qualifiers, instead of checks linear in the number of items in the whitelist.
Reviewed By: akotulski
Differential Revision: D4588278
fbshipit-source-id: 3bac614
Summary:
This is useful to make sure `./build-infer.sh clang` and `./build-infer.sh
java` work correctly.
I had to move one unit test inside a new unit/clang/ directory used only if clang is enabled, and create a stub for it in unit/clang_stubs/ for when clang is disabled.
Renamed `OCAML_SOURCES` to the more consistent and descriptive `OCAML_CONFIG_SOURCES`.
Reviewed By: jeremydubreil
Differential Revision: D4571997
fbshipit-source-id: 9502114
Summary:
We waste a lot of space storing the types of field accesses and comparing them sets/maps with access paths.
Yet almost none of the code ever looks at these types (only a tiny piece of code in thread-safety).
If we know the base type, we have enough information to recover the type of the field.
Let's do that instead.
Reviewed By: jeremydubreil
Differential Revision: D4567996
fbshipit-source-id: e7fd2da
Summary: Being forced to separately define `pp_element`/`pp_key` is uneccessary and makes it more cumbersome to create a set/map from an existing module that already defines `pp`.
Reviewed By: jeremydubreil
Differential Revision: D4517308
fbshipit-source-id: 9b17c9c
Summary:
At one point I thought we'd want to have lots of different schedulers for things like exploring loops in different orders, but that hasn't materialized.
Let's make the common use-case simpler by hiding the `Scheduler` parameter inside the `AbstractInterpreter` module.
We can always expose `MakeWithScheduler` later if we want to.
Reviewed By: jberdine
Differential Revision: D4508095
fbshipit-source-id: 726e051
Summary:
When the receiver type and return type of an unknown call are the same, propagate taint to both the receiver and the return type.
This does the right thing for common "builder-style" methods that both update and return the receiver.
We already had custom models for a few such methods (e.g., `StringBuilder.append`), but we can remove them now.
Reviewed By: jeremydubreil
Differential Revision: D4490071
fbshipit-source-id: 325ea88
Summary:
Remove the remaining uses of polymorphic equality `=`.
In case of basic types, this is replaced by String.equal or Int.equal.
In case of `= []`, this is replaced by `List.is_empty`.
In case of `= None`, this is replaced by `is_none`.
In case of a datatype definition such as `type a = A | B`,
a `compare_a` function is defined by adding `type a = A | B [@deriving compare]`
and a `equal_a` function is defined as `let equal_a = [%compare.equal : a]`.
In case of comparison with a polymorphic variant `= `Yes`, the equality
defined in `PVariant.(=)` is used. Typically, `open! Pvariant` is added
at the beginning of the file to cover all the uses.
Reviewed By: jberdine
Differential Revision: D4456129
fbshipit-source-id: f31c433
Summary: This will be useful in upcoming changes to the thread-safety analysis as well.
Reviewed By: dkgi
Differential Revision: D4402146
fbshipit-source-id: c750127
Summary:
A domain should not definite its initial state, since distinct users of the domain may want to choose different initial values.
For example, one user might want to bind all of the formals to some special values, and one user might want the initial domain to be an empty map
This diff makes this distinction clear in the types by (a) requiring the initial state to be passed to the abstract interpreter and (b) lifting the requirement that abstract domains define `initial`.
Reviewed By: jberdine
Differential Revision: D4359629
fbshipit-source-id: cbcee28
Summary:
There's a lot of boilerplate work to be done when adding a new kind of source.
This diff tries to reduce the boilerplate by making a functor do all the work.
The functor:
(1) adds a notion of "footprint kind" to the source
(2) packages the source with a call site
Reviewed By: jvillard
Differential Revision: D4349224
fbshipit-source-id: 5e1701a
Summary:
We currently can only model the return values of functions as sources.
In order to model inputs of endpoints as sources, we need the capability to treat the formals of certain functions as sources too.
This diff adds that capability by adding a function for getting the tainted sources to the source module, then using that info in the analysis.
Reviewed By: jeremydubreil
Differential Revision: D4314738
fbshipit-source-id: dd7d423
Summary:
Previously, summaries worked by flattening the access tree representing the post of the procedure into (in essence) a list of functions from caller input traces to callee output traces.
This is inefficient in many ways, and is also much more complex than just using the original access tree as the summary.
One big inefficiency of the old way is this: calling `Trace.append` is slow, and we want to do it as few times as possible.
Under the old summary system, we would do it at most once for each "function" in the summary list.
Now, we'll do it at most once for each node in the access tree summary.
This will be a smaller number of calls, since each node can summarize many input/output relationships.
Reviewed By: jeremydubreil
Differential Revision: D4271579
fbshipit-source-id: 34e407a
Summary:
Use In_channel and Out_channel operations instead of those in Pervasives. Don't
use physical equality on values that aren't heap-allocated since it doesn't help
the compiler generate faster code and the semantics is unspecified. Also use
phys_equal for physical equality.
Reviewed By: sblackshear
Differential Revision: D4232459
fbshipit-source-id: 36fcfa8
Summary:
Utils contains definitions intended to be in the global namespace for
all of the infer code-base, as well as pretty-printing functions, and
assorted utility functions mostly for dealing with files and processes.
This diff changes the module opened into the global namespace to
IStd (Std conflict with extlib), and moves the pretty-printing
definitions from Utils to Pp.
Reviewed By: jvillard
Differential Revision: D4232457
fbshipit-source-id: 1e070e0
Summary:
In Java, we handle unknown code by propagating behavior from the parameters of the unknown function call to the return value (or constructed object, in the case of a constructor). But we do this in a somewhat silly way--generating a new summary with these semantics at each unknown call site. Instead, this diff introduces these two options as predefined behaviors and adds specialized code for them.
As a side effect of this approach, unknown functions are no longer counted as passthroughs. This is ok; the original behavior was less of a reasoned decision and more of an unintended consequence of the way we decided to handle unknown code.
This new approach ought to be more efficient than the old one, and as a virtuous side effect it will be easier to specify how to handle unknown code in other languages like C++.
Reviewed By: jeremydubreil
Differential Revision: D4205624
fbshipit-source-id: bf97445
Summary:
There's not really a concept of callee here, so s/callee/callsite/, and "to"
suggests we get the callee whereas we update it, so s/to/with/.
Feel free to bikeshed further.
Reviewed By: sblackshear
Differential Revision: D4153426
fbshipit-source-id: 6ea762c
Summary: A must-have for reporting taint errors and any other interprocedural error where the trace is sufficiently complex.
Reviewed By: jvillard
Differential Revision: D4124072
fbshipit-source-id: 26b3b2b
Summary: A must-have for reporting taint errors and any other interprocedural error where the trace is sufficiently complex.
Reviewed By: jvillard
Differential Revision: D4106352
fbshipit-source-id: b2677e6
Summary:
Previously, we recorded direct sinks as sinks and transitive sinks as passthroughs. This makes it difficult to create an expanded interprocedural trace when recording an error because we can't distinguish between sinks (which we want to expand) and passthroughs (which we don't). This diff changes recording of sinks so that a sink is now the *last* function in a trace to call a sink. To find out what the original sink was, the summary for the transitive sink in the trace will now need to be (recursively) expanded until we bottom out in the original sink.
Will do the same for sources in a follow-up diff.
Reviewed By: cristianoc
Differential Revision: D4103759
fbshipit-source-id: 6f435f5
Summary:
this makes frontends no longer depend on SymExec.ml. `ModelBuiltins` was split into two modules:
- `BuiltinDecl` with procnames for builtins (used to determine whether some function is a builtin)
- `BuiltinDefn` with implementations used by `SymExec`
- they both have similar type defined in `BUILTINS.S` which makes sure that new builtin gets added into both modules.
During the refactor I ran some scripts:
`BuiltinDecl.ml`:
let X = create_procname "X"
cat BuiltinDecl.ml | grep "create_procname" | tail -70 | awk ' { print $1,$2,$3,$4,"\42"$2"\42"} '
then manually confirm string match. Exceptions:
"__exit" -> "_exit"
"objc_cpp_throw" -> "__infer_objc_cpp_throw"
__objc_dictionary_literal
nsArray_arrayWithObjects
nsArray_arrayWithObjectsCount
`BuiltinDefn.ml`:
let X = Builtin.register BuiltinDecl.X execute_X
cat BuiltinDecl.ml | grep "create_procname" | tail -70 | awk ' { print $1,$2,$3,"Builtin.register BuiltinDecl."$2,"execute_"$2} '
then, fix all compilation problems
Reviewed By: jberdine
Differential Revision: D3951035
fbshipit-source-id: f059602
Summary:
Right now, taint gets lost if it flows into a constructor or procedure whose implementation is missing.
Since the core Java (e.g., String) and Android classes (e.g, Intent) are among these, this is bad.
We could handle this by writing a bunch of models instead, but that would be a lot of work (plus we may still miss cases).
Reviewed By: jvillard
Differential Revision: D4051591
fbshipit-source-id: 65851c8
Summary:
Change Sil.Call instruction to have only a single optional return
identifier, insted of a list. Essentially none of the code handled
multiple return identifiers. Also, add the type of the return
identitifier to Call instructions.
Reviewed By: sblackshear
Differential Revision: D3919358
fbshipit-source-id: d2d4f72