Summary:
A function can both be a sink and propagate source info, but we currently ignore the summary for any function that is also a sink.
This will cause us to under-report for (e.g.) `src1 = source(); src2 = strcpy(dest, src1); exec(src2)`.
This is both a potential buffer overflow and a potential shell injection, but we won't report the second issue.
Reviewed By: jberdine
Differential Revision: D5676167
fbshipit-source-id: 232ab2f
Summary:
In looking at summaries that Quandary took a long time to compute, one thing I notice frequently is redundancy in the footprint sources (e.g., I might see `Footprint(x), Footprint(x.f), Footprint(x*)`).
`sudo perf top` indicates that joining big sets of sources is a major performance bottleneck, and a large number of footprint sources is surely a big part of this (since we expect the number of non-footprint sources to be small).
This diff addresses the redundancy issue by using a more complex representation for a set of sources. The "known" sources are still in a set, but the footprint sources are now represented as a set of access paths (via an access trie).
The access path trie is a minimal representation of a set of access paths, so it would represent the example above as a simple `x*`.
This should make join/widen/<= faster and improve performance
Reviewed By: jberdine
Differential Revision: D5663980
fbshipit-source-id: 9fb66f8
Summary:
The previous widening operator added stars to the *end* of paths that existed in `next` but not `prev`. This is not enough to ensure termination in the case where the trie is growing both deeper and wider at the same time.
The newly added test demonstrates this issue. In the code, there's an ever-growing path of the form `tmp.prev.next.prev.next...` that wasn't summarized by the previous widening operator. The new widening is much more aggressive: it replaces *any* node present in `next` but not `prev` with a `*` (rather than trying to tack a star onto the end). This fixes the issue.
This issue was causing divergence on tricky doubly-linked list code in prod.
Reviewed By: jeremydubreil
Differential Revision: D5665719
fbshipit-source-id: 1310a92
Summary:
This is a check for when an unavailable class is being allocated.
This diff also adds a check for the context to remove false positives: If the class is not available but the method calls are wrapped in a check whether the class is available, then don't report.
Reviewed By: jvillard
Differential Revision: D5631191
fbshipit-source-id: 2082dfe
Summary:
Calling exit at the end of a proc does not create unreachable code, prior to this commit inferbo reports that it does.
We extend collect_instrs to detect when we're at the end of a procedure in C and not report on unreachable code if we call a procedure there.
Reviewed By: mbouaziz
Differential Revision: D5623637
fbshipit-source-id: 0d5f326
Summary: This check is not possible in Java as it natirally happens in the totally legit case of the `try ... finally`.
Reviewed By: sblackshear
Differential Revision: D5568802
fbshipit-source-id: 24ca074
Summary:
Instead of a whitelist and blacklist and default issue types and default
blacklist and filtering, consider a simpler semantics where
1. checkers can be individually turned on or off on the command line
2. most checkers are on by default
3. `--no-filtering` turns all issue types on, but they can then be turned off again by further arguments
This provides a more flexible CLI and is similar to other options in the infer
CLI, where "global" behaviour is generally avoided.
Dynamically created checkers (eg, AL linters) cause some complications in the
implementation but I think the semantics is still clear.
Also change the name of the option to mention "issue types" instead of
"checks", since the latter can be confused with "checkers".
Reviewed By: jberdine
Differential Revision: D5583238
fbshipit-source-id: 21de476
Summary:
We previously lumped ownership predicates in with all other predicates. That limited us to a flat ownership domain.
This diff separates out the ownership predicates so we can have a richer lattice of predicates with each access path.
This lets us be more precise; for example, we can now show that
```
needToOwnBothParams(Obj o1, Obj o2) {
Obj alias;
if (*) { alias = o1; } else { alias = o2; }
alias.f = ... // both o1 and o2 need to be owned for this to be safe
}
void ownBothParamsOk() {
needToOwnBothParams(new Obj(), new Obj()); // ok, would have complained before
}
```
is safe.
Reviewed By: jberdine
Differential Revision: D5589898
fbshipit-source-id: 9606a46
Summary:
This makes it easier to test a single checker.
Also refactor the code to make it harder to mess up the list of default/all checkers.
Reviewed By: sblackshear
Differential Revision: D5583209
fbshipit-source-id: 7c919b2
Summary:
Use jbuilder to build infer instead of ocamlbuild. This is mainly to get faster builds:
```
times in 10ms, ±differences measured in speedups, 4 cores
| | ocb total | jb | ±total | ocb user | jb | ±user | ocb cpu | jb | ±cpu | ocb sys | jb | ±sys |
|-----------------------------------+-----------+------+--------+----------+------+-------+---------+-----+------+---------+------+------|
| byte from scratch | 6428 | 2456 | 2.62 | 7743 | 6662 | 1.16 | 138 | 331 | 2.40 | 1184 | 1477 | 0.80 |
| native from scratch | 9841 | 4289 | 2.29 | 9530 | 8834 | 1.08 | 110 | 245 | 2.23 | 1373 | 1712 | 0.80 |
| byte after native | 29578 | 1602 | 18.46 | 4514 | 4640 | 0.97 | 170 | 325 | 1.91 | 543 | 576 | 0.94 |
| change infer.ml byte | 344 | 282 | 1.22 | 292 | 215 | 1.36 | 96 | 99 | 1.03 | 040 | 066 | 0.61 |
| change infer.ml native | 837 | 223 | 3.75 | 789 | 174 | 4.53 | 98 | 99 | 1.01 | 036 | 47 | 0.77 |
| change Config.ml byte | 451 | 339 | 1.33 | 382 | 336 | 1.14 | 97 | 122 | 1.26 | 056 | 80 | 0.70 |
| change Config.ml native | 4024 | 1760 | 2.29 | 4585 | 4225 | 1.09 | 127 | 276 | 2.17 | 559 | 644 | 0.87 |
| change cFrontend_config.ml byte | 348 | 643 | 0.54 | 297 | 330 | 0.90 | 96 | 67 | 0.70 | 038 | 102 | 0.37 |
| change cFrontend_config.ml native | 1480 | 584 | 2.53 | 1435 | 906 | 1.58 | 106 | 185 | 1.75 | 136 | 178 | 0.76 |
#+TBLFM: $4=$2/$3;f2::$7=$5/$6;f2::$10=$9/$8;f2::$13=$11/$12;f2
50 cores
| | ocb total | jb | ±total | ocb user | jb | ±user | ocb cpu | jb | ±cpu | ocb sys | jb | ±sys |
|---------------------+-----------+------+--------+----------+------+-------+---------+----+------+---------+------+------|
| byte from scratch | 9114 | 2061 | 4.42 | 9334 | 5133 | 1.82 | | | 0/0 | 2566 | 1726 | 1.49 |
| native from scratch | 13481 | 3967 | 3.40 | 12291 | 7608 | 1.62 | | | 0/0 | 3003 | 2100 | 1.43 |
| byte after native | 3467 | 1476 | 2.35 | 5067 | 3912 | 1.30 | | | 0/0 | 971 | 801 | 1.21 |
#+TBLFM: $4=$2/$3;f2::$7=$5/$6;f2::$10=$9/$8;f2::$13=$11/$12;f2
```
Menu:
1. Write a jbuild file, autogenerated from jbuild.in because we need to fill in
some information at build-time (really, at configure time, but TODO), such as
whether or not clang is enabled.
2. Nuke lots of stuff from infer/src/Makefile that is now in the jbuild file
3. The jbuild file lives in infer/src/ so it can see all the sources. If we put it somewhere else, eg, infer/, then `jbuilder` scans too many files (all irrelevant) and takes 2.5s to start instead of .8s. Adding irrelevant directories to jbuild-ignore does not help.
4. jbuilder does not support subdirectories, so resort to listing all the
source files in the generated jbuild (only source directories need to be
manually listed in jbuild.in though). Still, the generated .merlin is wrong
and makes merlin find source files in _build, so manually tune it to get
good merlin support. We also lose some of merlin for unit tests as it
cannot see their build artefacts anymore.
5. checkCopyright gets its own jbuild because it's standalone. Also, remove
some deprecation warnings in checkCopyright due to the new version of Core from
a while ago.
6. Drop less-used Makefile features (they had regressed anyway) such as
building individual modules. Also, building mod_dep.pdf now takes all the
source files available so they better build (before, it would only take the
source files from the config, eg with or without clang) (that's pretty minor).
7. The toplevel is now built as a custom toplevel because that was easier. It
should soon be even easier: https://github.com/janestreet/jbuilder/issues/210
8. Move BUILTINS.mli to BUILTINS.ml because jbuilder is not happy about
interface files without implementations.
In particular, I did not try to migrate too much of the Makefile logic to jbuilder,
more can be done in the future.
Reviewed By: jberdine
Differential Revision: D5573661
fbshipit-source-id: 4ca6d8f
Summary:
Previous version was hard to understand because it was doing many things within same code. New version has different code for Arrays, Structs and others.
There is some copy-paste, but it's easier to follow code (open to suggestions though)
Reviewed By: dulmarod
Differential Revision: D5547999
fbshipit-source-id: 77ecb24
Summary: It wasn't using code from `std::vector::empty` which recently was improved. Instead of inlining `std::vector::empty`, call it to know whether vector is empty or not.
Reviewed By: jvillard
Differential Revision: D5573379
fbshipit-source-id: e024a42
Summary: Useful for identifying user-controlled array accesses that could lead to buffer overflows
Reviewed By: mbouaziz
Differential Revision: D5520985
fbshipit-source-id: 92984f6
Summary:
When a sink name is specified in `.inferconfig` or in OCaml, it might conflict with a function of the same name that has a different number of args.
We shouldn't try to create a sink in this case, and we definitely shouldn't crash.
Reviewed By: jeremydubreil
Differential Revision: D5561216
fbshipit-source-id: fa1859b
Summary:
In some cases we normalize expressions to check some facts about them. In these
cases, trying to keep as much information as possible in the expression, such
as the fact it comes from a `sizeof()` expression, is not needed. Doing
destructive normalization allows us to replace `sizeof()` by its
statically-known value.
closes#706
Reviewed By: mbouaziz
Differential Revision: D5536685
fbshipit-source-id: cc3d731
Summary:
With current model, there are issues with cxx range loop. It looks like
it comes from std::vector::size model.
example of such FP:
```
int t = vec.size();
for(auto& elem : vec) {
auto x = elem
}
```
Reviewed By: jvillard
Differential Revision: D5545914
fbshipit-source-id: fbe55b3
Summary: Those are not particularly relevant for the biabduction analysis. It would be easy to have a dedicated checker for this if we happen to need one day.
Reviewed By: sblackshear
Differential Revision: D5530834
fbshipit-source-id: 316e60f
Summary:
The Eradicate `Nullable` checker should now be run using:
infer -a checkers --eradicate ...
Reviewed By: mbouaziz
Differential Revision: D5529226
fbshipit-source-id: 0de2956
Summary:
This is unsound but will help the analysis to report less false alarms with the common pattern:
if (a.get() != null) {
a.get().foo();
}
Reviewed By: sblackshear
Differential Revision: D5528227
fbshipit-source-id: 750db4a
Summary: This was reusing the side effects of the `add_constraints_on_retval` for the final purpose of being angelic and just assigning a fresh value to the lhs of the load.
Reviewed By: sblackshear
Differential Revision: D5507037
fbshipit-source-id: ec1c89c
Summary:
Bumps facebook-clang-plugins to a version that outputs sizeof() info in bytes and not bits.
update-submodule: facebook-clang-plugins
Reviewed By: akotulski
Differential Revision: D5526747
fbshipit-source-id: 6019542
Summary:
Works the same way as read/write races on fields, except that are more relaxed (er, unsound) in deciding whether two containers may alias.
This is needed to avoid reporting a ton of FP's; full explanation in comments.
Reviewed By: da319
Differential Revision: D5493404
fbshipit-source-id: 0a5d8b1
Summary: The `--failures-allowed` was doing for the Clang frontend what `--keep-doing` was doing for the backend. This revision merges the two options to simplify the Infer CLI and our tests.
Reviewed By: jvillard
Differential Revision: D5474347
fbshipit-source-id: 09bcea4
Summary:
update-submodule: facebook-clang-plugins
Moving to a newer version of clang, see ffb5dd0114
Reviewed By: jvillard
Differential Revision: D5452529
fbshipit-source-id: 28bc215
Summary:
The way we represented container writes before was pretty hacky: just use a dummy field for the name of the method that performs the container write.
This diff introduces a new access kind for container writes that is much more structured.
This will make it easier to soundly handle aliasing between containers and support container reads in the near future.
Reviewed By: da319
Differential Revision: D5465747
fbshipit-source-id: e021ec2
Summary: Using a dedicated abstract domain, like Quandary does, is more suitable for taint analysis.
Reviewed By: sblackshear
Differential Revision: D5473794
fbshipit-source-id: c917417
Summary:
Both `stringWithUTF8String` and `stringWithString` implements copy semantics that copies the content of their parameter into a newly allocated buffer. We modeled this as pointer assignment in the past, which means that once we write
```
NSString* foo() {
char buf[...];
...
return [NSString stringWithUTF8String:buf];
}
```
We are going to get a spurious stack variable address escape report because local pointer `buf` is assigned to the newly created string and the string gets returned.
This diff tries to address the issue by heap-allocating a buffer and `memcpy` the contents in `stringWithUTF8String` and `stringWithString`. But this change will create another problem: the allocated buffer will be reported as leaked by the backend, while in reality those buffers won't actually be leaked as they are allocated in a region that will be periodically autoreleased. To suppress spurious memory leak FPs, I added another attribute `Awont_leak` that will suppress the leakage report on any expressions that get tagged with it.
Reviewed By: jeremydubreil
Differential Revision: D5403084
fbshipit-source-id: df6de7f
Summary:
Pretty basic: warn when we see an assignment instruction `x = ...` and `x` is not live in the post of the instruction.
Only enabled for Clang at the moment because linters already warn on this for Java. But we can enable it later if we want to (should be fully generic).
Reviewed By: jeremydubreil
Differential Revision: D5450439
fbshipit-source-id: 693514c
Summary: We get the wrong answer on most of them for now, but that is expected
Reviewed By: ngorogiannis
Differential Revision: D5429242
fbshipit-source-id: 4899079
Summary:
This commit avoids precision loss on pruning.
// x -> [s$1, s$2]
if(x) { ... }
// x -> ?
before: x -> [min(0, s$1), max(0, s$2)]
because two x values, [0, 0] (true case) and [s$1, s$2] (false case), were joined after the if branch.
after: x -> [s$1, s$2]
Reviewed By: mbouaziz
Differential Revision: D5431009
fbshipit-source-id: 14a9efe
Summary:
This just makes the warnings silent for now. We may improve the analysis to check if the null check on the captured fields are consistent with the annotation on the corresponding parameters.
Eradicate also has the same issue. I added a test to outline this. The biabduction analysis will also probably fail on the same of annotation lookup. We may want implement the proper fix at the level of `Annotation.field_has_annot`.
Reviewed By: sblackshear
Differential Revision: D5419243
fbshipit-source-id: 6460de8
Summary: CFG nodes were not connected and some instructions ended up in wrong place. Fix those issues
Reviewed By: dulmarod
Differential Revision: D5406720
fbshipit-source-id: 2a70e1a
Summary:
Problem: The analyzer did not know that the value of `v.size()` is an alias of `v.infer_size`, so `v.infer_size` is not pruned by the if condition. As a result it raises a false alarm.
void safe_access(std::vector<int> v) {
if (v.size() >= 10) {
v[9] = 1; // error: BUFFER_OVERRUN Offset: [9, 9] Size: [5, 5]
}
}
void call_safe_access_Good() {
std::vector<int> v(5, 0);
safe_access(v);
}
Solution: Adding alias for return value to the abstract domain.
Now Inferbo can prune `v.infer_size` because it knows that the value of `v.size()` is an alias of `v.infer_size`. There is already an alias domain in Inferbo, so we added a specific room for the retrun value.
Reviewed By: jvillard, mbouaziz
Differential Revision: D5396988
fbshipit-source-id: 4a4702c
Summary: Adding to the Quandary tests, the list of tests that are already working for the bi-abduction based taint analysis.
Reviewed By: sblackshear
Differential Revision: D5395734
fbshipit-source-id: c4f2e79
Summary:
:
because otherwise people would believe they can use the internal representation of these std lib but it fails for our models.
Reviewed By: jvillard
Differential Revision: D5368671
fbshipit-source-id: 4e53d5a
Summary:
:
Get rid of model location in reports.
The goal is to avoid changing `issues.exp` whenever a model is updated.
Reviewed By: jvillard
Differential Revision: D5356608
fbshipit-source-id: 88ecaba