Summary:
Unknown functions may create false positives as well as false negatives
for Pulse. Let's consider that unknown functions behave "functionally",
or at least that a functional behaviour is a possible behaviour for
them: when called with the same parameter values, they should return the
same value.
This is implemented purely in the arithmetic domain by recording
`v_return = f_unknown(v1, v2, ..., vN)` for each call to unknown
functions `f_unknown` with values `v1`, `v2`, ..., `vN` (and return
`v_return`). The hope is that this will create more false negatives than
false positives, as several FPs have been observed on real code that
would be suppressed with this heuristic.
The other effect this has on reports is to record hypotheses made on the
return values of unknown functions into the "pruned" part of formulas,
which inhibits reporting on paths whose feasibility depends on the
return value of unknown functions (by making these issues latent
instead). This should allow us to control the amount of FPs until we
model more functions.
Reviewed By: skcho
Differential Revision: D27798275
fbshipit-source-id: d31cfb8b6
Summary:
Add a new `PathContext.t` component to the abstract state. For now it
tracks only the current "timestamp" of symbolic execution inside the
procedure, i.e. which step of symbolic execution we are in (bumped by 1
each time we've executed one instruction). In the future this will also
hold, eg, which conditionals we've been through on the path (for
reporting traces with that information).
Most of the diff is about propagating the path context through many of
the APIs.
We use timestamps only in `MustBeValid` attributes to report the first
incorrect access in a function call for now.
Reviewed By: skcho
Differential Revision: D28674726
fbshipit-source-id: 2cd825e73
Summary:
Spoiler alert: we don't. The next diffs fix that.
When there are several invalid accesses to report at a function call
instruction, we want to report the first one to occur within the
function. This is to avoid confusing reports where pulse reports, eg, a
null dereference for a pointer at a point where it's already been
dereferenced before in the same function.
Reviewed By: skcho
Differential Revision: D28674730
fbshipit-source-id: acb029e4b
Summary:
This is needed for the next diff. It was a bit annoying to report leaks
in two different places, now it's just in one.
Reviewed By: skcho
Differential Revision: D28576768
fbshipit-source-id: 4f23b43cb
Summary:
Add an option for realloc and fiddle with the other options' help for
consistency.
Moved the memory leak test to memory_leak.c and added more.
Moved the place where we take the options into account closer to their
corresponding models to defend a bit against modifying one without
modifying the other.
Reviewed By: da319
Differential Revision: D28543340
fbshipit-source-id: 75894d06d
Summary:
Let's model all the dynamic memory management functions as they all work
together and are important for a lot of C projects.
Reviewed By: ezgicicek
Differential Revision: D28543008
fbshipit-source-id: f130e1ab6
Summary:
This fixes a memory leak false positive. When collecting unreachable
values we should be careful to take the equality relation into account.
Equal values are normally canonicalised but only with respect to "known"
equalities. This makes sure variables that are live thanks to the
"pruned" equalities are not discarded from the state.
Reviewed By: skcho
Differential Revision: D28382642
fbshipit-source-id: 2b898d754
Summary:
This makes reports more readable: they were all at the end of functions,
currently.
This is actually quite tricky to do as it involves detecting which
locations are unreachable.
Some of this logic can/should probably be shared with
`AbductiveDomain.discard_unreachable` but at the moment that's not the
case.
Reviewed By: skcho
Differential Revision: D28382590
fbshipit-source-id: bd4239a0c
Summary:
Most/all of the time we expect the history of the value to faithfully
trace how it got allocated. That history was then added as a prefix of
the trace leading to the same place, leading to duplicate information in
the report trace.
We may need to do the same for other bug types.
Reviewed By: ezgicicek
Differential Revision: D28536891
fbshipit-source-id: a83a2d038
Summary: Showcase the trace duplication, fixed in a further diff.
Reviewed By: ezgicicek
Differential Revision: D28536889
fbshipit-source-id: f23636368
Summary:
Turns out the mistake was pretty simple: we just forgot to keep the
history of the return value in the callee and add it to the caller's.
Reviewed By: skcho
Differential Revision: D28385941
fbshipit-source-id: 40fe09c99
Summary:
- Changed "passed as argument to f" to "in call to f", as these do not
always correspond to passing an argument (eg could be a value returned
from f)
- Changed "assigned" to "returned" when appropriate
- Changed the model of malloc() to not say "allocated" in the null case
- Don't print "returned from f" when there was no event inside f: just
print "in call to f".
Reviewed By: da319
Differential Revision: D28413900
fbshipit-source-id: bc85625e3
Summary:
The order was reversed when printing the trace, leading to confusion.
Also make sure we indicate which part of the trace we are printing when
there is more than one part (either context + access or invalidation +
access, or all three).
Also start nesting at <calling context length> to better represent the
role of the calling context visually.
Reviewed By: da319
Differential Revision: D28329263
fbshipit-source-id: b691fb1f4
Summary:
There's already all the ingredients to treat function pointers pretty
well, even when stored inside (const) globals.
In OpenSSL they use something like the added tests but the globals are
not const... This may need tweaking via an option, eg to inline all
global initializers, or filtered by global names/file names. Or just
use the existing --pulse-model-{alloc,release}-pattern options.
Reviewed By: skcho
Differential Revision: D28221651
fbshipit-source-id: 5399f1141
Summary:
When garbage-collecting addresses we would also remove their attributes.
But even though the addresses are no longer allocated in the heap, they
might show up in the formula and so we need to remember facts about
them.
This forces us to detect leaks closer to the point where addresses are
deleted from the heap, in AbductiveDomain.ml. This is a nice refactoring
in itself: doing so fixes some other FNs where we sometimes missed leak
detection on dead addresses.
This also makes it unecessary to simplify InstanceOf eagerly when
variables get out of scope.
Some new {folly,std}::optionals false positives that either are similar to existing ones or involve unmodelled smart pointers.
Reviewed By: da319
Differential Revision: D28126103
fbshipit-source-id: e3a903282
Summary:
Building on the infra in the previous commits, "fix" all the call sites
that introduce invalidations to make sure they also update the
corresponding histories. This is only possible to do when the access
leading to the invalidation can be recorded. Right now the only place
that's untraceable is the model of `free`/`delete`, because it happens
to be the only place where we invalidate an address without knowing
where it comes from (`free(v)`: what was v's access path? we could track
this in the future).
Reviewed By: skcho
Differential Revision: D28118764
fbshipit-source-id: de67f449e
Summary:
In main() all latent issues are manifest issues as the only parameters
are user-controlled.
Reviewed By: skcho
Differential Revision: D28121535
fbshipit-source-id: eab54d5bc
Summary:
As explained in the previous diff: when the access trace goes through
the invalidation step there is no need to print the invalidation trace
at all.
Note: only a few sources of invalidation are handled at the moment. The
following diffs gradually fix the other sources of invalidation.
Reviewed By: skcho
Differential Revision: D28098335
fbshipit-source-id: 5a5e6481e
Summary:
The eventual goal is to stop having separate sections of the trace
("invalidation part" + "access part") when the "access part" already
goes through the invalidation step. For this, it needs to record when a
value is made invalid along the path.
This is also important for assignements to NULL/0/nullptr/nil: right now
the way we record that 0 is not a valid address is via an attribute
attached to the abstract value that corresponds to 0. This makes traces
inconsistent sometimes: 0 can appear in many places in the same function
and we won't necessarily pick the correct one. In other words, attaching
traces to *values* is fragile, as the same value can be produced in many
ways. On the other hand, histories are stored at the point of access, eg
x->f, so have a much better chance of being correct. See added test:
right now its traces is completely wrong and makes the 0 in `if
(utf16StringLen == 0)` the source of the NULL value instead of the
return of `malloc()`!
This diff makes the traces slightly more verbose for now but this is
fixed in a following diff as the traces that got longer are those that
don't actually need an "invalidation" trace.
Reviewed By: skcho
Differential Revision: D28098337
fbshipit-source-id: e17929259
Summary:
See added test: pulse sometimes insisted that an issue was latent even
though the condition that made it latent could not be influenced (hence
could the issue could never become manifest) by callers because it was
unrelated to the pre, i.e. it came from a mutation inside the function.
In these cases, we want to report the issue straight away instead of
keeping it latent.
Reviewed By: skcho
Differential Revision: D28002725
fbshipit-source-id: ce9e6f190
Summary:
Before returning a summary, restore formals to their initial values.
This gets rid of a false latent because the value in the path condition
is now garbage-collected.
Added a test for the tricky case of structs passed as values.
Reviewed By: skcho
Differential Revision: D28001229
fbshipit-source-id: 23dda5b43
Summary:
Whenever an equality "t = v" (t an arbitrary term, v a variable) is
added (or "v = t"), remember the "t -> v" mapping after canonicalising t
and v. Use this to detect when two variables are equal to the same term:
`t = v` and `t = v'` now yields `v = v'` to be added to the equality
relation of variables. This increases the precision of the arithmetic
engine.
Interestingly, the impact on most code I've tried is:
1. mostly same perfs as before, if a bit slower (could be within noise)
2. slightly more (latent) bugs reported in absolute numbers
I would have expected it to be more expensive and yield fewer bugs (as
fewer false positives), but there could be second-order effects at play
here where we get more coverage. We definitely get more latent issues
due to dereferencing pointers after testing nullness, as can be seen in
the unit tests as well, which may alone explain (2).
There's some complexity when adding term equalities where the term
is linear, as we also need to add it to `linear_eqs` but `term_eqs` and
`linear_eqs` are interested in slightly different normal forms.
Reviewed By: skcho
Differential Revision: D27331336
fbshipit-source-id: 7314e127a
Summary:
Adapting error messages in Pulse so that they become more intuitive for
developers.
Reviewed By: jvillard
Differential Revision: D26887140
fbshipit-source-id: 896970ba2
Summary: This diff finds a declared variable name or declared field names from trace, then constructs an error message including access paths.
Reviewed By: jvillard
Differential Revision: D26544275
fbshipit-source-id: 135c90a1b
Summary:
`add_edge_on_src` is to prepare a stack location for a local variable. Before this diff, it was
called several times for each fields.
Reviewed By: jvillard
Differential Revision: D26543715
fbshipit-source-id: 49ebf2b65
Summary:
Having different behaviours inter-procedurally and intra-procedurally
sounds like a bad design in retrospect. The model of free() should not
depend on whether we currently know the value is not null as that means
some specs are missing from the summary.
Reviewed By: skcho
Differential Revision: D26019712
fbshipit-source-id: 1ac4316a5
Summary: This diff revises the trace generation of the uninitialized value checker, by introducing a new diagnostics for it.
Reviewed By: jvillard
Differential Revision: D25433775
fbshipit-source-id: 1279c0de4
Summary:
This diff supports inter-procedural uninit analysis in pulse.
* Added `MustBeInitialized` attribute to pre state when an address is read
* Remove `Uninitialized` attribute when callee has `WrittenTo` for the
same address
Reviewed By: jvillard
Differential Revision: D25368492
fbshipit-source-id: cbc74d4dc
Summary:
This diff adds uninitialized value check in pulse. For now, it supports only simple cases,
- declared variables with a type of integer, float, void, and pointer
- malloced pointer variables that points to integer, float, void, and pointer
TODOs: I will add more cases in the following diffs.
- declared/malloced array
- declared/malloced struct
- inter-procedural checking
Reviewed By: jvillard
Differential Revision: D25269073
fbshipit-source-id: 317df9a85
Summary: The new memory leaks analysis is now ready to be enabled by default and turned on in production. This also replaces the biabduction one which is now disabled.
Reviewed By: jvillard
Differential Revision: D21998666
fbshipit-source-id: 9cd95e894
Summary:
List of things happening in this unreviewable diff:
- moved PulsePathCondition to PulseSledge
- renamed --pulse-path-conditions to --pudge
- PulsePathCondition now contains all the arithmetic of pulse
(inferbo+concrete intervals+pudge). In particular, moved arithmetic
attributes into PulsePathCondition.t. PulsePathCondition plays the
role of PulseArithmetic (combining all domains).
- added tests for a false positive involving free()
- PulseArithmetic is now just a thin wrapper around PulsePathCondition
to operate on states directly (instead of on path conditions).
- The rest is mostly moving code into PulsePathCondition (eg, from
PulseInterproc) and adjusting it.
Reviewed By: jberdine
Differential Revision: D21332073
fbshipit-source-id: 184c8e0a9
Summary:
Add a new data structure and use it for the map of memory accesses to
limit the number of destinations reachable from a given address. This
avoids remembering details of each index in large arrays, or even each
field in large structs.
Reviewed By: skcho
Differential Revision: D18246091
fbshipit-source-id: 5d3974d9c
Summary:
When encountering a constant, pulse creates an abstract value (a
variable) to represent it, and remembers that it's equal to it. The
problem is that pulse doesn't yet know how to deal with the fact that
some variables are going to be equal to each other.
This hacks around this issue in the case of constants, within the same
procedure, by remembering which constants have been assigned to which
place-holder variables, and serving those variables again when the same
constant is translated again.
Limitation: this doesn't work across procedure calls as the "constant
maps" are not saved in summaries.
Something to look out for: we don't want to make `if (p == NULL)` create
a path where `p` is invalid (we only make null invalid when we see an
assignment from 0, i.e. `p = NULL;`).
Reviewed By: ezgicicek
Differential Revision: D21089961
fbshipit-source-id: 5ebb85d0a
Summary: When looking at some reports I realised that adding the place where the memory becomes unreachable to the trace makes it more readable.
Reviewed By: skcho
Differential Revision: D20790277
fbshipit-source-id: d5df69e68
Summary:
First version of a new memory leak check based on Pulse. The idea is to examine unreachable cells in the heap and check that the "Allocated" attribute is available but the "Invalid CFree" isn't. This is done when we remove variables from the state.
Currently it only works for malloc, we can extend it to other allocation functions later.
Reviewed By: jvillard
Differential Revision: D20444097
fbshipit-source-id: 33b6b25a2
Summary:
Adding a model for malloc: we add an attribute "Allocated". This can be used for implementing memory leaks: whenever the variables get out of scope, we can check that if the variable has an attribute Allocated, it also has an attribute Invalid CFree.
Possibly we will need more details in the Allocated attribute, to know if it's malloc, or other allocation function, but we can add that later when we know how it should look like.
Reviewed By: jvillard
Differential Revision: D20364541
fbshipit-source-id: 5e667a8c3