Summary:
Some code that is otherwise benignly scalar still uses the
ExtractElement and InsertElement vector operations, so translate them
as if they were array operations.
Reviewed By: ngorogiannis
Differential Revision: D17801949
fbshipit-source-id: 89f3666bd
Summary:
Previously, the LLVM semantics could be stuck where the LLAIR semantics
was not yet stuck, but would become stuck (at the same place) after
taking a step. This was due to LLVM using the traditional definition of
stuck states: any state from which there are no transitions. However,
LLAIR cannot do that because it might get stuck in the middle of a block
that contains several visible stores. We don't want to consider the
whole block stuck, nor can we finish it. Thus, the LLAIR definition of
stuckness is when the state has the stuck flag set, which happens when
stopping in the middle of a block after encountering a stuck
instruction. Now LLVM takes the same approach.
Reviewed By: jberdine
Differential Revision: D17855085
fbshipit-source-id: a094d25d5
Summary:
Add an argument to the Exit instruction. Update the LLVM semantics to
execute the Exit instruction and store the result in an "exited"
component of the state. (Previously it just noticed that it was stuck
about to do an Exit.)
With exiting treated uniformly, now in the proof that for every LLVM
trace, there is a llair trace that simulates it, all of the cheats
except for 1 are just cases that I haven't got to yet. However, the last
cheat is for the situation where the LLVM program gets stuck and the
llair program doesn't. For example, the following two-line LLVM program
gets stuck because r2 is not assigned (ignoring for the moment the static
restriction that LLVM is in SSA form).
  r1 := r2
  Exit(0)
The compilation to llair omits the assignment and so we get a llair
program that doesn't get stuck:
  Exit(0)
The key question is whether the static restrictions are sufficient to
ensure that no expression that might be omitted can get stuck.
Reviewed By: jberdine
Differential Revision: D17737589
fbshipit-source-id: bc6c01a1b
Summary:
If the LLVM to llair translation keeps a mapping from register r to
expression e, then for each register r' mentioned in e, there must be an
assignment to r' that dominates the entire live range of r. Thus,
wherever r might be replaced by e, the value of r' will be the same as it
was when the initial assignment to r occurred. Maintaining this
invariant relies on the LLVM being in SSA form.
Reviewed By: jberdine
Differential Revision: D17710288
fbshipit-source-id: fd3eaa57d
Summary:
This is work in progress; many of the cheats aren't true. In particular,
the definitions of stuck/complete/partial traces in LLVM and llair don't
quite match up and need some modification. Also, the state relation
isn't strong enough; it will need to include information about registers
used in the expressions of the LLVM register to llair expression
mapping. But the overall shape of the proof is ok and so it can be
used to poke at various local aspects of the translation, such as
individual instructions.
Reviewed By: jberdine
Differential Revision: D17631604
fbshipit-source-id: 743b5d64d
Summary:
By some unfortunate logic, OCaml often decides to use
`sexp_list`/`sexp_option` instead of just `list`/`option`. Sometimes
these get copy/pasted in interface files.
It would be good to tell OCaml not to do that in the first place but in
the meantime: this diff.
Reviewed By: ngorogiannis
Differential Revision: D17907938
fbshipit-source-id: 7546834a2
Summary:
For test scripting purposes, when the analysis finishes successfully,
report the number of alarms.
Reviewed By: ngorogiannis
Differential Revision: D17801947
fbshipit-source-id: 1660866df
Summary:
In a spec, it currently may be that foot.us does not contain xs. So
exec_specs needs to extend the vocabulary of foot before existentially
quantifying out xs.
Reviewed By: ngorogiannis
Differential Revision: D17801933
fbshipit-source-id: 7b4b9262a
Summary:
Previously it was added to the locals before calling Dom.call, but
this results in the scope of freturn ending too early.
Reviewed By: ngorogiannis
Differential Revision: D17801939
fbshipit-source-id: 739ec8981
Summary:
Some globals have 'appending' linkage, where linking modules results
in appending the arrays from each module. These can appear even when
empty, leading to useless and somewhat troublesome 0-length arrays. So
drop them.
Reviewed By: ngorogiannis
Differential Revision: D17801927
fbshipit-source-id: d2dc180d7
Summary:
While BitCasts are the identity function on the bitwise
representation, they are not necessarily so in the semantics or the
logical representation. So be more conservative about eliding them in
the Exp language. Those that are actually semantic identities are
still omitted in the Term language.
Reviewed By: ngorogiannis
Differential Revision: D17801950
fbshipit-source-id: bf9ae57b5
Summary:
The analyzer (currently) hard-codes some assumptions about sizes of
basic types such as Typ.bool, Typ.siz, etc. Check that these
assumptions are satisfied by the input llvm datalayout, and give
reasonable error messages otherwise.
Reviewed By: ngorogiannis
Differential Revision: D17801941
fbshipit-source-id: 4fe484ee0
Summary:
Now that expression types and type sizes can be computed, it is not
necessary to store the sizes of globals separately.
Reviewed By: ngorogiannis
Differential Revision: D17801932
fbshipit-source-id: f746e506b
Summary:
- The `Llvm_target.DataLayout.size_in_bits` needs to be used for checking casts,
  e.g. it is ok to `bitcast <16 x i1> to i16`: they both have 16 bits, but they have sizes 16 vs 2 bytes
- The `Llvm_target.DataLayout.abi_size` needs to be used for the size of memory blocks containing values,
  e.g. for the size of memory segments containing the initial values of globals
- The example above shows that we can't compute the byte size from the bit size without knowing the target specific datalayout
- So we need both in each sized type
- Also add checks that Convert exps and terms are not no-ops
- Simplifications of size manipulating code
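The distinction between the two sizes can be sketched as follows. This is only an illustration: the record type, its field names, and the helper functions are hypothetical, and the sizes are the ones quoted in the example above rather than values queried from a datalayout.

```ocaml
(* A sized type records both sizes, since neither determines the other
   without the target datalayout. All names here are hypothetical. *)
type sized = {name: string; bit_size: int; byte_size: int}

(* Sizes as quoted in the example above. *)
let v16i1 = {name= "<16 x i1>"; bit_size= 16; byte_size= 16}
let i16 = {name= "i16"; bit_size= 16; byte_size= 2}

(* Casts are checked against bit sizes... *)
let castable src dst = src.bit_size = dst.bit_size

(* ...while memory blocks holding values are sized in bytes. *)
let block_size t = t.byte_size

let () =
  assert (castable v16i1 i16);
  assert (block_size v16i1 <> block_size i16);
  (* Rounding bits up to whole bytes gives 2, not the 16 quoted above. *)
  assert ((v16i1.bit_size + 7) / 8 <> v16i1.byte_size)
```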
Reviewed By: ngorogiannis
Differential Revision: D17801928
fbshipit-source-id: 8c8ce6128
Summary:
In order to type-check casts, it is necessary to have the size of each
sized type. This size information is also useful in a few other places.
Reviewed By: bennostein
Differential Revision: D17801931
fbshipit-source-id: f8ef53276
Summary:
This is needed since expressions distinguish between the integer or
pointer zero value and zero-initialized array/tuple/struct aggregates
based on type, and the backend distinguishes them semantically.
Reviewed By: bennostein
Differential Revision: D17801938
fbshipit-source-id: ac8665e65
Summary:
Linking can lead to opaque types becoming identified with known
types. Assertions in various places that types should be sized can be
triggered by such opaque types. Until there is a distinction between
processing fully-linked versus incomplete code, these checks need to
be relaxed to permit opaque types where sized ones are expected.
Reviewed By: bennostein
Differential Revision: D17801929
fbshipit-source-id: c5e62f7c8
Summary: Integer terms need to compare higher than any monomial.
Reviewed By: bennostein
Differential Revision: D17725607
fbshipit-source-id: c64fd52d5
Summary:
Also weaken definition of Typ.castable to permit casting between
floats and ints of the same size.
Reviewed By: bennostein
Differential Revision: D17725611
fbshipit-source-id: 5e8114e26
Summary:
Typ.equivalent is currently defined the same as Typ.castable, but
conceptually they are different and castable needs to be
weakened. They are different since for example it is possible to cast
from an i64 to a f64, but those types denote different sets of values
in the semantics, and the bitcast is modeled using a conversion
function.
Reviewed By: bennostein
Differential Revision: D17725615
fbshipit-source-id: 973574f2a
Summary:
For function calls where the callee is a cast expression, previously the
wrong type would be used for the callee. This could lead to crashes in
llvm, or assertion failures in sledge.
Reviewed By: bennostein
Differential Revision: D17725610
fbshipit-source-id: 938b49a49
Summary:
Some called functions are represented in llvm as a global variable
with e.g. external linkage, and so they do not appear as
'functions'. It is still valid to call such functions, though the
analyzer does not know their definitions.
Reviewed By: bennostein
Differential Revision: D17725609
fbshipit-source-id: 333d19c0d
Summary:
Improve Trace.fail to log the error and raise informative exceptions.
Eliminate the confusion between Import.fail and Trace.fail by removing
Import.fail.
Reviewed By: bennostein
Differential Revision: D17725608
fbshipit-source-id: 79fdfbd86
Summary:
By default all functions except those specified as entry points in the
config file are "internalized". Internal functions are removed if they
are not called. It is sometimes necessary to disable internalization,
e.g. to analyze the llvm tests.
Reviewed By: bennostein
Differential Revision: D17725614
fbshipit-source-id: 4b13501f5
Summary:
Sometimes the models for the C/C++ runtime and standard libraries are
not needed. Furthermore, sometimes, e.g. when analyzing llvm tests,
trying to link them fails.
Reviewed By: bennostein
Differential Revision: D17725616
fbshipit-source-id: 76a4bcf90
Summary:
The `(t, unit) result` type is no more informative than `t option` and
less convenient.
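To make the claim concrete, the two types can be converted back and forth without losing information. The sketch below uses only the OCaml standard library (4.08 or later); the names `to_option` and `of_option` are illustrative and not taken from the codebase.

```ocaml
(* [('a, unit) result] and ['a option] carry the same information:
   the error case has no payload beyond "absent". *)
let to_option : ('a, unit) result -> 'a option = Result.to_option

let of_option : 'a option -> ('a, unit) result =
 fun o -> Option.to_result ~none:() o

let () =
  assert (to_option (Ok 3) = Some 3);
  assert (to_option (Error ()) = None);
  assert (of_option (to_option (Ok 3)) = Ok 3)
```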
Reviewed By: bennostein
Differential Revision: D17665244
fbshipit-source-id: fa969d8b7
Summary:
This puts the mediation between Exp and Term together in Sh_domain
rather than being spread across the two.
Reviewed By: bennostein
Differential Revision: D17665235
fbshipit-source-id: edf277d45
Summary:
The move instruction takes a vector of assignments to perform in
parallel, so generalize exec_move from one to a vector.
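What "in parallel" means can be sketched with a toy environment. This is a simplification under stated assumptions: sources are plain registers rather than expressions, values are ints, and `parallel_move` is a hypothetical name, not the real exec_move.

```ocaml
module Env = Map.Make (String)

(* Perform a vector of (destination, source) register moves in parallel:
   every source is read from the environment as it was before the move. *)
let parallel_move (env : int Env.t) (moves : (string * string) list) =
  let values = List.map (fun (dst, src) -> (dst, Env.find src env)) moves in
  List.fold_left (fun e (dst, v) -> Env.add dst v e) env values

let () =
  let env = Env.(empty |> add "x" 1 |> add "y" 2) in
  let env' = parallel_move env [("x", "y"); ("y", "x")] in
  (* A parallel move swaps; executing the two moves one after the other
     would leave both registers holding 2. *)
  assert (Env.find "x" env' = 2 && Env.find "y" env' = 1)
```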
Reviewed By: bennostein
Differential Revision: D17665248
fbshipit-source-id: 52aae5ff9
Summary:
Extend the encoding using `id` from 0 indicating a program variable to
also -1 indicating a global program variable.
Reviewed By: bennostein
Differential Revision: D17665229
fbshipit-source-id: 848b8a31e
Summary:
The sorting of heap blocks when printing formulas was broken by the
change to the direct representation of polynomials.
Reviewed By: bennostein
Differential Revision: D17665246
fbshipit-source-id: 4ebea9f20
Summary: It is not necessary to have both < and >=, and similarly for <= and >.
Reviewed By: bennostein
Differential Revision: D17665232
fbshipit-source-id: 01b3511f5
Summary:
Now that terms operate over unbounded, signed, integers rather than
bounded integers, and Boolean operations are treated uniformly with
bitwise operations, it is not necessary to propagate types throughout
arithmetic term manipulation.
Reviewed By: bennostein
Differential Revision: D17665257
fbshipit-source-id: 5236b101d
Summary:
Z.numbits ignores the sign, which allows 2^(N - 1) as representable
within N bits, while it is not. So check explicitly.
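A small sketch of the problem, using native OCaml ints instead of Zarith. Here `numbits` is a hypothetical stand-in that mirrors the sign-ignoring behaviour described above, and `fits_signed` is one illustrative form of an explicit check, not necessarily the one in this diff.

```ocaml
(* Number of bits in the magnitude of [x]; the sign is ignored. *)
let numbits x =
  let rec go n acc = if n = 0 then acc else go (n lsr 1) (acc + 1) in
  go (abs x) 0

(* Explicit two's complement range check: -2^(bits-1) <= x < 2^(bits-1). *)
let fits_signed ~bits x =
  let bound = 1 lsl (bits - 1) in
  -bound <= x && x < bound

let () =
  (* A magnitude-based count cannot tell -128 and 128 apart... *)
  assert (numbits 128 = 8 && numbits (-128) = 8);
  (* ...but only -128 is representable as a signed 8-bit integer. *)
  assert (fits_signed ~bits:8 (-128));
  assert (not (fits_signed ~bits:8 128))
```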
Reviewed By: bennostein
Differential Revision: D17665231
fbshipit-source-id: 0d3940517
Summary:
Instead of having separate signed and unsigned operations, use the
signed operations applied to explicit conversion of the arguments
using an unsigned integer interpretation.
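For illustration, an unsigned comparison can be expressed this way with native ints. The helper names `unsigned` and `ult` are hypothetical, and the sketch assumes bit widths small enough to fit in an OCaml int.

```ocaml
(* Reinterpret the low [bits] bits of [x] as an unsigned integer. *)
let unsigned ~bits x = x land ((1 lsl bits) - 1)

(* Unsigned less-than as the ordinary signed comparison applied to
   unsigned-converted arguments; no separate unsigned operator needed. *)
let ult ~bits x y = unsigned ~bits x < unsigned ~bits y

let () =
  (* Read as an unsigned 8-bit value, -1 is 255. *)
  assert (unsigned ~bits:8 (-1) = 255);
  (* Signed: -1 < 1 holds. Unsigned: 255 < 1 does not. *)
  assert (-1 < 1);
  assert (not (ult ~bits:8 (-1) 1))
```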
Reviewed By: bennostein
Differential Revision: D17665267
fbshipit-source-id: 0b3271e71
Summary:
Add an Extract term form to interpret an integer with given signedness
and bitwidth.
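The intended reading of such a form can be sketched on native ints. The function `extract` below and its labelled arguments are illustrative assumptions, not the actual Term constructor or its semantics as implemented.

```ocaml
(* Interpret the low [bits] bits of [x] as an unsigned or a signed
   (two's complement) integer. *)
let extract ~unsigned ~bits x =
  let low = x land ((1 lsl bits) - 1) in
  if unsigned || low < 1 lsl (bits - 1) then low else low - (1 lsl bits)

let () =
  assert (extract ~unsigned:true ~bits:8 (-1) = 255);
  assert (extract ~unsigned:false ~bits:8 255 = -1);
  assert (extract ~unsigned:false ~bits:8 127 = 127)
```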
Reviewed By: bennostein
Differential Revision: D17665263
fbshipit-source-id: 1d8917f3c
Summary:
Be more explicit about semantics of unsigned vs. signed conversions,
and fix a few related corner cases.
Reviewed By: bennostein
Differential Revision: D17665268
fbshipit-source-id: 67fecdf34
Summary:
With terms using unbounded two's complement arithmetic, it is not
necessary to special-case 1-bit integers as Booleans.
Reviewed By: ngorogiannis
Differential Revision: D17665228
fbshipit-source-id: a2f280fc3
Summary:
Remove the guards that prevent normalizing in some cases where the
corresponding instruction in LLVM would produce a poison
value. Usefully tracking poison values will be more involved.
Reviewed By: ngorogiannis
Differential Revision: D17665230
fbshipit-source-id: 59fb25042
Summary:
Revise program expressions based on the changed constraints now that
Term is separate from Exp. In particular:
- Add types to all applications, indicating how the operation
interprets its arguments
- Change to a simpler uncurried form
- Remove now-unneeded normalizations
Reviewed By: bennostein
Differential Revision: D17665236
fbshipit-source-id: 1bcf2efd6
Summary:
Boolean and bitwise negation of `e` is represented using `-1 xor
e`. Since Equality can only maintain and propagate equality
constraints, Boolean negation `-1 xor b` is normalized to `b =
false`. This diff delays this normalization from being part of
expression construction to part of symbolic heap formula
construction. This makes the normalization done as part of expression
construction independent of the distinction between bitwise and
boolean operations.
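The relationship between the two readings of `-1 xor e` can be checked on native ints. The sketch assumes the signed 1-bit encoding of Booleans (true is -1, false is 0); the names `tt`, `ff`, and `neg` are hypothetical.

```ocaml
(* Assumed encoding: signed 1-bit Booleans, true = -1 and false = 0. *)
let tt = -1
let ff = 0

(* Negation as represented: -1 xor e. On arbitrary integers this is
   bitwise complement; on Booleans it is logical negation. *)
let neg e = (-1) lxor e

let () =
  assert (neg tt = ff && neg ff = tt);
  assert (neg 5 = lnot 5);
  (* The normalization: for Boolean b, [neg b] holds exactly when
     b = false. *)
  List.iter (fun b -> assert ((neg b = tt) = (b = ff))) [tt; ff]
```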
Reviewed By: bennostein
Differential Revision: D17665254
fbshipit-source-id: 0a0722865
Summary:
Splat, Memory, and Concat expressions are never used. Only the term
forms are needed.
Reviewed By: bennostein
Differential Revision: D17665259
fbshipit-source-id: cbfd7650d
Summary:
There are a number of issues with using the same type for expressions
in code and in formulas. One is that the type systems of the two
should be different. Another is that conflating the two compromises
the ability of Llair to correctly express aspects such as integer
overflow, floating point rounding, etc. Also, it could be beneficial
to have more source locations for program expressions than makes sense
for terms.
This diff simply unshares Exp, leading to a copy named Term. Likewise,
Reg is now a copy of Var. Simplifications to come.
Reviewed By: bennostein
Differential Revision: D17665250
fbshipit-source-id: 4359a80d5
Summary:
The generation of names for the function formal return and throw
parameters is not central to LLAIR, but a detail of the frontend,
since they are generated only because LLVM does not already have such
names.
Reviewed By: ngorogiannis
Differential Revision: D17665240
fbshipit-source-id: 684cbae92
Summary:
Using a type of keys richer than strings, which are the unique symbol
names at the C/LLVM level, is unnecessary.
Reviewed By: ngorogiannis
Differential Revision: D17665262
fbshipit-source-id: 6b8c31146
Summary:
The convenience wrappers for operations on signed 1-bit integers
represented by Z.t are not specific to Exp.
Reviewed By: ngorogiannis
Differential Revision: D17665252
fbshipit-source-id: d4b58e2a6
Summary:
Now that the relation domain construction is factored out and
generalized.
Reviewed By: ngorogiannis
Differential Revision: D17665253
fbshipit-source-id: eb156ce6b
Summary:
Since version 2, none of the `opam pin` modes work reasonably well for
the standard llvm build procedure. As a workaround to prevent opam
from making several copies of the build directory when pinning, adjust
to move the llvm build and install directories out of the llvm source
tree.
Reviewed By: bennostein
Differential Revision: D17665242
fbshipit-source-id: ac84a4b0b
Summary:
Since the correctness of the mapping from LLVM to llair depends on
LLVM being SSA, we need to formalise what that means. We also prove that
the domination relation is a strict partial order, which will probably
be helpful when reasoning about the translation.
Reviewed By: jberdine
Differential Revision: D17631456
fbshipit-source-id: a00eb3f87
Summary:
The LLVM semantics and translation were not consistently treating the
1-bit word value condition as signed or unsigned.
Reviewed By: jberdine
Differential Revision: D17605766
fbshipit-source-id: 77edf63b7
Summary:
Previously the LLVM semantics did the phi instructions at the head of a
block as part of executing the branch into that block. This looked a bit
weird, but had the advantage that the semantics knew which block was
being jumped from, which is necessary to run the phi instructions.
However, it meant that the rules for doing phi instructions would need
to show up with each branching construct. It was also annoying for the
LLVM->llair proof, since the phis are removed and their effect happens as
a distinct step from the branch.
Here we add a distinct Phi_ip instruction pointer to indicate that the
phi instructions at the start of the block should execute next, and then
be incremented to the usual numeric instruction pointer that points to
the non-phi instructions. The Phi_ip contains the identity of the
previous block.
Reviewed By: jberdine
Differential Revision: D17452416
fbshipit-source-id: 78fef7cca
Summary:
Give the llair semantics observable side effects (writes to global
variables) and a semantic function mirroring the LLVM semantics. Start
sketching out the LLVM/llair translation equivalence proof in a top-down
way from the obvious statement of equality of the semantics.
Reviewed By: jberdine
Differential Revision: D17399654
fbshipit-source-id: 2170678a8
Summary:
The simple LLVM semantics steps one instruction at a time, but the
generated llair does whole blocks at a time, since many individual LLVM
instructions can become a single llair expression. We add a bigger-step
LLVM semantics that does whole blocks at a time (except that it also
stops at function calls, since those end blocks in llair). The steps in
this bigger-step semantics should be at the same granularity as the
llair steps, making it easier to verify the translation.
We add a notion of observation to the LLVM semantics (right now, just
global variable writes) and use that to define two top-level semantic
functions, which we prove to be equivalent.
Reviewed By: jberdine
Differential Revision: D17396016
fbshipit-source-id: ee632fb92
Summary:
This diff allows domains to specify which abstract states can or can't
be merged together by the worklist. In particular, this is needed for
relational domains to ensure that Hoare triples are joined only when
they share a precondition.
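One way to picture this is a join that may decline to merge. The sketch below is an assumption about the shape of such an interface, not the actual domain signature: the record type, the use of strings in place of symbolic heaps, and the option-returning `join` are all hypothetical.

```ocaml
(* A relational state: a precondition paired with the states reached
   from it. Strings stand in for symbolic heaps. *)
type rel = {pre: string; posts: string list}

(* Hypothetical join that lets the domain veto a merge: [None] tells the
   worklist to keep the two states separate. *)
let join a b =
  if String.equal a.pre b.pre then Some {pre= a.pre; posts= a.posts @ b.posts}
  else None

let () =
  let a = {pre= "emp"; posts= ["p"]} and b = {pre= "emp"; posts= ["q"]} in
  let c = {pre= "x|->1"; posts= ["r"]} in
  assert (join a b = Some {pre= "emp"; posts= ["p"; "q"]});
  assert (join a c = None)
```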
Reviewed By: jberdine
Differential Revision: D17571148
fbshipit-source-id: d9345fdc9
Summary:
This diff adds a "-preanalyze-globals" flag to all analyze targets
which, when set, computes used-globals sets for all reachable
functions and then uses that information to track only relevant
global variables at calls in the main analysis.
Reviewed By: jberdine, jvillard
Differential Revision: D17526746
fbshipit-source-id: 1a114285c
Summary:
Fixes a bug in Llair.Frontend.xlate_value where the l-val register
of LLVM instruction calls was being marked as global.
Reviewed By: jberdine
Differential Revision: D17570458
fbshipit-source-id: e1b5924e2
Summary:
Fixes a bug where all calls are treated as intrinsics in used
globals analysis, since exec_intrinsic is invoked at _all_ calls
to determine which are intrinsic, not only at call sites known to
target intrinsics.
Reviewed By: jberdine
Differential Revision: D17499406
fbshipit-source-id: 41f7621f2
Summary:
While the symbolic heap analysis ends its search upon hitting the
bound on recursion depth, the used-globals analysis should instead
simply skip recursive calls beyond the depth. Note that this is
unsound for arbitrary abstract domains, however, and the flag
controlling this feature should be used with caution.
Note that procedure calls are still not handled correctly, since
Used_globals.exec_intrinsic does not properly check whether callees
are intrinsic. A forthcoming commit will fix that, as well.
Reviewed By: jberdine
Differential Revision: D17479753
fbshipit-source-id: aa92e0ef3
Summary:
Include global variables used in function callees in used globals
analysis. Also adds support for arbitrary changes to symbolic
state while resolving callees in other analyses.
Reviewed By: jberdine
Differential Revision: D17479352
fbshipit-source-id: e3cd9f179
Summary:
Replace custom version reporting support using a shell script with
code using dune's Build_info API.
Note that after this diff, the executables under _build/<context> are
not version-stamped, but those under _build/_install are. The symlinks
in bin point to the latter, stamped, exes.
Reviewed By: bennostein
Differential Revision: D16985446
fbshipit-source-id: 7afac87be
Summary:
Adds an abstract domain to track global variable usages, as well as supporting
changes to the frontend, IR and CLI. This analysis will support optimizations
to the main symbolic-heap analysis, but for now can be invoked independently
through the `-domain` flag on `analyze` targets of the Sledge executable.
Reviewed By: jberdine
Differential Revision: D17422212
fbshipit-source-id: 74bed0a76
Summary:
Generalize the lifting from State_domain (i.e. symbolic heaps) to Sh_domain (i.e. relations over symbolic heaps).
Also, extract abstract-domain-related code into its own module/directory.
Reviewed By: jberdine
Differential Revision: D17319007
fbshipit-source-id: cefbd1393
Summary: Add support for future development of new abstract domains by eliminating hard-wired dependencies from the worklist into the symbolic heap domain. Also includes an implementation of a trivial unit domain and a CLI flag to enable its use, for debugging purposes.
Reviewed By: jberdine
Differential Revision: D17281681
fbshipit-source-id: 5858fd420
Summary:
This includes a few changes and corrections to the semantics, to support
the translation. This initial attempt to reason about LLVM -> llair
showed three things that needed repair in the semantics, in addition to
various bugs. We address them as follows.
Refactor llair semantics to have only a single kind of flat value:
integers that fit into specified bit widths. Operations on size values
(e.g., offsets, indices and the like) can just take an integer and
ignore its number of bits. Pointers can just be considered integers that
fit into a certain size given by the constant pointer_size. Later on we
can consider making this a parameter to the model.
Change the generic memory model interface to use numbers rather than
words as the generic encoding of a large value. This makes it more
useful for llair where words are not used.
Pay more careful attention to signed/unsigned issues. Neither LLVM nor
llair has a concept of signed vs unsigned value. Instead individual
operations interpret bit patterns in various ways, some of which are
ambiguous in the LLVM manual. For example, since getelementptr's
indices are explicitly said to be interpreted as signed 2's complement,
we should probably do the same for insertvalue and extractvalue. However
it is not clear how the argument to alloca is to be interpreted. For now
we assume signed.
Reviewed By: jberdine
Differential Revision: D17164133
fbshipit-source-id: 31a8af635
Summary:
Not everything is here yet, and there is some confusion on what to do
about the size values. However, the semantics has the right general
shape and will be a nice starting point for thinking about the details.
Reviewed By: jberdine
Differential Revision: D17111041
fbshipit-source-id: cc75651c6
Summary:
The translation from LLVM to llair now builds expressions up across
blocks, following the implementation. This is easy to do because of the
dominance restrictions in SSA, but might be difficult to reason
about.
Reviewed By: jberdine
Differential Revision: D17111040
fbshipit-source-id: a8e99147d
Summary:
LLVM and llair have similar memory models, and we don't want to
duplicate any definitions or theorems. This adds a new memory model
theory which should be understandable in its own right. A heap is a
mapping from addresses to bytes, alongside a set of valid addresses, and
intervals that have been allocated already. Primitives are defined for
allocating and de-allocating as well as reading and writing chunks of
bytes.
There is also a generic type of structured values, and functions for
converting them to/from byte arrays.
Reviewed By: jberdine
Differential Revision: D17074470
fbshipit-source-id: bdab6089f
Summary:
In some cases inlining pure expressions into their use sites causes
code blowup. This diff changes the frontend to inline expressions only
if there is a single use, and otherwise adds a move instruction.
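The policy can be sketched on a toy expression language. Everything below is hypothetical and far simpler than the frontend: the `exp` type, the `outcome` type, and `translate` only illustrate the single-use test described above.

```ocaml
(* Toy expressions over registers; not the LLAIR types. *)
type exp = Reg of string | Add of exp * exp

(* Count the occurrences of register [r] in an expression. *)
let rec uses r = function
  | Reg r' -> if r = r' then 1 else 0
  | Add (a, b) -> uses r a + uses r b

(* Replace every occurrence of register [r] by expression [e]. *)
let rec subst r e = function
  | Reg r' when r = r' -> e
  | Reg _ as x -> x
  | Add (a, b) -> Add (subst r e a, subst r e b)

type outcome = Inlined of exp | Moved of (string * exp) * exp

(* Inline the definition [r := def] into [body] only when [r] is used
   exactly once; otherwise keep an explicit move and leave [body] alone. *)
let translate r def body =
  if uses r body = 1 then Inlined (subst r def body)
  else Moved ((r, def), body)

let () =
  let def = Add (Reg "a", Reg "b") in
  assert (translate "r" def (Reg "r") = Inlined def);
  let twice = Add (Reg "r", Reg "r") in
  (* Two uses: inlining would duplicate [def], so a move is kept. *)
  assert (translate "r" def twice = Moved (("r", def), twice))
```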
Reviewed By: ngorogiannis
Differential Revision: D17071770
fbshipit-source-id: d866a0622