Summary:
There are a number if issues with using the same type for expressions
in code and in formulas. One is that the type systems of the two
should be different. Another is that conflating the two compromises
the ability of Llair to correctly express aspects such as integer
overflow, floating point rounding, etc. Also, it could be beneficial
to have more source locations for program expressions than makes sense
for terms.
This diff simply unshares Exp, leading to a copy named Term. Likewise,
Reg is now a copy of Var. Simplifications to come.
Reviewed By: bennostein
Differential Revision: D17665250
fbshipit-source-id: 4359a80d5
Summary:
The generation of names for the function formal return and throw
parameters is not central to LLAIR, but a detail of the frontend,
since they are generated only because LLVM does not already have such
names.
Reviewed By: ngorogiannis
Differential Revision: D17665240
fbshipit-source-id: 684cbae92
Summary:
Using a type of keys richer than strings, which are the unique symbol
names at the C/LLVM level, is unnecessary.
Reviewed By: ngorogiannis
Differential Revision: D17665262
fbshipit-source-id: 6b8c31146
Summary:
The convenience wrappers for operations on signed 1-bit integers
represented by Z.t are not specific to Exp.
Reviewed By: ngorogiannis
Differential Revision: D17665252
fbshipit-source-id: d4b58e2a6
Summary:
Now that the relation domain construction is factored out and
generalized.
Reviewed By: ngorogiannis
Differential Revision: D17665253
fbshipit-source-id: eb156ce6b
Summary:
Since version 2, none of the `opam pin` modes work reasonably well for
the standard llvm build procedure. As a workaround to prevent opam
from making several copies of the build directory when pinning, adjust
to move the llvm build and install directories out of the llvm source
tree.
Reviewed By: bennostein
Differential Revision: D17665242
fbshipit-source-id: ac84a4b0b
Summary:
Since the correcteness of the mapping from LLVM to llair depends on
LLVM being SSA, we need to formalise what that means. We also prove that
the domination relation is a strict partial order, which will probably
be helpful when reasoning about the translation.
Reviewed By: jberdine
Differential Revision: D17631456
fbshipit-source-id: a00eb3f87
Summary:
The LLVM semantics and translation was not consistently treating the
1-bit word value condition as signed or unsigned.
Reviewed By: jberdine
Differential Revision: D17605766
fbshipit-source-id: 77edf63b7
Summary:
Previously the LLVM semantics did the phi instructions at the head of a
block as part of executing the branch into that block. This looked a bit
weird, but had the advantage that the semantics knew which block was
being jumped from, which is necessary to run the phi instructions.
However, it meant that the rules for doing phi instructions would need
to show up with each branching construct. It was also annoying for the
LLVM->llair proof, since the phis are removed and their effect happens as
a distinct step from the branch.
Here we add a distinct Phi_ip instruction pointer to indicate that the
phi instructions at the start of the block should execute next, and then
be incremented to the usual numeric instruction pointer that points to
the non-phi instructions. The Phi_ip contains the identity of the
previous block.
Reviewed By: jberdine
Differential Revision: D17452416
fbshipit-source-id: 78fef7cca
Summary:
Give the llair semantics observable side effects (writes to global
variables) and a semantic function mirroring the LLVM semantics. Start
sketching out the LLVM/llair translation equivalence proof in a top-down
way from the obvious statement of equality of the semantics.
Reviewed By: jberdine
Differential Revision: D17399654
fbshipit-source-id: 2170678a8
Summary:
The simple LLVM semantics steps one instruction at a time, but the
generated llair does whole blocks at a time, since many individual LLVM
instructions can become a single llair expression. We add a bigger-step
LLVM semantics that does whole blocks at a time (except that it also
stops at function calls, since those end blocks in llair). The steps in
this bigger-step semantics should be at the same granularity as the
llair steps, making it easier to verify the translation.
We add a notion of observation to the LLVM semantics (right now, just
global variable writes) and use that to define two top-level semantic
functions, which we prove to be equivalent.
Reviewed By: jberdine
Differential Revision: D17396016
fbshipit-source-id: ee632fb92
Summary:
This diff allows domains to specify which abstract states can or can't
be merged together by the worklist. In particular, this is needed for
relational domains to ensure that Hoare triples are joined only when
they share a precondition.
Reviewed By: jberdine
Differential Revision: D17571148
fbshipit-source-id: d9345fdc9
Summary:
This diff adds a "-prenalyze-globals" flag to all analyze targets
which, when set, computes used-globals sets for all reachable
functions and then uses that information to track only relevant
global variables at calls in the main analysis.
Reviewed By: jberdine, jvillard
Differential Revision: D17526746
fbshipit-source-id: 1a114285c
Summary:
Fixes a bug in Llair.Frontend.xlate_value where the l-val register
of LLVM instruction calls was being marked as global.
Reviewed By: jberdine
Differential Revision: D17570458
fbshipit-source-id: e1b5924e2
Summary:
Fixes a bug where are all calls are treated as intrinsics in used
globals analysis, since exec_intrinsic is invoked at _all_ calls
to determine which are intrinsic, not only at call sites known to
target intrinsics.
Reviewed By: jberdine
Differential Revision: D17499406
fbshipit-source-id: 41f7621f2
Summary:
While the symbolic heap analysis ends its search upon hitting the
bound on recursion depth, the used-globals analysis should instead
simply skip recursive calls beyond the depth. Note that this is
unsound for arbitrary abstract domains, however, and the flag
controlling this feature should be used with caution.
Note that procedure calls are still not handled correctly, since
Used_globals.exec_intrinsic does not properly check whether callees
are intrinsic. A forthcoming commit will fix that, as well.
Reviewed By: jberdine
Differential Revision: D17479753
fbshipit-source-id: aa92e0ef3
Summary:
Include global variables used in function callees in used globals
analysis. Also adds support for arbitrary changes to symbolic
state while resolving callees in other analyses.
Reviewed By: jberdine
Differential Revision: D17479352
fbshipit-source-id: e3cd9f179
Summary:
Replace custom version reporting support using a shell script with
code using dune's Build_info API.
Note that after this diff, the executables under _build/<context> are
not version-stamped, but those under _build/_install are. The symlinks
in bin point to the latter, stamped, exes.
Reviewed By: bennostein
Differential Revision: D16985446
fbshipit-source-id: 7afac87be
Summary:
Adds an abstract domain to track global variable usages, as well as supporting
changes to the frontend, IR and CLI. This analysis will support optimizations
to the main symbolic-heap analysis, but for now can be invoked independently
through the `-domain` flag on `analyze` targets of the Sledge executable.
Reviewed By: jberdine
Differential Revision: D17422212
fbshipit-source-id: 74bed0a76
Summary:
Generalize the lifting from State_domain (i.e. symbolic heaps) to Sh_domain (i.e. relations over symbolic heaps).
Also, extract abstract-domain-related code into its own module/directory.
Reviewed By: jberdine
Differential Revision: D17319007
fbshipit-source-id: cefbd1393
Summary: Add support for future development of new abstract domains by eliminating hard-wired dependencies from the worklist into the symbolic heap domain. Also includes an implementation of a trivial unit domain and a CLI flag to enable its use, for debugging purposes.
Reviewed By: jberdine
Differential Revision: D17281681
fbshipit-source-id: 5858fd420
Summary:
This includes a few changes and corrections to the semantics, to support
the translation. This initial attempt to reason about LLVM -> llair
showed three things that needed repair in the semantics, in addition to
various bugs. We address them as follows.
Refactor llair semantics to have only a single kind of flat value:
integers that fit into specified bit widths. Operations on size values
(e.g., offsets, indices and the like) can just take an integer and
ignore its number of bits. Pointers can just be considered integers that
fit into a certain size given by the constant pointer_size. Later on we
can consider making this a parameter to the model.
Change the generic memory model interface to use numbers rather than
words as the generic encoding of a large value. This makes it more
useful for llair where words are not used.
Pay more careful attention to signed/unsigned issues. Neither LLVM nor
llair have a concept of signed vs unsigned value. Instead individual
operations interpret bit patterns in various ways, some of which are
ambiguous in the LLVM manual. For example, since getelementpointer's
indices are explicitly said to be interpreted as signed 2's complement,
we should probably do the same for insertvalue and extractvalue. However
it is not clear how the argument to alloca is to be interpreted. For now
we assume signed.
Reviewed By: jberdine
Differential Revision: D17164133
fbshipit-source-id: 31a8af635
Summary:
Not everything is here yet, and there is some confusion on what to do
about the size values. However, the semantics has the right general
shape and will be a nice starting point for thinking about the details.
Reviewed By: jberdine
Differential Revision: D17111041
fbshipit-source-id: cc75651c6
Summary:
The translation from LLVM to llair now builds expressions up across
blocks, following the implementation. This is easy to do because of the
dominance restrictions in SSA, but might be difficult to reason
about.
Reviewed By: jberdine
Differential Revision: D17111040
fbshipit-source-id: a8e99147d
Summary:
LLVM and llair have similar memory models, and we don't want to
duplicate any definitions or theorems. This adds a new memory model
theory which should be understandable in its own right. A heap is a
mapping from addresses to bytes, alongside a set of valid addresses, and
intervals that have been allocated already. Primitives are defined for
allocating and de-allocating as well as reading and writing chuncks of
bytes.
There is also a generic type of structured values, and functions for
converting them to/from byte arrays.
Reviewed By: jberdine
Differential Revision: D17074470
fbshipit-source-id: bdab6089f
Summary:
In some cases inlining pure expressions into their use sites causes
code blowup. This diff changes the frontend to inline expressions only
if there is a single use, and otherwise adds a move instruction.
Reviewed By: ngorogiannis
Differential Revision: D17071770
fbshipit-source-id: d866a0622
Summary:
This has been out of date since arithmetic was changed from a purely
uninterpreted treatment to having a solver.
Reviewed By: jvillard
Differential Revision: D16985159
fbshipit-source-id: 39e42069c
Summary:
While SSA can be useful for code transformation purposes, it offers
little for semantic static analyses. Essentially, such analyses
explore the dynamic semantics of code, and the *static* single
assignment property does not buy much. For example, once an execution
visits a loop body that assigns a variable, there are multiple
assignments that the analysis must deal with. This leads to the need
to treat blocks as if they assign all their local variables, renaming
to avoid name clashes a la Floyd's assignment axiom. That is fine, but
it makes it much more involved to implement a version that is
economical with respect to renaming only when necessary. Additionally
the scoping constraints of SSA are cumbersome and significantly
complicate interprocedural analysis (where there is a long history of
incorrect proof rules for procedures, and SSA pushes the
interprocedural analysis away from being able to use known-good
ones). So this diff changes Llair from a functional SSA form to a
traditional imperative language.
Reviewed By: jvillard
Differential Revision: D16905898
fbshipit-source-id: 0fd835220
Summary:
Before this diff symbolic execution of instructions assumed that
assigned variables were unconstrained in the precondition. This is
ensured by symbolic execution of control flow, which renames all local
variables of a block when it is entered.
This diff changes symbolic execution of instructions to rename
modified variables that appear in the precondition when necessary, and
accounts for the modified variable occurrence condition on the frame
rule. This will enable more economically renaming variables, as most
of the time it is not needed.
Reviewed By: jvillard
Differential Revision: D16905893
fbshipit-source-id: 3a53525d7
Summary:
Each variable now contains its type, alongside its name. This is more
uniform than in LLVM, where the name is usually paired with a type, but
not always, for example, the register type of the result of an
extractvalue is left implicit.
Reviewed By: jberdine
Differential Revision: D16984630
fbshipit-source-id: 1c3bc4985
Summary:
HOL now lets us omit quotations on Datatypes and make them look more
like the other new-style HOL definitions.
Reviewed By: jberdine
Differential Revision: D16983934
fbshipit-source-id: f8ef3abb5
Summary:
This sketches out how translation can be approached. It is partially
based on the Sledge code.
For basic blocks, isn't based on the Sledge code, but just my own
thoughts as a starting point. Essentially, we are trying to build up
larger expressions, and so not assigning to temporary registers that
don't live past the end of the block. This does remove sharing, so a
fancier approach could check for multiple uses of end-of-block dead
registers, or look at the sizes of expressions. The approach should be
flexible enough to accommodate such changes.
Fix icmp syntax
Using finite maps is elegant in the semantics, but awkward for writing
the translation function. Refactor the mappings from labels to functions
and from labels to blocks to use association lists instead.
To remove phi nodes, the translation takes every edge in the control
flow graph and makes a new basic block that contains a single parallel
move instruction that corresponds to the action of the phi node of the
target block.
Reviewed By: jberdine
Differential Revision: D16831051
fbshipit-source-id: 005663e26
Summary:
The AST is not complete on expressions, but it should have most of the
important features.
The representation is in some ways very different from the OCaml
implementation, because the OCaml code uses mutation to build the CFG as
an actual pointer graph in memory, and also because the expression
representation is optimised for the backend. For the former, it should
be easy to see that the AST here is isomorphic, representing the CFG
with finite maps from block labels. The correspondence is less clear in
the latter case, but the point here is not to model or verify
implementation optimisations, but to give a semantics to llair as a
language.
Reviewed By: jberdine
Differential Revision: D16807132
fbshipit-source-id: b0f64b3ec
Summary:
Change loc_var (for local variable) to reg (for register) because
loc_var looks too much like a location tagged variable.
Reviewed By: jberdine
Differential Revision: D16827920
fbshipit-source-id: 5b11f1065
Summary:
There could very well still be bugs in the semantics, since the
invariant here doesn't say all that much, and it completely ignores
local registers. But most trivial things and typos are probably fixed.
Reviewed By: jberdine
Differential Revision: D16803281
fbshipit-source-id: 48ba2523b
Summary:
Made progress on the sanity checking lemma (that the step relation
preserves some simple invariants on the state). Proved the Ret
instruction case of the state invariant lemma. To do this, I fixed a few
bugs in the definition, and strengthened the invariants.
Reviewed By: jberdine
Differential Revision: D16786900
fbshipit-source-id: 6fa8cb170
Summary:
Global variables need allocating and initialising before the machine can
start. The definition here shouldn't constrain how and where they are
allocated. For example, they don't all need to have separate
allocations. We also tag allocated blocks so that the allocation for a
global can never be deallocated.
Start working on a sanity checking invariant on states.
Reviewed By: jberdine
Differential Revision: D16735068
fbshipit-source-id: 0d5e60e7a
Summary:
Start working on a simple model of LLVM with the ultimate goal of
handling relevant and/or tricky aspects of LLVM and LLAIR and then
formalising the translation from LLVM to LLAIR.
This is a complete initial model of everything that we are interested in
except for exceptions, which should be tricky. Also no thought has gone
into the treatment of poison and the undefined value, so the treatment
is naive, which is at least partially justified because we are
interested in the semantics of LLVM IR after the optimisation passes
have run.
Include some sanity checking theorems.
Reviewed By: jberdine
Differential Revision: D16731885
fbshipit-source-id: fd53949fe
Summary:
Currently bitcode produced with `sledge buck link` can have missing
symbols that are clearly defined in the source. For example consider a
symbol `awesome_function` that is defined in the libraries linked in but
not in the produced binary (despite being reachable from main).
`llvm-nm` of the bitcode produced by `llvm-link` might look like:
```
U awesome_function
t awesome_function.1892
```
Some our `awesome_function` is undefined and its definition is called
`awsome_function.1892` for some reason and is local. I think this is because symbol get internalized too early and then they get renamed and somehow lost. Not sure why `llvm-link` behaves this way sometimes.
This patch removes internalization from `llvm-link` and puts it into `opt`, where it doesn't cause problems.
Reviewed By: jvillard
Differential Revision: D16494153
fbshipit-source-id: aad9053a4
Summary:
`__llair_alloc` is meant to be a drop-in non-failing replacement for
`mallco`. Currently `__llair_alloc(1)` allocates 8 bytes instead of 1 as
`malloc(1)` would. This is because handling of `__llair_alloc` was
merged with handling of `new`. This patch reverts changes to handling of
`new` in D15778817 and adds a new case for `__llair_alloc`.
Reviewed By: jvillard
Differential Revision: D16356865
fbshipit-source-id: 3878d87c3
Summary:
When using summaries we first garbage collect the precondition and then
ask the solver to infer the frame of the precondition with respect to
grabage-collected footprint.
Currently if the solver fails to show the frame, we just give it an
empty frame. This is bad, because if grabage collection removed some
segments, they don't get added back on.
This patch throws an exception instead to be very explicit when the
solver cannot show the frame in this case.
Reviewed By: ngorogiannis
Differential Revision: D16339587
fbshipit-source-id: b88d0689c
Summary:
The actual implementation of folly::usingJEMalloc() tests if malloc is
jemalloc using internal knowledge of the jemalloc implemenation of
malloc. This internal behavior is not reflected in the analyzer's
spec, so the detection fails.
Additionally, folly::usingJEMalloc is implemented using mallctl to
query internal state of jemalloc. Depending on the key string passed
to mallctl, it might return a pointer to jemalloc internal state, or a
scalar, which means that the spec needs to essentially allocate that
state in those cases.
Since the jemalloc detection fails, and the analyzer is not always
able to reason precisely about string equality, this diff models
folly::usingJEMalloc directly (as nondet).
Reviewed By: kren1
Differential Revision: D16059776
fbshipit-source-id: 7e7156d7d
Summary:
It seems that functions internalized by llvm no longer have valid
mangled names, and instead have a `.<int>` suffix. This diff removes
these unpredictable suffixes when checking if a called function is a
specified/modeled intrinsic.
Reviewed By: kren1
Differential Revision: D16059781
fbshipit-source-id: a4b9f6c73
Summary:
A frame inference query `Minuend ⊢ ∃xs. Subtrahend` returns a
`∃zs. Remainder` formula such that `Minuend ⊢ ∃xs. Subtrahend *
∃zs. Remainder` when successful. Currently if the subtrahend is itself
existentially quantified, its existentials are treated trivially: they
must witness themselves. This diff allows the solver to find witnesses
as the `xs`. They are still existentially quantified in the remainder,
so clients that need to constrain them should still name them before
calling the solver.
Reviewed By: kren1
Differential Revision: D16269630
fbshipit-source-id: 65136edd1
Summary:
Add a global merge pass that merges globals into a single big global. It
replaces the uses of globals merged, with offsets into the big global.
Function summarisationis a big benefactor of this as it greatly reduces
the number of implicit formals (ie. globals).
Reviewed By: jvillard
Differential Revision: D16260098
fbshipit-source-id: 1b936f02f
Summary:
Fix a bug where summaries would be created even if summarisation option
is disabled.
Reviewed By: jvillard
Differential Revision: D16259761
fbshipit-source-id: f7319ef03
Summary:
If function summaries are enabled calling a function first tries to
apply a summary, if succesful, it directly jumps to the return site of
the call. Otherwise it proceeds as before.
Reviewed By: jvillard
Differential Revision: D16201251
fbshipit-source-id: cec52e0e5
Summary:
Define a new function summary type and compute it on function return.
As an intermediary step also apply the just computed summary to function
pre so it can be compared to what was actually computed.
Reviewed By: jvillard
Differential Revision: D16149833
fbshipit-source-id: b826c17e8
Summary:
Fix a crash that occurs when subtrahend has an existential variable that
was renamed as in the test.
The crash is due to an assertion in `Sh.exists` that says only variables
in the vocabulary can be existentialy quantifed out.
The problem was `Sh.exists` call in Solver.ml:611. Where `ws`
(existentials of the subthrehend) are not present in the vocabulary of
the remainder. This is because remainder "inheirts" the vocabulary of
the minued.
This fix simply extends the vocabulary of minued with `ws`, which
means the remaainder has the correct vocabulary. This should have no
externally visible effect as `ws` are then existentialed out.
Another option would be to try to change all the `excise_seg` functions,
to keep the vocabulary, but that looked annoying to implement.
Reviewed By: jvillard
Differential Revision: D16201423
fbshipit-source-id: b88c3abc4
Summary:
Add a `-color` option to sledge, that prints variable that are
existentially bound as bold.
Reviewed By: ngorogiannis
Differential Revision: D16088750
fbshipit-source-id: bd21cb8a0
Summary:
This fixes two bugs:
* All local variables would get existentially quantified out, that means
the the local variables of the caller couldn't be restored properly
* Frame was added back on after the formals were killed. Which meant
that if frame talked about formals (in pure for example), those
formals would remain to be free variables.
Reviewed By: ngorogiannis
Differential Revision: D16091157
fbshipit-source-id: dfe12ed82
Summary:
This fixes two issues with function summarization when calling a
function multiple times.
* Previously on return, the actuals wouldn't get added back in, so
their name would be "lost" (that is existentially quantified out),
this patch adds the formals to actuals equalities back on return,
before quantifying the formals out.
* Previously the entry state of the function would be lost if there were
multiple calls to other functions.
Reviewed By: jberdine
Differential Revision: D16071656
fbshipit-source-id: 9df7b1d4b
Summary:
Currently alarms are reported to stdout while the debug trace is
written to stderr. This makes synchronizing the two difficult. With
this diff, the alarm reports can also be included in the debug trace,
and analysis can be stopped when an alarm is encountered by tracing
the `Stop` module, e.g.:
```
sledge -trace Report+Stop.on_invalid_access
```
Reviewed By: kren1
Differential Revision: D16072611
fbshipit-source-id: 32c3639a2
Summary:
There are many assumptions on the behavior of mutexes, condition
variables, etc. in the implementation of the cxxabi with threads
support. So compile with `_LIBCXXABI_HAS_NO_THREADS` defined to select
the much simpler code paths for the single-threaded case.
Reviewed By: kren1
Differential Revision: D16069454
fbshipit-source-id: 9f975e0e6
Summary:
Each call to __cxa_allocate_exception, in practice, is shortly
followed by raising an exception. With -skip-throw, execution will not
proceed past the throw. Since the concrete implementation of
__cxa_allocate_exception and the following initialization of the
exception object is very low-level code that plays tricks, the
analyzer has trouble with it. So model __cxa_allocate_exception as
unreachable to avoid (needlessly) executing that code and potentially
failing spuriously.
Reviewed By: kren1
Differential Revision: D16069451
fbshipit-source-id: bea1dae09
Summary:
Allow intrinsics to return an inconsistent state, to specify that they
do not return.
Reviewed By: kren1
Differential Revision: D16069453
fbshipit-source-id: deb5d2a22
Summary:
This adds an optimized debug build mode, which is compiled with
optimizations, and without assertions, but still has tracing enabled.
Reviewed By: kren1
Differential Revision: D16069452
fbshipit-source-id: 445cfa329
Summary:
The report output got disturbed by the change from predicate to
relational Domain, and the tricky control of printing simplified
states. After this diff by default states are printed in full, and in
simplified form with `-t State_domain.pp_simp`.
Also includes some minor output improvements.
Reviewed By: kren1
Differential Revision: D16059780
fbshipit-source-id: b33289887
Summary:
Trivial renamings to use the standard "libFuzzer" name instead of "lib
fuzzer".
Reviewed By: kren1
Differential Revision: D16067881
fbshipit-source-id: 3ff2a8f86
Summary:
On function return add the computed summary (pre/post) condition to a
hashtable.
Reviewed By: jberdine
Differential Revision: D16052136
fbshipit-source-id: 0c5c9bafb
Summary:
Outputting the list of bitcode inputs when no output flag is ok for
`sledge buck bitcode` but does not make sense when it is composed as
part of other commands. So only output to stdout if `-` is given as
the output file name.
Reviewed By: kren1
Differential Revision: D16059782
fbshipit-source-id: abac9c36f
Summary:
To easily monitor and track changes to the help generated by the
command line interface, generate it in full and add it to the repo.
Reviewed By: kren1
Differential Revision: D16059783
fbshipit-source-id: be15f9943
Summary:
This diff enhances `-function-summaries` to remember the frame computed by
the solver and actually execute the function using the summary. Upon
return the frame is added back on the computed post condition.
Reviewed By: ngorogiannis
Differential Revision: D15900318
fbshipit-source-id: 8bb56b771
Summary:
This diff is preparation for function summarization and focuses on
function calls and function summary precondition computation.
It introduces `-function-summaries` flag behind most of functionality is
hidden, when enabled on each call
* A function summary is computed by quantifying all the non-formal/global variables
and removing all the segments that are not reachable from them
* `pre` and `foot` are computed from function summary and the calling context
by replacing formals with actuals again.
* A solver is asked if `pre` entails `foot` and a frame is printed if it
does
Currently this only works for formulas without disjunctions, so when
function summaries are enabled, that state is first moved to dnf and then
the call is done for each disjunct.
Reviewed By: ngorogiannis
Differential Revision: D15898928
fbshipit-source-id: 49d32504c
Summary: For buck targets that contain at least one of the substrings in `buck-target-pattern` option in config, change the buck target to add `_sledge` suffix.
Reviewed By: jberdine
Differential Revision: D15920018
fbshipit-source-id: 44c242e99
Summary: And fix test Makefile to call the C++ compiler on .cpp files.
Reviewed By: kren1
Differential Revision: D15972426
fbshipit-source-id: 719de755f
Summary:
Simplify all conversions between castable types to the identity. The
backend treats castable types as equal, so distinguishing conversions
between them is incomplete.
Reviewed By: kren1
Differential Revision: D15972427
fbshipit-source-id: fa09859ac
Summary:
The entry block contains all locals of the entire function, as
required by the backend. This makes the manipulation of the locals of
each block redundant.
This diff moves the locals from the entry block to the function
itself, removes the Locals frames of the Control.Stack, and adds a
locals field to Return frames.
This is part cleanup and part preparation for removing the
Control.Stack.
Reviewed By: ngorogiannis
Differential Revision: D15963503
fbshipit-source-id: 523ebc260
Summary:
Adds `-mergefunc` and `-dce` passes to `Frontend.translate` to match
the `buck link` flow with `opt`
Reviewed By: ngorogiannis
Differential Revision: D15938641
fbshipit-source-id: 128cb89cd
Summary:
The current handling of the formal return variable scope is not
correct. Since it is passed as an actual argument to the return
continuation, it is manipulated as if it was a local variable of the
caller. However, its scope is not ended with the caller's locals,
leading to clashes.
This diff reworks the passing of return values to avoid this problem,
mainly by introducing a notion of temporary variables during parameter
passing. This essentially has the effect of taking a function spec
{ P } f(x) { λv. Q }
and generating a "temporary" variable v, applying the post λv. Q to it
to obtain the pre-state for the call to the return continuation
k(v). Being a temporary variable just means that it goes out of scope
just after parameter passing. This amounts to a long-winded way of
applying the post-state to the formal parameter of the return
continuation without violating scopes or SSA.
This diff also separates the manipulation of the symbolic states as they
proceed from:
1. the pre-state before the return instruction;
2. the exit-state after the return instruction (including the binding
of the returned value to the return formal variable);
3. the post-state, where the locals are existentially quantified; and
4. the return-state, which is expressed in terms of actual args
instead of formal parameters.
Also in support of summarization, formal return and throw parameters
are no longer tracked on the analyzer's stack.
Note that these changes involve changing the locals of blocks and
functions to no longer include the formal parameters.
Reviewed By: kren1
Differential Revision: D15912148
fbshipit-source-id: e41dd6e42
Summary:
The solver couldn't deal with `∃ a,b . a = b` , so this diff adds
a special case to deal with it.
Reviewed By: ngorogiannis
Differential Revision: D15897953
fbshipit-source-id: d841d3557
Summary:
:
This patch adds several passes that reduce the amount of bitcode
making sledge's job easier, more info:
https://llvm.org/docs/Passes.html
`-mergefunc`
This pass merges functions that do the same thing, this can be because of
templating or casts (ie. same functionality but on 32bit and 64bit ints,
which is the same in machine code). More details at
http://llvm.org/docs/MergeFunctions.html
Note that this pass is currently not available through C/OCaml API.
`-constmerge`
This merges constants that have the same value, this is possible to do
when the constants are internalized.
`-argpromotion`
```
This pass promotes “by reference” arguments to be “by value” arguments.
In practice, this means looking for internal functions that have pointer
arguments. If it can prove, through the use of alias analysis, that an
argument is only loaded, then it can pass the value into the function
instead of the address of the value. This can cause recursive
simplification of code and lead to the elimination of allocas
(especially in C++ template code like the STL).
```
`-ipsccp`
```
Sparse conditional constant propagation and merging, which can be
summarized as:
Assumes values are constant unless proven otherwise
Assumes BasicBlocks are dead unless proven otherwise
Proves values to be constant, and replaces them with constants
Proves conditional branches to be unconditional
```
`-deadargelim`
Removes dead arguments of internal functions, good to run after other inter-procedural
passes. Seems to crash llvm if run directly after `ipsccp`.
Note that while this might look like doing full link-time optimisation,
we are actually picking relatively cheap optimisations that mostly look
at globals and walk their use chains. The main reason link-time
optimisations are expensive is due to inlining and then running the full
optimisation again from there.
Reviewed By: jberdine
Differential Revision: D15851408
fbshipit-source-id: be7191683
Summary:
This diff introduces a `-lib-fuzz` flag to `buck link`, which links in a
simple main that calls the LLVMFuzzerTestOneInput function, which is the
entry point of libFuzzer fuzzer.
Reviewed By: jberdine, jvillard
Differential Revision: D15821512
fbshipit-source-id: cff731ed3
Summary:
Previous change to allow bitcasts in call instructions was too strict
and did not allow for indirect calls.
Reviewed By: jberdine
Differential Revision: D15803262
fbshipit-source-id: 40d828b59
Summary:
Currently printing symbolic heaps is unreadable, because there are too
many quantified variables, that are mostly just equal to other
variables.
This diff tries to replace all variables in an equivalence class with a
single variable and remove the unneccesary variables.
It also introduces two modes for printing state domains:
`-t +State_domain.pp_full` prints the state domain as is
`-t +State_domain.pp` uses the simplification before printing.
Reviewed By: jberdine
Differential Revision: D15738748
fbshipit-source-id: 7c85b580e
Summary:
Print pre- and post- conditions (aka, summaries) when analyzer hits a
function return
- plumbing the precondition through the analyzer
so that it is available when return is hit
Reviewed By: jberdine
Differential Revision: D15713725
fbshipit-source-id: b10b6206f
Summary:
This diff adds a `__llair_alloc` intrinsic which is modeled
as a non-failing malloc. Using it instead of `malloc` increases
the readbility of symbolic heaps, because it removes all the cases
where malloc failed.
Note that `assert(malloc())` does not have the desired effect.
Reviewed By: ngorogiannis
Differential Revision: D15778817
fbshipit-source-id: d02784077
Summary:
Some call instructions in LLVM bitcast the function,
for example
`%call = call i32 (i64, ...) bitcast (i32 (...)* @__llair_alloc to i32 (i64, ...)*)(i64 %conv)`
This would cause sledge to crash in LLVM when build with assertions.
Reviewed By: jberdine
Differential Revision: D15779003
fbshipit-source-id: c273f92db
Summary:
This diff adds a formal parameter to each non-void-returning function
to name the return value, and similarly a formal parameter for the
thrown exception value. These are interpreted as call-by-reference
parameters, so that they can be constrained in formulas to e.g. be
equal to the return value, and are still in scope when the function
returns, and so can be passed to the return block. Prior to
summarizing functions, this means that these formals need to be
tracked on the analyzer's control stack.
This will be needed to express function specs/summaries in terms of
formals, and fixes a bug where in some cases return values were not
tracked correctly.
Reviewed By: kren1
Differential Revision: D15738026
fbshipit-source-id: fff2d107c
Summary:
Previously the locals of a function were computed after backpatching
the blocks in its cfg. This resulted in loss of sharing, and incorrect
locals if queried through the parent of a block.
Reviewed By: kren1
Differential Revision: D15738027
fbshipit-source-id: d7e70530a
Summary:
Disable exceptional control flow
- treat throw as unreachable
- confidence in the correctness of the frontend's treatment of
exception handling is very low, and making summaries that are
expressive enough to talk about exceptions is a complication
that isn't needed for the first iteration
To facilitate, start on a struct that holds all the CL options.
Reviewed By: jberdine, jvillard
Differential Revision: D15713601
fbshipit-source-id: ee92dfbd8
Summary:
This diff adapts the test scripts to the new sledge CLI, and reworks
them to enable checking changes with respect to a baseline. In
particular, now
```
make -C test
```
has exit code 0 if the current test results match the expected ones,
and otherwise prints the diff. Also,
```
make -C test promote
```
promotes the current test results to the new baseline.
Reviewed By: kren1
Differential Revision: D15706573
fbshipit-source-id: 0cbf3231e
Summary:
Include cxa_default_handlers.cpp to bring in definitions for
__cxa_terminate_handler and __cxa_unexpected_handler.
Reviewed By: kren1
Differential Revision: D15712980
fbshipit-source-id: f536930a8
Summary:
Sometimes calls to the `abort` C stdlib function appear as `invoke`
instructions in LLVM. They should be translated to the LLAIR abort
instruction just like the non-raising `call abort` case.
Reviewed By: kren1
Differential Revision: D15706574
fbshipit-source-id: 1509ed0e3
Summary:
Most binary and ternary operations have the same type as their
arguments, so try to compute the type of arguments in these cases.
Reviewed By: kren1
Differential Revision: D15706576
fbshipit-source-id: 4749d6e32
Summary:
When LLVM is built with assertions, it crash
`add_sym` if you try to get the global scope of a non global value.
This patch special cases add_sym, to just do nothing when `llv` is
an `UndefValue`.
Also enhances debuging printout of transalte to include the number of
functions and globals.
Reviewed By: jvillard
Differential Revision: D15669447
fbshipit-source-id: 4b5483810
Summary:
The entry point functions are used in a couple of places, this
puts them in a single source of truth in the config file.
Reviewed By: jvillard
Differential Revision: D15651976
fbshipit-source-id: a572e8d4d
Summary:
This adds a globalopt optimization pass to sledge.
Consider code like:
```
const char *a_string = "I'm a string";
int an_int = 0;
int c() {
return an_int;
}
int main() {
char *c1 = a_string;
return c();
}
```
When compiled there are 2 levels of indirection. For example
`return an_int` Get's compiled as
```
%0 = load i32, i32* an_int1
ret i32 %0
```
Global opt reduces this (if `an_int` is internal) to just
` ret i32 0`.
Similarly and more importantly
`c1 = a_string;` get's compiled into
```
@.str = private unnamed_addr constant [13 x i8] c"I'm a string\00"
a_string = dso_local global i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i32 0, i32 0)
%c1 = alloca i8*, align 8
%0 = load i8*, i8** a_string, align 8, !dbg !25
store i8* %0, i8** %c1, align 8, !dbg !24
```
So there is a level of indirection between `c1` and `.str` where the string is stored.
With global opt, this gets reduced to:
```
@.str = private unnamed_addr constant [13 x i8] c"I'm a string\00"
%c1 = alloca i8*, align 8
store i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i64 0, i64 0), i8** %c1, align 8, !dbg !23
```
and `a_string` variable gets deleted.
On sledge this has the effect of reducing the complexity of the symbolic heap significantly.
Without this optimisation, running
`sledge.dbg llvm analyze -trace Domain.call global_vars.bc`
Gives prints the following segments:
```
∧ %.str -[)-> ⟨13,{}⟩
* %a_string -[)-> ⟨8,%.str⟩
* %an_int -[)-> ⟨4,0⟩
* %c1 -[)-> ⟨8,%.str⟩
* %retval -[)-> ⟨4,0⟩
```
So there are `an_int` and `a_string` segments, which are redundant.
with the optimisation, the heap looks like:
`∧ %.str -[)-> ⟨13,{}⟩ * %c1 -[)-> ⟨8,%.str⟩ * %retval -[)-> ⟨4,0⟩`,
Where we only have the `.str` segment and the `c1` segment, which are the two we need.
Reviewed By: ngorogiannis
Differential Revision: D15649195
fbshipit-source-id: 5f71e56e8
Summary:
Do not implicitly open `Trace`, which shadows `Import.fail`, and
degrades uncaught exceptions. Opening `Trace` was a mistake.
Reviewed By: kren1
Differential Revision: D15653730
fbshipit-source-id: d65277af5
Summary:
Change command line interface to include buck and llvm integration as
separate subcommands.
Reviewed By: kren1
Differential Revision: D15614567
fbshipit-source-id: b7618571b
Summary:
Use the new LLVM bindings to handle internalization.
We would like to use global dead code elimination (gdce) to remove all the code not reachable from an entry point. However in normal compilation most functions aren't globally dead (they can be linked to). At the point of running sledge we won't be linking anymore, therefore we can internalize (make invisible outside of the compilation unit) all the symbols. In that case the whole program is dead with respect to gdce, therefore we need to preserve the entry point as external.
Previously we could only preserve `main` as the entry point (through a boolean flag). This patch uses a newer API that lets us preserve functions that satisfy a given predicate function. This enables us to have arbitrary entry points (not just main). Currently only the 3 entry points from `src/control.ml` are used, but this patch makes it easy to change.
Reviewed By: ngorogiannis
Differential Revision: D15561461
fbshipit-source-id: 88e054411
Summary:
LLVM.global_initializer casues a cast assertion failure if the value
passed to it is not a GlobalVariable. So we first check if it is a
GlobalVariable and only then ask for an initiliazer.
This is hidden if LLVM is built without assertions.
Reviewed By: jberdine
Differential Revision: D15601632
fbshipit-source-id: e9db23a12
Summary:
Sledge does not terminate on programs with recursion, because
functions get "infinitely inlined" and therefore recursion is not
treated as retreating edge.
This patch bounds the number of times the same function can "inlined"
to respect the bound (`-b` option). On each call we check the number of
occurances of the called function in the call stack. If that is higher
than the bound, we skip it.
Reviewed By: jvillard
Differential Revision: D15577134
fbshipit-source-id: 4cd3b62c6
Summary:
This diff changes the translation of global variables to translate the
initializer whenever it exists in LLVM, rather than relying on
linkage. Previously code such as
```
char *mutable_string = "hahaha";
```
would lead to LLVM code
```
nutritious_string = global i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i32 0, i32 0), align 8, !dbg !0 ; [#uses=2 type=i8**]
```
in which `mutable_string` had `External` linkage according to
`Llvm.linkage` even though it has an initializer. This could cause
sledge to drop the initializer.
Reviewed By: kren1
Differential Revision: D15577755
fbshipit-source-id: 50aa06c5e
Summary:
Global variables have pointer type. The size needed by the backend is
of the element type, not of the pointer itself. This diff corrects
this.
Reviewed By: kren1
Differential Revision: D15577756
fbshipit-source-id: 948ecf3cd
Summary:
The root cause is not clear, but it seems that not calling
Llvm.dispose_context avoids segfaults in the GC.
Reviewed By: kren1
Differential Revision: D15535434
fbshipit-source-id: 280e44d0b
Summary:
The name and loc tables are added-to almost exactly in sync, so
combine them to amortize the overhead.
Reviewed By: kren1
Differential Revision: D15535435
fbshipit-source-id: 801da75bb
Summary:
Format is slow. Especially Format.sprintf, which has to allocate and
initialize a buffer every time.
Reviewed By: kren1
Differential Revision: D15535437
fbshipit-source-id: ea43f44e1
Summary:
This isn't free and is expected to hold of bitcode produced by
clang/llvm. There are tests that fail verification, so keep it in
debug mode.
Reviewed By: kren1
Differential Revision: D15535438
fbshipit-source-id: 9390a8363
Summary:
If the input file has a .bc or .ll suffix, treat it as a pre-linked
bitcode file. Otherwise, treat it as before, as a file containing a
list of bitcode files to be linked. Also, perform global dead-code
elimination only when linking multiple files.
Reviewed By: kren1
Differential Revision: D15513345
fbshipit-source-id: 4c80ff9c3
Summary:
Change the sledge input format from a bitcode file to a newline
separated list of paths to LLVM bitcode files.
Reviewed By: jberdine
Differential Revision: D15470082
fbshipit-source-id: 8860f947c
Summary:
Llvm.string_of_llvalue, which just calls llvm::Value::print, is
extremely slow when called on instructions or functions. In these
cases, it initializes metadata slots for everything in the *parent
module* of the instruction or function being printed, on every
call. This is ridiculously slow, don't do it.
Reviewed By: kren1
Differential Revision: D15511376
fbshipit-source-id: 658eeccab
Summary:
Add shortcut code paths to return early in some cases guaranteed to be
the identity function.
Reviewed By: ngorogiannis
Differential Revision: D15468704
fbshipit-source-id: f137049c6
Summary:
It is pointless to track membership of atomic exps in the congruence
relation, as they cannot have any subexps which might later become
equal to something else.
Reviewed By: jvillard
Differential Revision: D15424820
fbshipit-source-id: 048dbc9e1
Summary:
For some Exp forms, Exp.solve is not complete, and this is necessary
since the result of solve is a substitution that needs to encode the
input equation as a conjunction of equations each between a variable
and an exp. This is tantamount to, stronger even than, the theory
being convex. So Exp.solve is not complete for some exps, and some of
those have constructors that perform some simplification. For example,
`(1 ≠ 0)` simplifies to `-1` (i.e. true). To enable deductions such as
`((x ≠ 0) = b) && x = 1 |- b = -1` needs the equality solver to
substitute through subexps of simplifiable exps like = and ≠, as it
does for interpreted exps like + and ×. At the same time, since
Exp.solve for non-interpreted exps cannot be complete, to enable
deductions such as `((x ≠ 0) = (y ≠ 0)) && x = 1 |- y ≠ 0` needs the
equality solver to congruence-close over subexps of simplifiable exps
such as = and ≠, as it does for uninterpreted exps.
To strengthen the equality solver in these sorts of cases, this diff
adds a new class of exps for = and ≠, and revises the equality solver
to handle them in a hybrid fashion between interpreted and
uninterpreted.
I am not currently sure whether or not this breaks the termination
proof, but I have also not managed to adapt usual divergent cases to
break this. One notable point is that simplifying = and ≠ exps always
produces genuinely simpler and smaller exps, in contrast to
e.g. polynomial simplification and gaussian elimination.
Note that the real solution to this problem is likely to be to
eliminate the i1 type in favor or a genuine boolean type, and
translate all integer operations on i1 to boolean/logical ones. Then
the disjunction implicit in e.g. equations between disequations would
appear as actual disjunction, and could be dealt with as such.
Reviewed By: jvillard
Differential Revision: D15424823
fbshipit-source-id: 67d62df1f
Summary:
* Adds compilation of cxxabi.bc
* Includes the cxxabi.bc into the sledge executable via opam-crunch
* Links cxxabi.bc in sledge frontend
Reviewed By: jberdine
Differential Revision: D15415190
fbshipit-source-id: cc42f09fb
Summary:
There are two motivations for this:
1. Distinguish between `Unreachable`, which silently terminates
execution a la `assume false`; and `Abort`, which vocally
terminates execution a la `assert false`.
2. Distinguish between undefined functions, which have `Unreachable`
bodies, and bomb functions such as:
```
define void bomb() {
tail call void llvm.trap()
unreachable
}
```
Reviewed By: ngorogiannis
Differential Revision: D15408246
fbshipit-source-id: b64354cdb
Summary:
Require a final `()` argument to explicitly indicate the end of the
arguments to the printf-like functions. For `warn` this is not
any safer because the return type is `unit` anyhow, but for `fail` the
return type is polymorphic so the final `()` prevents unintentionally
forgetting an argument.
Reviewed By: ngorogiannis
Differential Revision: D15403367
fbshipit-source-id: ce3fe4035
Summary:
llvm.trap is noreturn nounwind so calls to it are always succeeded by
Unreachable, therefore unless an alarm is desired for reaching it,
translating it as nop suffices.
Reviewed By: ngorogiannis
Differential Revision: D15328302
fbshipit-source-id: 54efe6c21
Summary:
A previous rebase fumble sometimes led to the wrong arguments being
passed to retpolines of Invoke instructions.
Reviewed By: ngorogiannis
Differential Revision: D15323950
fbshipit-source-id: f6eb6fbe2