Summary:
This adds a globalopt optimization pass to sledge.
Consider code like:
```
const char *a_string = "I'm a string";
int an_int = 0;
int c() {
return an_int;
}
int main() {
char *c1 = a_string;
return c();
}
```
When compiled there are 2 levels of indirection. For example
`return an_int` Get's compiled as
```
%0 = load i32, i32* an_int1
ret i32 %0
```
Global opt reduces this (if `an_int` is internal) to just
` ret i32 0`.
Similarly and more importantly
`c1 = a_string;` get's compiled into
```
@.str = private unnamed_addr constant [13 x i8] c"I'm a string\00"
a_string = dso_local global i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i32 0, i32 0)
%c1 = alloca i8*, align 8
%0 = load i8*, i8** a_string, align 8, !dbg !25
store i8* %0, i8** %c1, align 8, !dbg !24
```
So there is a level of indirection between `c1` and `.str` where the string is stored.
With global opt, this gets reduced to:
```
@.str = private unnamed_addr constant [13 x i8] c"I'm a string\00"
%c1 = alloca i8*, align 8
store i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i64 0, i64 0), i8** %c1, align 8, !dbg !23
```
and `a_string` variable gets deleted.
On sledge this has the effect of reducing the complexity of the symbolic heap significantly.
Without this optimisation, running
`sledge.dbg llvm analyze -trace Domain.call global_vars.bc`
Gives prints the following segments:
```
∧ %.str -[)-> ⟨13,{}⟩
* %a_string -[)-> ⟨8,%.str⟩
* %an_int -[)-> ⟨4,0⟩
* %c1 -[)-> ⟨8,%.str⟩
* %retval -[)-> ⟨4,0⟩
```
So there are `an_int` and `a_string` segments, which are redundant.
with the optimisation, the heap looks like:
`∧ %.str -[)-> ⟨13,{}⟩ * %c1 -[)-> ⟨8,%.str⟩ * %retval -[)-> ⟨4,0⟩`,
Where we only have the `.str` segment and the `c1` segment, which are the two we need.
Reviewed By: ngorogiannis
Differential Revision: D15649195
fbshipit-source-id: 5f71e56e8
Summary:
Do not implicitly open `Trace`, which shadows `Import.fail`, and
degrades uncaught exceptions. Opening `Trace` was a mistake.
Reviewed By: kren1
Differential Revision: D15653730
fbshipit-source-id: d65277af5
Summary:
Change command line interface to include buck and llvm integration as
separate subcommands.
Reviewed By: kren1
Differential Revision: D15614567
fbshipit-source-id: b7618571b
Summary:
Use the new LLVM bindings to handle internalization.
We would like to use global dead code elimination (gdce) to remove all the code not reachable from an entry point. However in normal compilation most functions aren't globally dead (they can be linked to). At the point of running sledge we won't be linking anymore, therefore we can internalize (make invisible outside of the compilation unit) all the symbols. In that case the whole program is dead with respect to gdce, therefore we need to preserve the entry point as external.
Previously we could only preserve `main` as the entry point (through a boolean flag). This patch uses a newer API that lets us preserve functions that satisfy a given predicate function. This enables us to have arbitrary entry points (not just main). Currently only the 3 entry points from `src/control.ml` are used, but this patch makes it easy to change.
Reviewed By: ngorogiannis
Differential Revision: D15561461
fbshipit-source-id: 88e054411
Summary:
LLVM.global_initializer casues a cast assertion failure if the value
passed to it is not a GlobalVariable. So we first check if it is a
GlobalVariable and only then ask for an initiliazer.
This is hidden if LLVM is built without assertions.
Reviewed By: jberdine
Differential Revision: D15601632
fbshipit-source-id: e9db23a12
Summary:
Sledge does not terminate on programs with recursion, because
functions get "infinitely inlined" and therefore recursion is not
treated as retreating edge.
This patch bounds the number of times the same function can "inlined"
to respect the bound (`-b` option). On each call we check the number of
occurances of the called function in the call stack. If that is higher
than the bound, we skip it.
Reviewed By: jvillard
Differential Revision: D15577134
fbshipit-source-id: 4cd3b62c6
Summary:
This diff changes the translation of global variables to translate the
initializer whenever it exists in LLVM, rather than relying on
linkage. Previously code such as
```
char *mutable_string = "hahaha";
```
would lead to LLVM code
```
nutritious_string = global i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i32 0, i32 0), align 8, !dbg !0 ; [#uses=2 type=i8**]
```
in which `mutable_string` had `External` linkage according to
`Llvm.linkage` even though it has an initializer. This could cause
sledge to drop the initializer.
Reviewed By: kren1
Differential Revision: D15577755
fbshipit-source-id: 50aa06c5e
Summary:
Global variables have pointer type. The size needed by the backend is
of the element type, not of the pointer itself. This diff corrects
this.
Reviewed By: kren1
Differential Revision: D15577756
fbshipit-source-id: 948ecf3cd
Summary:
The root cause is not clear, but it seems that not calling
Llvm.dispose_context avoids segfaults in the GC.
Reviewed By: kren1
Differential Revision: D15535434
fbshipit-source-id: 280e44d0b
Summary:
The name and loc tables are added-to almost exactly in sync, so
combine them to amortize the overhead.
Reviewed By: kren1
Differential Revision: D15535435
fbshipit-source-id: 801da75bb
Summary:
Format is slow. Especially Format.sprintf, which has to allocate and
initialize a buffer every time.
Reviewed By: kren1
Differential Revision: D15535437
fbshipit-source-id: ea43f44e1
Summary:
This isn't free and is expected to hold of bitcode produced by
clang/llvm. There are tests that fail verification, so keep it in
debug mode.
Reviewed By: kren1
Differential Revision: D15535438
fbshipit-source-id: 9390a8363
Summary:
If the input file has a .bc or .ll suffix, treat it as a pre-linked
bitcode file. Otherwise, treat it as before, as a file containing a
list of bitcode files to be linked. Also, perform global dead-code
elimination only when linking multiple files.
Reviewed By: kren1
Differential Revision: D15513345
fbshipit-source-id: 4c80ff9c3
Summary:
Change the sledge input format from a bitcode file to a newline
separated list of paths to LLVM bitcode files.
Reviewed By: jberdine
Differential Revision: D15470082
fbshipit-source-id: 8860f947c
Summary:
Llvm.string_of_llvalue, which just calls llvm::Value::print, is
extremely slow when called on instructions or functions. In these
cases, it initializes metadata slots for everything in the *parent
module* of the instruction or function being printed, on every
call. This is ridiculously slow, don't do it.
Reviewed By: kren1
Differential Revision: D15511376
fbshipit-source-id: 658eeccab
Summary:
Add shortcut code paths to return early in some cases guaranteed to be
the identity function.
Reviewed By: ngorogiannis
Differential Revision: D15468704
fbshipit-source-id: f137049c6
Summary:
It is pointless to track membership of atomic exps in the congruence
relation, as they cannot have any subexps which might later become
equal to something else.
Reviewed By: jvillard
Differential Revision: D15424820
fbshipit-source-id: 048dbc9e1
Summary:
For some Exp forms, Exp.solve is not complete, and this is necessary
since the result of solve is a substitution that needs to encode the
input equation as a conjunction of equations each between a variable
and an exp. This is tantamount to, stronger even than, the theory
being convex. So Exp.solve is not complete for some exps, and some of
those have constructors that perform some simplification. For example,
`(1 ≠ 0)` simplifies to `-1` (i.e. true). To enable deductions such as
`((x ≠ 0) = b) && x = 1 |- b = -1` needs the equality solver to
substitute through subexps of simplifiable exps like = and ≠, as it
does for interpreted exps like + and ×. At the same time, since
Exp.solve for non-interpreted exps cannot be complete, to enable
deductions such as `((x ≠ 0) = (y ≠ 0)) && x = 1 |- y ≠ 0` needs the
equality solver to congruence-close over subexps of simplifiable exps
such as = and ≠, as it does for uninterpreted exps.
To strengthen the equality solver in these sorts of cases, this diff
adds a new class of exps for = and ≠, and revises the equality solver
to handle them in a hybrid fashion between interpreted and
uninterpreted.
I am not currently sure whether or not this breaks the termination
proof, but I have also not managed to adapt usual divergent cases to
break this. One notable point is that simplifying = and ≠ exps always
produces genuinely simpler and smaller exps, in contrast to
e.g. polynomial simplification and gaussian elimination.
Note that the real solution to this problem is likely to be to
eliminate the i1 type in favor or a genuine boolean type, and
translate all integer operations on i1 to boolean/logical ones. Then
the disjunction implicit in e.g. equations between disequations would
appear as actual disjunction, and could be dealt with as such.
Reviewed By: jvillard
Differential Revision: D15424823
fbshipit-source-id: 67d62df1f
Summary:
* Adds compilation of cxxabi.bc
* Includes the cxxabi.bc into the sledge executable via opam-crunch
* Links cxxabi.bc in sledge frontend
Reviewed By: jberdine
Differential Revision: D15415190
fbshipit-source-id: cc42f09fb
Summary:
There are two motivations for this:
1. Distinguish between `Unreachable`, which silently terminates
execution a la `assume false`; and `Abort`, which vocally
terminates execution a la `assert false`.
2. Distinguish between undefined functions, which have `Unreachable`
bodies, and bomb functions such as:
```
define void bomb() {
tail call void llvm.trap()
unreachable
}
```
Reviewed By: ngorogiannis
Differential Revision: D15408246
fbshipit-source-id: b64354cdb
Summary:
Require a final `()` argument to explicitly indicate the end of the
arguments to the printf-like functions. For `warn` this is not
any safer because the return type is `unit` anyhow, but for `fail` the
return type is polymorphic so the final `()` prevents unintentionally
forgetting an argument.
Reviewed By: ngorogiannis
Differential Revision: D15403367
fbshipit-source-id: ce3fe4035
Summary:
llvm.trap is noreturn nounwind so calls to it are always succeeded by
Unreachable, therefore unless an alarm is desired for reaching it,
translating it as nop suffices.
Reviewed By: ngorogiannis
Differential Revision: D15328302
fbshipit-source-id: 54efe6c21
Summary:
A previous rebase fumble sometimes led to the wrong arguments being
passed to retpolines of Invoke instructions.
Reviewed By: ngorogiannis
Differential Revision: D15323950
fbshipit-source-id: f6eb6fbe2
Summary:
It is (now?) the case that `[%sexp_of: type_name]` generates just
`sexp_of_type_name`, rather than expanding `type_name` to its
definition and generating a conversion function for that. Hence, when
such cases appear within `let rec sexp_of_type_name`, they get
captured, sometimes leading to divergence.
This diff fixes this by manually expanding such types into their
definitions.
Reviewed By: ngorogiannis
Differential Revision: D15314736
fbshipit-source-id: 716fff7cc
Summary: There was a missing case for singleton monomials of degree > 1.
Reviewed By: mbouaziz
Differential Revision: D15098810
fbshipit-source-id: e6d17d899
Summary:
Boolean Xor expressions are treated directly by the solver, so the 'no
new subexps' invariant for congruence closure is not needed.
Reviewed By: mbouaziz
Differential Revision: D15098817
fbshipit-source-id: d118c5881
Summary: Shifting by too many bits is undefined, so do not normalize them.
Reviewed By: mbouaziz
Differential Revision: D15098812
fbshipit-source-id: 96f4606d7
Summary:
When constructing division expressions from rational coefficients of
polynomials, the constant numerator and denominator should be clamped
to the number of bits in the result type.
Reviewed By: mbouaziz
Differential Revision: D15098825
fbshipit-source-id: fa448c39a
Summary:
Some constants are used during Exp normalization, to encode
subtraction using addition, units of addition and multiplication,
etc. Previously these were unconditionally integers, but need to be
floats for float arithmetic.
Reviewed By: mbouaziz
Differential Revision: D15098815
fbshipit-source-id: 13d1be142
Summary:
The frontend would implicitly assume there was (at least) one argument
to calls to operator new. If code declares operator new with the wrong
type, this can lead to crashing trying to access a missing arg.
Reviewed By: mbouaziz
Differential Revision: D15098820
fbshipit-source-id: 539281a83
Summary: These relaxations are needed for some of the llvm test suite.
Reviewed By: mbouaziz
Differential Revision: D15098813
fbshipit-source-id: 702d3ffd9
Summary:
Some globals have initializers which refer to the global
itself (e.g. for program counter relative offsets). Memoizing
translation of globals gives enough machinery to detect and handle
this situation.
Reviewed By: mbouaziz
Differential Revision: D15098819
fbshipit-source-id: ecc9dce92
Summary:
In some cases there were extra metadata operands (such as alignment of
an alloc) which would cause the debug locations to not be recorded.
Reviewed By: mbouaziz
Differential Revision: D15098816
fbshipit-source-id: a7e83f590
Summary:
This reverts commit e4f7a0dc8d439561a0be7e8b33ad924195a6c441.
No longer needed due to ocamlformat fix.
Reviewed By: mbouaziz
Differential Revision: D15098826
fbshipit-source-id: 2a4dc0f55
Summary:
When comparing stacks as part of a control-flow edge, treat each as a
code location in a hypothetical expansion of the program where all
non-recursive functions have been completely inlined. In particular,
this means to compare stacks as if all Locals frames or Return frames
for recursive calls had been removed. Additionally, the from_call info
in Return frames is ignored.
Reviewed By: jvillard
Differential Revision: D14657601
fbshipit-source-id: b8a42d3fa
Summary:
- Ensure that popping Throw or Return does not leave stale Throw or
Return frames
- Add a module for Control.stack
- Add invariant to enforce that a Throw frame can appear only
immediately above a Return frame
Reviewed By: jvillard
Differential Revision: D14547263
fbshipit-source-id: deb31b8af
Summary:
The backend does not generally interpret conditional expressions
specially. Therefore it is logically stronger to hoist them over other
boolean exps, for example normalizing
`e = (c ? t : f)` to `(c ? e = t : e =f)`
This enables the treatment of top-level conditional expressions in
terms of disjunction, conjunction and negation.
Reviewed By: mbouaziz
Differential Revision: D14495815
fbshipit-source-id: 4f2e8053f
Summary:
Eq and Dq expressions are interpreted by Equality and Exp, rather than
being considered uninterpreted functions. Classifying them as
interpreted results in stronger Equality normalization.
Reviewed By: mbouaziz
Differential Revision: D14495817
fbshipit-source-id: 44bb376c0
Summary: Also fix the spec for posix_memalign, and minor cleanup.
Reviewed By: ngorogiannis
Differential Revision: D14403648
fbshipit-source-id: 6b1cb3e3a
Summary:
The SL solver is currently not always able to append segments which
have been split symbolically, that is, at an internal point expressed
using a variable, rather than merely a constant.
Also, existential instantiation, that is, the choice of witnesses
during proof search, is currently sensitive to the order of
subformulas. This can lead to fragile incompleteness.
Reviewed By: mbouaziz
Differential Revision: D14481991
fbshipit-source-id: 80fe2f0a8
Summary:
Strengthen the Sh.seg constructor and Sh.is_false test to account for
the axiom that `null -[_;_)-> ⟨_,_⟩` is inconsistent.
Reviewed By: ngorogiannis
Differential Revision: D14481986
fbshipit-source-id: 7016e7451
Summary:
The dnf implementation dates to before nested existentials were
added. Updating it was overlooked, and it is just wrong.
Reviewed By: ngorogiannis
Differential Revision: D14481988
fbshipit-source-id: 9bba570f0
Summary:
P ∨ (∃x. Q ∨ R) could be simplified to (∃x. P ∨ Q ∨ R) and capture
occurrences of x in P.
Reviewed By: ngorogiannis
Differential Revision: D14481990
fbshipit-source-id: 92b474d59
Summary:
Also minor change to make strlen implementation syntactically closer
to spec.
Reviewed By: ngorogiannis
Differential Revision: D14403647
fbshipit-source-id: 48c771329
Summary:
This diff adds support in symbolic execution for calls to intrinsic
functions, to be used in lieu of adding a separate Llair instruction
for each intrinsic. This involves:
- adding skeleton support in Exec for symbolically execution an
intrinsic function call;
- exposing this in Domain;
- allowing symbolic execution of block terminators (e.g. function
call) to possibly fail; and
- generalizing Report for failing terminators.
Reviewed By: ngorogiannis
Differential Revision: D14403652
fbshipit-source-id: d86d9d1b8
Summary:
Some functions are translated directly to an expression, rather than a
function call. The llvm instructions for calling functions that may
and may not raise exceptions differ (Invoke and Call, resp.). This
diff adds support for the Invoke case.
Reviewed By: jvillard
Differential Revision: D14385596
fbshipit-source-id: 24458d960
Summary:
These tell llvm that contents of memory will not change during
execution of some code, which the analyzer does not need.
Reviewed By: jvillard
Differential Revision: D14385597
fbshipit-source-id: 0ef566a6f
Summary:
Make could get confused and use both the $(MODEL_DIR)/cxxabi.bc and
%.bc rules, leading to build failure.
Reviewed By: jvillard
Differential Revision: D14385600
fbshipit-source-id: 05f0ac6e1
Summary:
The backend has an implicit assumption that the entry block's locals
are all the function locals, not just those that happen to appear in
the entry block.
An alternative would be to collect the inverse renaming substitutions
produced by symbolic execution of jumps, which rename variables to
avoid clashes with the destination block's locals. These inverse
substitutions could be applied upon function return, in inverse
order. The complication with this approach is that it would be
necessary to distinguish between a destination local that clashes
because it has the same name as a local of some function higher in the
call stack versus a clash due to being a previous incarnation of the
same local due to a control-flow cycle. Accumulating unrenamings for
the latter case is expensive and pointless, as well as incongrous with
true interprocedural extensions.
So it seems preferable to exploit the asymmetry between the entry
block and others a bit more.
Reviewed By: jvillard
Differential Revision: D14354530
fbshipit-source-id: 815cdb224
Summary:
Interpreted subexps were not handled correctly: they must not be in
the carrier, but their non-interpreted subexps should.
Reviewed By: jvillard
Differential Revision: D14344291
fbshipit-source-id: 995896640
Summary:
In case the starting locations of two heap segments are
related (provably equal up to some offset), add equations between
their enclosing block to the goal. In these cases, the enclosing
blocks must be the same, so no completeness is lost. This has the
effect of instantiating existentials in the enclosing block prior to
others, which can avoid incomplete instantiation guesses.
Reviewed By: mbouaziz
Differential Revision: D14323550
fbshipit-source-id: 89a34a2c8
Summary: An initial set of basic sanity checks for frame inference.
Reviewed By: mbouaziz
Differential Revision: D14323549
fbshipit-source-id: d7cd4235f
Summary:
`('a, exn) result` is a more widely used type than with something
custom in place of `exn`, e.g. by the `Result` operations.
Reviewed By: mbouaziz
Differential Revision: D14322627
fbshipit-source-id: e2ed167ed