Summary:
This includes a few changes and corrections to the semantics, to support
the translation. This initial attempt to reason about LLVM -> llair
showed three things that needed repair in the semantics, in addition to
various bugs. We address them as follows.
Refactor llair semantics to have only a single kind of flat value:
integers that fit into specified bit widths. Operations on size values
(e.g., offsets, indices and the like) can just take an integer and
ignore its number of bits. Pointers can just be considered integers that
fit into a certain size given by the constant pointer_size. Later on we
can consider making this a parameter to the model.
Change the generic memory model interface to use numbers rather than
words as the generic encoding of a large value. This makes it more
useful for llair where words are not used.
Pay more careful attention to signed/unsigned issues. Neither LLVM nor
llair have a concept of signed vs unsigned value. Instead individual
operations interpret bit patterns in various ways, some of which are
ambiguous in the LLVM manual. For example, since getelementpointer's
indices are explicitly said to be interpreted as signed 2's complement,
we should probably do the same for insertvalue and extractvalue. However
it is not clear how the argument to alloca is to be interpreted. For now
we assume signed.
Reviewed By: jberdine
Differential Revision: D17164133
fbshipit-source-id: 31a8af635
Summary:
Not everything is here yet, and there is some confusion on what to do
about the size values. However, the semantics has the right general
shape and will be a nice starting point for thinking about the details.
Reviewed By: jberdine
Differential Revision: D17111041
fbshipit-source-id: cc75651c6
Summary:
LLVM and llair have similar memory models, and we don't want to
duplicate any definitions or theorems. This adds a new memory model
theory which should be understandable in its own right. A heap is a
mapping from addresses to bytes, alongside a set of valid addresses, and
intervals that have been allocated already. Primitives are defined for
allocating and de-allocating as well as reading and writing chuncks of
bytes.
There is also a generic type of structured values, and functions for
converting them to/from byte arrays.
Reviewed By: jberdine
Differential Revision: D17074470
fbshipit-source-id: bdab6089f
Summary:
Each variable now contains its type, alongside its name. This is more
uniform than in LLVM, where the name is usually paired with a type, but
not always, for example, the register type of the result of an
extractvalue is left implicit.
Reviewed By: jberdine
Differential Revision: D16984630
fbshipit-source-id: 1c3bc4985
Summary:
HOL now lets us omit quotations on Datatypes and make them look more
like the other new-style HOL definitions.
Reviewed By: jberdine
Differential Revision: D16983934
fbshipit-source-id: f8ef3abb5
Summary:
This sketches out how translation can be approached. It is partially
based on the Sledge code.
For basic blocks, isn't based on the Sledge code, but just my own
thoughts as a starting point. Essentially, we are trying to build up
larger expressions, and so not assigning to temporary registers that
don't live past the end of the block. This does remove sharing, so a
fancier approach could check for multiple uses of end-of-block dead
registers, or look at the sizes of expressions. The approach should be
flexible enough to accommodate such changes.
Fix icmp syntax
Using finite maps is elegant in the semantics, but awkward for writing
the translation function. Refactor the mappings from labels to functions
and from labels to blocks to use association lists instead.
To remove phi nodes, the translation takes every edge in the control
flow graph and makes a new basic block that contains a single parallel
move instruction that corresponds to the action of the phi node of the
target block.
Reviewed By: jberdine
Differential Revision: D16831051
fbshipit-source-id: 005663e26
Summary:
The AST is not complete on expressions, but it should have most of the
important features.
The representation is in some ways very different from the OCaml
implementation, because the OCaml code uses mutation to build the CFG as
an actual pointer graph in memory, and also because the expression
representation is optimised for the backend. For the former, it should
be easy to see that the AST here is isomorphic, representing the CFG
with finite maps from block labels. The correspondence is less clear in
the latter case, but the point here is not to model or verify
implementation optimisations, but to give a semantics to llair as a
language.
Reviewed By: jberdine
Differential Revision: D16807132
fbshipit-source-id: b0f64b3ec
Summary:
Change loc_var (for local variable) to reg (for register) because
loc_var looks too much like a location tagged variable.
Reviewed By: jberdine
Differential Revision: D16827920
fbshipit-source-id: 5b11f1065
Summary:
There could very well still be bugs in the semantics, since the
invariant here doesn't say all that much, and it completely ignores
local registers. But most trivial things and typos are probably fixed.
Reviewed By: jberdine
Differential Revision: D16803281
fbshipit-source-id: 48ba2523b
Summary:
Made progress on the sanity checking lemma (that the step relation
preserves some simple invariants on the state). Proved the Ret
instruction case of the state invariant lemma. To do this, I fixed a few
bugs in the definition, and strengthened the invariants.
Reviewed By: jberdine
Differential Revision: D16786900
fbshipit-source-id: 6fa8cb170
Summary:
Global variables need allocating and initialising before the machine can
start. The definition here shouldn't constrain how and where they are
allocated. For example, they don't all need to have separate
allocations. We also tag allocated blocks so that the allocation for a
global can never be deallocated.
Start working on a sanity checking invariant on states.
Reviewed By: jberdine
Differential Revision: D16735068
fbshipit-source-id: 0d5e60e7a
Summary:
Start working on a simple model of LLVM with the ultimate goal of
handling relevant and/or tricky aspects of LLVM and LLAIR and then
formalising the translation from LLVM to LLAIR.
This is a complete initial model of everything that we are interested in
except for exceptions, which should be tricky. Also no thought has gone
into the treatment of poison and the undefined value, so the treatment
is naive, which is at least partially justified because we are
interested in the semantics of LLVM IR after the optimisation passes
have run.
Include some sanity checking theorems.
Reviewed By: jberdine
Differential Revision: D16731885
fbshipit-source-id: fd53949fe