Summary:
The LLVM semantics and the translation were not consistent about whether
the 1-bit word value of a condition is treated as signed or unsigned.
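This small illustration is not taken from the semantics. It shows why the choice matters. The single bit pattern `1` reads as 1 when treated as unsigned, but as -1 when treated as signed two's complement. A test against zero gives the same answer under both readings, but a test such as "condition equals 1" does not. The helper names `to_unsigned` and `to_signed` are hypothetical.

```python
def to_unsigned(bits: int, width: int) -> int:
    """Read the low `width` bits of `bits` as an unsigned integer."""
    return bits & ((1 << width) - 1)


def to_signed(bits: int, width: int) -> int:
    """Read the low `width` bits of `bits` as a two's complement integer."""
    u = to_unsigned(bits, width)
    # If the top bit is set, the pattern denotes a negative number.
    return u - (1 << width) if u >> (width - 1) else u


true_bit = 0b1
print(to_unsigned(true_bit, 1))  # 1
print(to_signed(true_bit, 1))    # -1
```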
Reviewed By: jberdine
Differential Revision: D17605766
fbshipit-source-id: 77edf63b7
Summary:
Give the llair semantics observable side effects (writes to global
variables) and a semantic function mirroring the LLVM semantics. Start
sketching out the proof of equivalence for the LLVM/llair translation,
working top-down from the obvious statement that the two semantics are equal.
Reviewed By: jberdine
Differential Revision: D17399654
fbshipit-source-id: 2170678a8
Summary:
This includes a few changes and corrections to the semantics, to support
the translation. This initial attempt to reason about the LLVM -> llair
translation showed three things in the semantics that needed repair, in
addition to various bugs. We address them as follows.
Refactor the llair semantics to have only a single kind of flat value:
integers that fit into specified bit widths. Operations on size values
(e.g., offsets, indices and the like) can just take an integer and
ignore its number of bits. Pointers can just be considered integers that
fit into a certain size given by the constant pointer_size. Later on we
can consider making this a parameter to the model.
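Here is a minimal sketch of this single kind of flat value. It assumes that a value pairs an integer with the bit width it must fit in. The sketch uses the signed two's complement range for "fits", which is an assumption, since the text does not say which range is used. The names `FlatVal`, `fits`, `pointer`, `add_offset` and the choice `POINTER_SIZE = 64` are hypothetical stand-ins, not the HOL definitions.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the constant pointer_size.
POINTER_SIZE = 64


def fits(i: int, width: int) -> bool:
    """Assumed check: i fits as a signed two's complement integer of `width` bits."""
    return width > 0 and -(1 << (width - 1)) <= i < (1 << (width - 1))


@dataclass(frozen=True)
class FlatVal:
    value: int  # the integer itself
    width: int  # number of bits it is required to fit in

    def __post_init__(self) -> None:
        if not fits(self.value, self.width):
            raise ValueError(f"{self.value} does not fit in {self.width} bits")


def pointer(addr: int) -> FlatVal:
    # A pointer is just an integer that fits in POINTER_SIZE bits.
    return FlatVal(addr, POINTER_SIZE)


def add_offset(base: FlatVal, offset: FlatVal) -> int:
    # Size-like operations use only the integer and ignore the width.
    return base.value + offset.value
```

For example, `add_offset(pointer(4096), FlatVal(-8, 32))` gives 4088, and `FlatVal(200, 8)` is rejected because 200 does not fit in 8 signed bits.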
Change the generic memory model interface to use numbers rather than
words as the generic encoding of a large value. This makes it more
useful for llair where words are not used.
Pay more careful attention to signed/unsigned issues. Neither LLVM nor
llair has a concept of signed vs. unsigned values. Instead, individual
operations interpret bit patterns in various ways, some of which are
ambiguous in the LLVM manual. For example, since getelementptr's
indices are explicitly said to be interpreted as signed 2's complement,
we should probably do the same for insertvalue and extractvalue. However,
it is not clear how the argument to alloca is to be interpreted. For now
we assume signed.
Reviewed By: jberdine
Differential Revision: D17164133
fbshipit-source-id: 31a8af635
Summary:
Not everything is here yet, and it is still unclear what to do about
size values. However, the semantics has the right general shape and
will be a nice starting point for thinking about the details.
Reviewed By: jberdine
Differential Revision: D17111041
fbshipit-source-id: cc75651c6
Summary:
LLVM and llair have similar memory models, and we don't want to
duplicate any definitions or theorems. This adds a new memory model
theory that should be understandable in its own right. A heap is a
mapping from addresses to bytes, together with a set of valid addresses
and the intervals that have already been allocated. Primitives are
defined for allocating and deallocating, as well as for reading and
writing chunks of bytes.
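Here is a minimal executable sketch of such a heap. It makes several assumptions. Memory is byte-addressed. Allocation intervals are half-open `[start, end)`. A simple bump allocator chooses addresses. Uninitialised bytes read as 0. Errors raise exceptions rather than being modelled as undefined behaviour. All names (`Heap`, `allocate`, `deallocate`, `read_bytes`, `write_bytes`) are hypothetical, and the HOL theory's definitions may differ on each of these points.

```python
class Heap:
    """Addresses map to bytes; `valid` holds the addresses that may be
    accessed; `intervals` records the allocated [start, end) ranges."""

    def __init__(self) -> None:
        self.bytes: dict[int, int] = {}
        self.valid: set[int] = set()
        self.intervals: set[tuple[int, int]] = set()
        self.next_free = 1  # bump allocator; address 0 is never handed out

    def allocate(self, size: int) -> int:
        start, end = self.next_free, self.next_free + size
        self.intervals.add((start, end))
        self.valid.update(range(start, end))
        self.next_free = end
        return start

    def deallocate(self, start: int) -> None:
        # Only the start address of a live allocation may be freed.
        matches = [iv for iv in self.intervals if iv[0] == start]
        if not matches:
            raise ValueError(f"no allocation starts at {start}")
        s, e = matches[0]
        self.intervals.remove((s, e))
        for a in range(s, e):
            self.valid.discard(a)
            self.bytes.pop(a, None)

    def _check(self, addr: int, n: int) -> None:
        if not all(a in self.valid for a in range(addr, addr + n)):
            raise ValueError("access to invalid address")

    def read_bytes(self, addr: int, n: int) -> list[int]:
        self._check(addr, n)
        # Assumption: bytes that were never written read as 0.
        return [self.bytes.get(a, 0) for a in range(addr, addr + n)]

    def write_bytes(self, addr: int, data: list[int]) -> None:
        self._check(addr, len(data))
        for i, b in enumerate(data):
            self.bytes[addr + i] = b & 0xFF
```

Keeping `valid` separate from `intervals` lets a read or write be checked address by address. Freeing, in contrast, is checked against whole allocations.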
There is also a generic type of structured values, and functions for
converting them to/from byte arrays.
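Here is a sketch of such a conversion. It assumes a little-endian layout and describes a flat integer type by its byte count. Aggregates are lists of component types laid out consecutively with no padding, and integers are encoded by their bit pattern and decoded as unsigned. The type encoding and the function names are hypothetical and not the theory's actual definitions.

```python
from typing import Union

# A type is a byte count (a flat integer of that many bytes) or a list of
# component types (an aggregate laid out consecutively, without padding).
Ty = Union[int, list]
Val = Union[int, list]


def type_size(ty: Ty) -> int:
    return ty if isinstance(ty, int) else sum(type_size(t) for t in ty)


def value_to_bytes(ty: Ty, v: Val) -> list[int]:
    if isinstance(ty, int):
        # Encode the bit pattern, least significant byte first.
        pattern = v % (1 << (8 * ty))
        return list(pattern.to_bytes(ty, "little"))
    out: list[int] = []
    for t, x in zip(ty, v):
        out += value_to_bytes(t, x)
    return out


def bytes_to_value(ty: Ty, bs: list[int]) -> Val:
    if isinstance(ty, int):
        return int.from_bytes(bytes(bs[:ty]), "little")
    vals, pos = [], 0
    for t in ty:
        n = type_size(t)
        vals.append(bytes_to_value(t, bs[pos:pos + n]))
        pos += n
    return vals
```

For example, the value `[258, [7, 1]]` of type `[2, [1, 4]]` becomes the bytes `[2, 1, 7, 1, 0, 0, 0]` and converts back to the same value. Negative integers round-trip only up to the unsigned reading of their bit pattern.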
Reviewed By: jberdine
Differential Revision: D17074470
fbshipit-source-id: bdab6089f
Summary:
Each variable now contains its type alongside its name. This is more
uniform than in LLVM, where the name is usually, but not always, paired
with a type: for example, the type of the register holding the result of
an extractvalue is left implicit.
Reviewed By: jberdine
Differential Revision: D16984630
fbshipit-source-id: 1c3bc4985
Summary:
HOL now lets us omit the quotations around Datatype definitions, making
them look more like the other new-style HOL definitions.
Reviewed By: jberdine
Differential Revision: D16983934
fbshipit-source-id: f8ef3abb5
Summary:
This sketches out how the translation can be approached. It is partially
based on the Sledge code.
The translation of basic blocks isn't based on the Sledge code, but only
on my own thoughts as a starting point. Essentially, we build up larger
expressions rather than assigning to temporary registers that don't live
past the end of the block. This does remove sharing, so a fancier
approach could check for multiple uses of registers that are dead at the
end of the block, or look at the sizes of expressions. The approach
should be flexible enough to accommodate such changes.
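Here is a minimal sketch of building up larger expressions. It assumes a block is a list of `(register, expression)` assignments in SSA form. Expressions are pure, so there are no loads to reorder, and they are nested tuples. The set of registers live after the block is given. Terminators are ignored. Assignments to registers that die at the end of the block are dropped, and their defining expressions are substituted into later uses. The names `substitute` and `translate_block` and the representation are hypothetical.

```python
# An expression is a register name (str), an integer constant,
# or a tuple (operator, operand1, operand2, ...).
def substitute(expr, env):
    if isinstance(expr, str):
        return env.get(expr, expr)
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(e, env) for e in expr[1:])
    return expr


def translate_block(assigns, live_out):
    """Keep only assignments to live-out registers, inlining the rest."""
    env = {}  # dead-at-end register -> its (already inlined) expression
    kept = []
    for reg, expr in assigns:
        e = substitute(expr, env)
        if reg in live_out:
            kept.append((reg, e))
        else:
            env[reg] = e
    return kept


block = [("t1", ("add", "x", 1)),
         ("t2", ("mul", "t1", "t1")),
         ("y", ("sub", "t2", "x"))]
print(translate_block(block, {"y"}))
# [('y', ('sub', ('mul', ('add', 'x', 1), ('add', 'x', 1)), 'x'))]
```

The output shows the loss of sharing mentioned above: `t1` is used twice, so its expression is duplicated.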
Fix icmp syntax
Using finite maps is elegant in the semantics, but awkward for writing
the translation function. Refactor the mappings from labels to functions
and from labels to blocks to use association lists instead.
To remove phi nodes, the translation takes every edge in the control
flow graph and creates a new basic block containing a single parallel
move instruction that performs the action of the target block's phi node
for that edge.
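Here is a sketch of this edge-splitting scheme. It assumes a CFG given as a dict from labels to `(phis, successors)`, where each phi is `(register, {predecessor_label: incoming_value})`. Each new block holds one parallel move, a list of simultaneous assignments, followed by a jump to the target. The function name `split_edges` and the generated labels are hypothetical. Redirecting the source block's terminator to the new label is left to the caller through the returned `redirect` map.

```python
def split_edges(cfg):
    """cfg: label -> (phis, successors); phis: list of (reg, {pred: value}).
    Returns (new_blocks, redirect): new_blocks maps a fresh label to
    (parallel_move, target); redirect maps (src, tgt) to that fresh label."""
    new_blocks = {}
    redirect = {}
    for src, (_, succs) in cfg.items():
        for tgt in succs:
            label = f"{src}_to_{tgt}"
            phis, _ = cfg[tgt]
            # Every phi of tgt takes its incoming value for the edge src -> tgt.
            move = [(reg, incoming[src]) for reg, incoming in phis]
            new_blocks[label] = (move, tgt)
            redirect[(src, tgt)] = label
    return new_blocks, redirect


cfg = {
    "entry": ([], ["loop"]),
    "loop": ([("i", {"entry": 0, "loop": "i_next"})], ["loop", "exit"]),
    "exit": ([], []),
}
blocks, redirect = split_edges(cfg)
print(blocks["entry_to_loop"])  # ([('i', 0)], 'loop')
print(blocks["loop_to_loop"])   # ([('i', 'i_next')], 'loop')
```

The move must be parallel because all phis of a block read their inputs simultaneously. For example, phis `a = b` and `b = a` on the same edge swap the two values, which two sequential assignments would not do.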
Reviewed By: jberdine
Differential Revision: D16831051
fbshipit-source-id: 005663e26
Summary:
The AST is not yet complete for expressions, but it should have most of
the important features.
The representation is in some ways very different from the OCaml
implementation, because the OCaml code uses mutation to build the CFG as
an actual pointer graph in memory, and also because the expression
representation is optimised for the backend. For the former, it should
be easy to see that the AST here is isomorphic to that pointer graph,
representing the CFG with finite maps from block labels to blocks. The
correspondence is less clear in
the latter case, but the point here is not to model or verify
implementation optimisations, but to give a semantics to llair as a
language.
Reviewed By: jberdine
Differential Revision: D16807132
fbshipit-source-id: b0f64b3ec