Summary:
Previously the LLVM semantics did the phi instructions at the head of a
block as part of executing the branch into that block. This looked a bit
weird, but had the advantage that the semantics knew which block was
being jumped from, which is necessary to run the phi instructions.
However, it meant that the rules for doing phi instructions would need
to show up with each branching construct. It was also annoying for the
LLVM->llair proof, since the phis are removed and their effect happens as
a distinct step from the branch.
Here we add a distinct Phi_ip instruction pointer to indicate that the
phi instructions at the start of the block should execute next, and then
be incremented to the usual numeric instruction pointer that points to
the non-phi instructions. The Phi_ip contains the identity of the
previous block.
Reviewed By: jberdine
Differential Revision: D17452416
fbshipit-source-id: 78fef7cca
Summary:
Give the llair semantics observable side effects (writes to global
variables) and a semantic function mirroring the LLVM semantics. Start
sketching out the LLVM/llair translation equivalence proof in a top-down
way from the obvious statement of equality of the semantics.
Reviewed By: jberdine
Differential Revision: D17399654
fbshipit-source-id: 2170678a8
Summary:
The simple LLVM semantics steps one instruction at a time, but the
generated llair does whole blocks at a time, since many individual LLVM
instructions can become a single llair expression. We add a bigger-step
LLVM semantics that does whole blocks at a time (except that it also
stops at function calls, since those end blocks in llair). The steps in
this bigger-step semantics should be at the same granularity as the
llair steps, making it easier to verify the translation.
We add a notion of observation to the LLVM semantics (right now, just
global variable writes) and use that to define two top-level semantic
functions, which we prove to be equivalent.
Reviewed By: jberdine
Differential Revision: D17396016
fbshipit-source-id: ee632fb92
Summary:
This includes a few changes and corrections to the semantics, to support
the translation. This initial attempt to reason about LLVM -> llair
showed three things that needed repair in the semantics, in addition to
various bugs. We address them as follows.
Refactor llair semantics to have only a single kind of flat value:
integers that fit into specified bit widths. Operations on size values
(e.g., offsets, indices and the like) can just take an integer and
ignore its number of bits. Pointers can just be considered integers that
fit into a certain size given by the constant pointer_size. Later on we
can consider making this a parameter to the model.
Change the generic memory model interface to use numbers rather than
words as the generic encoding of a large value. This makes it more
useful for llair where words are not used.
Pay more careful attention to signed/unsigned issues. Neither LLVM nor
llair have a concept of signed vs unsigned value. Instead individual
operations interpret bit patterns in various ways, some of which are
ambiguous in the LLVM manual. For example, since getelementpointer's
indices are explicitly said to be interpreted as signed 2's complement,
we should probably do the same for insertvalue and extractvalue. However
it is not clear how the argument to alloca is to be interpreted. For now
we assume signed.
Reviewed By: jberdine
Differential Revision: D17164133
fbshipit-source-id: 31a8af635
Summary:
LLVM and llair have similar memory models, and we don't want to
duplicate any definitions or theorems. This adds a new memory model
theory which should be understandable in its own right. A heap is a
mapping from addresses to bytes, alongside a set of valid addresses, and
intervals that have been allocated already. Primitives are defined for
allocating and de-allocating as well as reading and writing chuncks of
bytes.
There is also a generic type of structured values, and functions for
converting them to/from byte arrays.
Reviewed By: jberdine
Differential Revision: D17074470
fbshipit-source-id: bdab6089f
Summary:
Each variable now contains its type, alongside its name. This is more
uniform than in LLVM, where the name is usually paired with a type, but
not always, for example, the register type of the result of an
extractvalue is left implicit.
Reviewed By: jberdine
Differential Revision: D16984630
fbshipit-source-id: 1c3bc4985
Summary:
This sketches out how translation can be approached. It is partially
based on the Sledge code.
For basic blocks, isn't based on the Sledge code, but just my own
thoughts as a starting point. Essentially, we are trying to build up
larger expressions, and so not assigning to temporary registers that
don't live past the end of the block. This does remove sharing, so a
fancier approach could check for multiple uses of end-of-block dead
registers, or look at the sizes of expressions. The approach should be
flexible enough to accommodate such changes.
Fix icmp syntax
Using finite maps is elegant in the semantics, but awkward for writing
the translation function. Refactor the mappings from labels to functions
and from labels to blocks to use association lists instead.
To remove phi nodes, the translation takes every edge in the control
flow graph and makes a new basic block that contains a single parallel
move instruction that corresponds to the action of the phi node of the
target block.
Reviewed By: jberdine
Differential Revision: D16831051
fbshipit-source-id: 005663e26
Summary:
There could very well still be bugs in the semantics, since the
invariant here doesn't say all that much, and it completely ignores
local registers. But most trivial things and typos are probably fixed.
Reviewed By: jberdine
Differential Revision: D16803281
fbshipit-source-id: 48ba2523b
Summary:
Made progress on the sanity checking lemma (that the step relation
preserves some simple invariants on the state). Proved the Ret
instruction case of the state invariant lemma. To do this, I fixed a few
bugs in the definition, and strengthened the invariants.
Reviewed By: jberdine
Differential Revision: D16786900
fbshipit-source-id: 6fa8cb170
Summary:
Global variables need allocating and initialising before the machine can
start. The definition here shouldn't constrain how and where they are
allocated. For example, they don't all need to have separate
allocations. We also tag allocated blocks so that the allocation for a
global can never be deallocated.
Start working on a sanity checking invariant on states.
Reviewed By: jberdine
Differential Revision: D16735068
fbshipit-source-id: 0d5e60e7a
Summary:
Start working on a simple model of LLVM with the ultimate goal of
handling relevant and/or tricky aspects of LLVM and LLAIR and then
formalising the translation from LLVM to LLAIR.
This is a complete initial model of everything that we are interested in
except for exceptions, which should be tricky. Also no thought has gone
into the treatment of poison and the undefined value, so the treatment
is naive, which is at least partially justified because we are
interested in the semantics of LLVM IR after the optimisation passes
have run.
Include some sanity checking theorems.
Reviewed By: jberdine
Differential Revision: D16731885
fbshipit-source-id: fd53949fe