Scott Owens
d30b0959a1
|
5 years ago | |
---|---|---|
.. | ||
.gitignore | 5 years ago | |
Holmakefile | 5 years ago | |
README.md | 5 years ago | |
llairScript.sml | 5 years ago | |
llair_propScript.sml | 5 years ago | |
llvmScript.sml | 5 years ago | |
llvm_propScript.sml | 5 years ago | |
llvm_ssaScript.sml | 5 years ago | |
llvm_to_llairScript.sml | 5 years ago | |
llvm_to_llair_propScript.sml | 5 years ago | |
llvm_to_llair_sem_propScript.sml | 5 years ago | |
memory_modelScript.sml | 5 years ago | |
miscScript.sml | 5 years ago | |
settingsScript.sml | 5 years ago |
README.md
This directory contains a model of the semantics of LLVM, of LLAIR, and of the LLVM to LLAIR translation, along with some theorems about them. It is written in the HOL4 theorem prover.
Building
To build the formalisation, which just gets HOL4 to check the proofs, run the
Holmake
command in this directory. This requires having HOL4 installed, which
itself depends on PolyML.
The directions for installing HOL4 are here https://hol-theorem-prover.org/#get, but essentially it is
git clone https://github.com/HOL-Theorem-Prover/HOL.git
cd HOL
poly < tools/smart-configure.sml
bin/build
PolyML is available here https://www.polyml.org/download.html, but it is
widely available from various package managers and easiest to install that way,
e.g., on MacOS brew install polyml
.
This has been tested on PolyML 5.8, and on HOL4 Github commit e2d0fafdb3ccdd098d2bd6e0aaaacbc9468ae4b1.
What is modelled
The model of the translation is not complete: function calls and exceptions are two notable omissions. However, it covers the other important aspects of the translation:
-
the SSA restrictions on LLVM,
-
building compound expressions to reduce the number of temporary stores in LLAIR,
-
and how the SSA restrictions imply that a simple linear traversal of the (topologically sorted) CFG suffices for the translation,
-
the treatment of LLVM's bitwise 2s complement arithmetic on LLAIR's mathematical integers.
The semantics of LLVM and LLAIR are given as small step abstract state machines, and the translation between them as a pure recursive function.
The semantics of LLVM uses a concrete memory model, i.e., indices into an array of bytes. This means that it does not account for all possible forms of undefined behaviour that could occur with LLVM's more complicated pointer provenance model.
The proof that if an LLVM program doesn't get stuck and has a certain observable behaviour, then the translation to LLAIR also doesn't get stuck and has the same behaviour is complete, except for a few minor cheats. The proof in the other direction, from LLAIR behaviour to LLVM behaviour, is not attempted.
Contents
llairScript.sml
: the syntax and semantics of LLAIR
llair_propScript.sml
: theorems about various properties of LLAIR's semantics
llvmScript.sml
: the syntax and semantics of LLVM
llvm_propScript.sml
: theorems about various properties of LLVM's semantics
llvm_ssaScript.sml
: the definition of SSA form for LLVM, along with theorems
about SSA, dominance, and variable liveness
llvm_to_llairScript.sml
: the translation function's definition
llvm_to_llair_propScript.sml
: theorems about the LLVM to LLAIR translation
that don't mention the semantics, so just those that establish invariants about
the translation functions themselves.
llvm_to_llair_sem_propScript.sml
: the overall correctness theorems for the
translation
memory_modelScript.sml
: an abstract model of memory as an array of bytes. It
includes reading and writing multi-byte words in little-endian encoding
miscScript.sml
: various generic theorems that could in principle be moved
into HOL4's libraries
settingsScript.sml
: set up HOL4, replacing many of it's ugly UPPERCASE
constants with lower case ones