This directory contains a model of the semantics of LLVM, of LLAIR, and of the LLVM to LLAIR translation, along with some theorems about them. It is written in the HOL4 theorem prover. # Building To build the formalisation, which just gets HOL4 to check the proofs, run the `Holmake` command in this directory. This requires having HOL4 installed, which itself depends on PolyML. The directions for installing HOL4 are here , but essentially it is git clone https://github.com/HOL-Theorem-Prover/HOL.git cd HOL poly < tools/smart-configure.sml bin/build PolyML is available here , but it is widely available from various package managers and easiest to install that way, e.g., on MacOS `brew install polyml`. This has been tested on PolyML 5.8, and on HOL4 Github commit e2d0fafdb3ccdd098d2bd6e0aaaacbc9468ae4b1. # What is modelled The model of the translation is not complete: function calls and exceptions are two notable omissions. However, it covers the other important aspects of the translation: - the SSA restrictions on LLVM, - building compound expressions to reduce the number of temporary stores in LLAIR, - and how the SSA restrictions imply that a simple linear traversal of the (topologically sorted) CFG suffices for the translation, - the treatment of LLVM's bitwise 2s complement arithmetic on LLAIR's mathematical integers. The semantics of LLVM and LLAIR are given as small step abstract state machines, and the translation between them as a pure recursive function. The semantics of LLVM uses a concrete memory model, i.e., indices into an array of bytes. This means that it does not account for all possible forms of undefined behaviour that could occur with LLVM's more complicated pointer provenance model. The proof that if an LLVM program doesn't get stuck and has a certain observable behaviour, then the translation to LLAIR also doesn't get stuck and has the same behaviour is complete, except for a few minor cheats. The proof in the other direction, from LLAIR behaviour to LLVM behaviour, is not attempted. # Contents `llairScript.sml` : the syntax and semantics of LLAIR `llair_propScript.sml` : theorems about various properties of LLAIR's semantics `llvmScript.sml` : the syntax and semantics of LLVM `llvm_propScript.sml` : theorems about various properties of LLVM's semantics `llvm_ssaScript.sml` : the definition of SSA form for LLVM, along with theorems about SSA, dominance, and variable liveness `llvm_to_llairScript.sml` : the translation function's definition `llvm_to_llair_propScript.sml` : theorems about the LLVM to LLAIR translation that don't mention the semantics, so just those that establish invariants about the translation functions themselves. `llvm_to_llair_sem_propScript.sml` : the overall correctness theorems for the translation `memory_modelScript.sml` : an abstract model of memory as an array of bytes. It includes reading and writing multi-byte words in little-endian encoding `miscScript.sml` : various generic theorems that could in principle be moved into HOL4's libraries `settingsScript.sml` : set up HOL4, replacing many of it's ugly UPPERCASE constants with lower case ones