You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

3.2 KiB

This directory contains a model of the semantics of LLVM, of LLAIR, and of the LLVM to LLAIR translation, along with some theorems about them. It is written in the HOL4 theorem prover.

Building

To build the formalisation, which just gets HOL4 to check the proofs, run the Holmake command in this directory. This requires having HOL4 installed, which itself depends on PolyML.

The directions for installing HOL4 are here https://hol-theorem-prover.org/#get, but essentially it is

git clone https://github.com/HOL-Theorem-Prover/HOL.git
cd HOL
poly < tools/smart-configure.sml
bin/build

PolyML is available here https://www.polyml.org/download.html, but it is widely available from various package managers and easiest to install that way, e.g., on MacOS brew install polyml.

This has been tested on PolyML 5.8, and on HOL4 Github commit e2d0fafdb3ccdd098d2bd6e0aaaacbc9468ae4b1.

What is modelled

The model of the translation is not complete: function calls and exceptions are two notable omissions. However, it covers the other important aspects of the translation:

  • the SSA restrictions on LLVM,

  • building compound expressions to reduce the number of temporary stores in LLAIR,

  • and how the SSA restrictions imply that a simple linear traversal of the (topologically sorted) CFG suffices for the translation,

  • the treatment of LLVM's bitwise 2s complement arithmetic on LLAIR's mathematical integers.

The semantics of LLVM and LLAIR are given as small step abstract state machines, and the translation between them as a pure recursive function.

The semantics of LLVM uses a concrete memory model, i.e., indices into an array of bytes. This means that it does not account for all possible forms of undefined behaviour that could occur with LLVM's more complicated pointer provenance model.

The proof that if an LLVM program doesn't get stuck and has a certain observable behaviour, then the translation to LLAIR also doesn't get stuck and has the same behaviour is complete, except for a few minor cheats. The proof in the other direction, from LLAIR behaviour to LLVM behaviour, is not attempted.

Contents

llairScript.sml : the syntax and semantics of LLAIR

llair_propScript.sml : theorems about various properties of LLAIR's semantics

llvmScript.sml : the syntax and semantics of LLVM

llvm_propScript.sml : theorems about various properties of LLVM's semantics

llvm_ssaScript.sml : the definition of SSA form for LLVM, along with theorems about SSA, dominance, and variable liveness

llvm_to_llairScript.sml : the translation function's definition

llvm_to_llair_propScript.sml : theorems about the LLVM to LLAIR translation that don't mention the semantics, so just those that establish invariants about the translation functions themselves.

llvm_to_llair_sem_propScript.sml : the overall correctness theorems for the translation

memory_modelScript.sml : an abstract model of memory as an array of bytes. It includes reading and writing multi-byte words in little-endian encoding

miscScript.sml : various generic theorems that could in principle be moved into HOL4's libraries

settingsScript.sml : set up HOL4, replacing many of it's ugly UPPERCASE constants with lower case ones