You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

324 lines
13 KiB

---
id: separation-logic-and-bi-abduction
title: Separation logic and bi-abduction
---
export const Math = ({ code }) => ( <img
src={`https://math.now.sh?from=${encodeURIComponent(code)}&color=mediumslateblue`}
style={{ height: '100%', verticalAlign: "middle" }} /> );
- [Separation logic](separation-logic-and-bi-abduction#separation-logic)
- [Bi-abduction](separation-logic-and-bi-abduction#bi-abduction)
- [Technical papers](separation-logic-and-bi-abduction#technical-papers)
## Separation logic
Separation logic is a novel kind of mathematical logic which facilitates
reasoning about mutations to computer memory. It enables scalability by breaking
reasoning into chunks corresponding to local operations on memory, and then
composing the reasoning chunks together.
Separation logic is based on a logical connective <Math code="\\( * \\)" />
called the _separating conjunction_ and pronounced "and separately". Separation
logic formulae are interpreted over program allocated heaps. The logical formula
<Math code="\\( A*B \\)" /> holds of a piece of program heap (a heaplet) when it
can be divided into two sub-heaplets described by <Math code="\\(A\\)" /> and
<Math code="\\(B\\)" />. For example, the formula
---
<Math code="\\(x \mapsto y * y \mapsto x \\)" />
---
can be read "<Math code="\\(x\\)" /> points to <Math code="\\(y\\)" /> and
separately <Math code="\\(y\\)" /> points to <Math code="\\(x\\)" />". This
formula describes precisely two allocated memory cells. The first cell is
allocated at the address denoted by the pointer <Math code="\\(x\\)" /> and the
content of this cell is the value of <Math code="\\(y\\)" />. The second cell is
allocated at the address denoted by the pointer <Math code="\\(y\\)" /> and the
content of this second cell is the value of <Math code="\\(x\\)" />. Crucially,
we know that there are precisely two cells because <Math code="\\( * \\)" />
stipulates that they are separated and therefore the cells are allocated in two
different parts of memory. In other words, <Math code="\\( * \\)" /> says that
<Math code="\\(x\\)" /> and <Math code="\\(y\\)" /> do not hold the same value
(i.e., these pointers are not aliased). The heaplet partitioning defined by the
formula above can visualized like so:
![](/img/SepSplit.jpg)
The important thing about separating conjunction is the way that it fits
together with mutation to computer memory; reasoning about program commands
tends to work by updating <Math code="\\(*\\)" />-conjuncts in-place, mimicking
the operational in-place update of RAM.
Separation logic uses Hoare triples of the form
<Math code="\\( \lbrace pre \rbrace prog \lbrace post \rbrace \\)" /> where
<Math code="\\(pre\\)" /> is the precondition, <Math code="\\(prog\\)" /> a
program part, and <Math code="\\(post\\)" /> the postcondition. Triples are
abstract specifications of the behavior of the program. For example, we could
take
---
<Math code="\\( \lbrace r \mapsto open\rbrace \, closeResource(r)\, \lbrace r \mapsto closed\rbrace \;\;\; (spec)\\)" />
---
as a specification for a method which closes a resource given to it as a
parameter.
Now, suppose we have two resources <Math code="\\( r\_1 \\)" /> and
<Math code="\\( r\_2 \\)" />, described by
<Math code="\\(r\_1 \mapsto open * r\_2 \mapsto open\\)" /> and we close the
first of them. We think operationally in terms of updating the memory in place,
leaving <Math code="\\(r\_2 \mapsto open\\)" /> alone, as described by this triple:
---
<Math code="\\( \lbrace r\_1 \mapsto open * r\_2 \mapsto open\rbrace closeResource(r\_1) \lbrace r\_1 \mapsto closed * r\_2 \mapsto open \rbrace \;\;\; (use)\\)" />
---
What we have here is the that specification (spec) described how
<Math code="\\(closeResource()\\)" /> works by mentioning only one piece of
state, what is sometimes called a small specification, and in (use) we use that
specification to update a larger precondition in place.
This is an instance of a general pattern. There is a rule that lets you go from
smaller to bigger specifications
---
<Math code="\\( \frac{\lbrace pre \rbrace prog \lbrace post \rbrace}{\lbrace pre * frame \rbrace prog \lbrace post * frame \rbrace}\\)" />
---
Our passage from (spec) to (use) is obtained by taking
- <Math code="\\(pre\\)" /> to be <Math code="\\(r\_1 \mapsto open\\)" />
- <Math code="\\(post\\)" /> to be <Math code="\\(r\_1 \mapsto closed \\)" />,
and
- <Math code="\\(frame\\)" /> to be <Math code="\\(r\_2 \mapsto open \\)" />
This rule is called the _frame rule_ of separation logic. It is named after the
frame problem, a classic problem in artificial intelligence. Generally, the
<Math code="\\(frame\\)" /> describes state that remains unchanged; the
terminology comes from the analogy of a background scene in an animation as
unchanging while the objects and characters within the scene change.
The frame rule is the key to the principle of local reasoning in separation
logic: reasoning and specifications should concentrate on the resources that a
program accesses (the footprint), without mentioning what doesn't change.
## Bi-abduction
Bi-abduction is a form of logical inference for separation logic which automates
the key ideas about local reasoning.
Usually, logic works with validity or entailment statements like
---
<Math code="\\(A \vdash B\\)" />
---
which says that <Math code="\\(A\\)" /> implies <Math code="\\(B\\)" />. Infer
uses an extension of this inference question in an internal theorem prover while
it runs over program statements. Infer's question
---
<Math code="\\(A * ?antiframe \vdash B * ?frame\\)" />
---
is called _bi-abduction_. The problem here is for the theorem prover to <i>
discover </i> a pair of frame and antiframe formulae that make the entailment
statement valid.
Global analyses of large programs are normally computational untractable.
However, bi-abduction allows to break the large analysis of a large program in
small independent analyses of its procedures. This gives Infer the ability to
scale independently of the size of the analyzed code. Moreover, by breaking the
analysis in small independent parts, when the full program is analyzed again
because of a code change the analysis results of the unchanged part of the code
can be reused and only the code change needs to be re-analyzed. This process is
called incremental analysis and it is very powerful when integrating a static
analysis tool like infer in a development environment.
In order to be able to decompose a global analysis in small independent
analyses, let's first consider how a function call is analyzed in separation
logic. Assume we have the following spec for a function
<Math code="\\( f() \\)" />:
---
<Math code="\\( \lbrace pre\_f \rbrace \;\; f() \;\; \lbrace post\_f \rbrace \\)" />
---
and by analyzing the caller function, we compute that before the call of
<Math code="\\( f \\)" />, the formula <Math code="\\( CallingState \\)" />
hold. Then to utilize the specification of <Math code="\\( f \\)" /> the
following implication must holds:
---
<Math code="\\( CallingState \vdash pre\_f \;\;\;\;\;\;\;\;\;\;\;\; (Function Call)\\)" />
---
Given that, bi-abduction is used at procedure call sites for two reasons: to
discover missing state that is needed for the above implication to hold and
allow the analysis to proceed (the antiframe) as well as state that the
procedure leaves unchanged (the frame).
To see how this works suppose we have some bare code
---
<Math code="\\(closeResource(r1); \, closeResource(r2)\\)" />
---
but no overall specification; we are going to describe how to discover a
pre/post spec for it. Considering the first statement and the (spec) above, the
human might say: if only we had <Math code="\\(r1 \mapsto open\\)" /> in the
precondition then we could proceed. Technically, we ask a bi-abduction question
---
<Math code="\\(emp * ?antiframe \vdash r1 \mapsto open * ?frame\\)" />
---
and we can fill this in easily by picking
<Math code="\\(antiframe = r1 \mapsto open\\)" /> and
<Math code="\\(frame = emp\\)" />, where emp means the empty state. The emp is
recording that at the start we presume nothing. So we obtain the trivially true
implication:
---
<Math code="\\(emp * r1 \mapsto open \vdash r1 \mapsto open * emp\\)" />
---
which, by applying logical rules, can be re-written equivalently to:
---
<Math code="\\(r1 \mapsto open \vdash r1 \mapsto open\\)" />
---
Notice that this satisfy the (Function Call) requirement to correctly make the
call. So let's add that information in the pre, and while we are at it record
the information in the post of the first statement that comes from (spec).
---
<Math code="\\( \lbrace r1 \mapsto open \rbrace \\)" />
<Math code="\\( closeResource(r1) \\)" />
<Math code="\\( \lbrace r1 \mapsto closed \rbrace \\)" />
<Math code="\\( closeResource(r2) \\)" />
---
Now, let's move to the second statement. Its precondition
<Math code="\\(r1 \mapsto closed\\)" /> in the partial symbolic execution trace
just given does not have the information needed by
<Math code="\\(closeResource(r2)\\)" />, so we can fill that in and continue by
putting <Math code="\\(r2 \mapsto open\\)" /> in the pre. While we are at it we
can thread this assertion back to the beginning.
---
<Math code="\\( \lbrace r1 \mapsto open * r2 \mapsto open \rbrace \\)" />
<Math code="\\( closeResource(r1) \\)" />
<Math code="\\( \lbrace r1 \mapsto closed * r2 \mapsto open\rbrace \\)" />
<Math code="\\( closeResource(r2) \\)" />
---
This information on what to thread backwards can be obtained as the antiframe
part of the bi-abduction question
---
<Math code="\\(r1 \mapsto closed * ?antiframe \vdash r2 \mapsto open * ?frame\\)" />
---
where the solution picks
<Math code="\\(antiframe = r2 \mapsto open\\) and \\(frame = r1 \mapsto closed\\)" />.
Note that the antiframe is precisely the information missing from the
precondition in order for <Math code="\\(closeResource(r2)\\)" /> to proceed. On
the other hand, the frame <Math code="\\(r1 \mapsto closed\\)" /> is the portion
of state not changed by <Math code="\\(closeResource(r2)\\)" />; we can thread
that through to the overall postconditon (as justified by the frame rule),
giving us
---
<Math code="\\( \lbrace r1 \mapsto open * r2 \mapsto open \rbrace \\)" />
<Math code="\\( closeResource(r1) \\)" />
<Math code="\\( \lbrace r1 \mapsto closed * r2 \mapsto open\rbrace \\)" />
<Math code="\\( closeResource(r2) \\)" />
<Math code="\\( \lbrace r1 \mapsto closed * r2 \mapsto closed \rbrace\\)" />
---
Thus, we have obtained a pre and post for this code by symbolically executing
it, using bi-abduction to discover preconditions (abduction of antiframes) as
well as untouched portions of memory (frames) as we go along.
In general, bi-abduction provides a way to infer a pre/post specs from bare
code, as long as we know specs for the primitives at the base level of the code.
The human does not need to write preconditions and postconditions for all the
procedures, which is the key to having a high level of automation. This is the
basis for how Infer works, why it can scale, and how it can analyze code changes
incrementally.
Context: The logical terminology we have been using here comes from AI and
philosophy of science. Abductive inference was introduced by the philosopher
Charles Peirce, and described as the mechanism underpinning hypothesis formation
(or, guessing what might be true about the world), the most creative part of the
scientific process. Abduction and the frame problem have both attracted
significant attention in AI. Infer uses an automated form of abduction to
generate preconditions describing the memory that a program touches (the
antiframe part above), and frame inference to discover what isn't touched. Infer
then uses deductive reasoning to calculate a formula describing the effect of a
program, starting from the preconditions. In a sense, Infer approaches automated
reasoning about programs by mimicking what a human might do when trying to
understand a program: it abduces what the program needs, and deduces conclusions
of that. It is when the reasoning goes wrong that Infer reports a potential bug.
This description is by necessity simplified compared to what Infer actually
does. More technical information can be found in the following papers. The
descriptions in the papers are precise, but still simplified; there are many
engineering decisions not recorded there. Finally, beyond the papers, you can
read the source code if you wish!
## Technical papers
The following papers contain some of the technical background on Infer and
information on how it is used inside Facebook.
- <a href="http://link.springer.com/chapter/10.1007%2F3-540-44802-0_1">Local
Reasoning about Programs that Alter Data Structures.</a> An early separation
logic paper which advanced ideas about local reasoning and the frame rule.
- <a href="http://link.springer.com/chapter/10.1007/11804192_6">Smallfoot:
Modular Automatic Assertion Checking with Separation Logic.</a> First
separation logic verification tool, introduced frame inference
- <a href="http://link.springer.com/chapter/10.1007%2F11691372_19">A Local Shape
Analysis Based on Separation Logic.</a> Separation logic meets abstract
interpretation; calculating loop invariants via a fixed-point computation.
- <a href="http://dl.acm.org/citation.cfm?id=2049700">Compositional Shape
Analysis by Means of Bi-Abduction.</a> The bi-abduction paper.
- <a href="https://research.facebook.com/publications/moving-fast-with-software-verification/">Moving
Fast with Software Verification.</a> A paper about the way we use Infer at
Facebook.