infer_clone

Commit Graph

Author	SHA1	Message	Date
Scott Owens	5b7931e71a	[sledge sem] Add a rudimentary theory of SSA Summary: Since the correctness of the mapping from LLVM to llair depends on LLVM being SSA, we need to formalise what that means. We also prove that the domination relation is a strict partial order, which will probably be helpful when reasoning about the translation. Reviewed By: jberdine Differential Revision: D17631456 fbshipit-source-id: a00eb3f87	5 years ago
Scott Owens	71aa4816d6	[sledge sem] Fix the semantics and trans. of If Summary: The LLVM semantics and translation was not consistently treating the 1-bit word value condition as signed or unsigned. Reviewed By: jberdine Differential Revision: D17605766 fbshipit-source-id: 77edf63b7	5 years ago
Scott Owens	ab7233c5b8	[sledge sem] Refactor the way LLVM sem. does phis Summary: Previously the LLVM semantics did the phi instructions at the head of a block as part of executing the branch into that block. This looked a bit weird, but had the advantage that the semantics knew which block was being jumped from, which is necessary to run the phi instructions. However, it meant that the rules for doing phi instructions would need to show up with each branching construct. It was also annoying for the LLVM->llair proof, since the phis are removed and their effect happens as a distinct step from the branch. Here we add a distinct Phi_ip instruction pointer to indicate that the phi instructions at the start of the block should execute next, and then be incremented to the usual numeric instruction pointer that points to the non-phi instructions. The Phi_ip contains the identity of the previous block. Reviewed By: jberdine Differential Revision: D17452416 fbshipit-source-id: 78fef7cca	5 years ago
Scott Owens	17b3c7a49f	[sledge sem] Add top-level llair semantics Summary: Give the llair semantics observable side effects (writes to global variables) and a semantic function mirroring the LLVM semantics. Start sketching out the LLVM/llair translation equivalence proof in a top-down way from the obvious statement of equality of the semantics. Reviewed By: jberdine Differential Revision: D17399654 fbshipit-source-id: 2170678a8	5 years ago
Scott Owens	30c301a3e8	[sledge sem] Add a more llair-like LLVM semantics Summary: The simple LLVM semantics steps one instruction at a time, but the generated llair does whole blocks at a time, since many individual LLVM instructions can become a single llair expression. We add a bigger-step LLVM semantics that does whole blocks at a time (except that it also stops at function calls, since those end blocks in llair). The steps in this bigger-step semantics should be at the same granularity as the llair steps, making it easier to verify the translation. We add a notion of observation to the LLVM semantics (right now, just global variable writes) and use that to define two top-level semantic functions, which we prove to be equivalent. Reviewed By: jberdine Differential Revision: D17396016 fbshipit-source-id: ee632fb92	5 years ago
Benno Stein	7ec2830d92	[sledge] Only merge worklist states that share a calling context Summary: This diff allows domains to specify which abstract states can or can't be merged together by the worklist. In particular, this is needed for relational domains to ensure that Hoare triples are joined only when they share a precondition. Reviewed By: jberdine Differential Revision: D17571148 fbshipit-source-id: d9345fdc9	5 years ago
Benno Stein	e44827b892	[sledge] Add option to apply used-globals as pre-analysis Summary: This diff adds a "-prenalyze-globals" flag to all analyze targets which, when set, computes used-globals sets for all reachable functions and then uses that information to track only relevant global variables at calls in the main analysis. Reviewed By: jberdine, jvillard Differential Revision: D17526746 fbshipit-source-id: 1a114285c	5 years ago
Benno Stein	1ab8359bc0	[sledge] fix bug spuriously marking a register as global variable Summary: Fixes a bug in Llair.Frontend.xlate_value where the l-val register of LLVM instruction calls was being marked as global. Reviewed By: jberdine Differential Revision: D17570458 fbshipit-source-id: e1b5924e2	5 years ago
Benno Stein	637fff5247	[sledge] Check for intrinsic calls in used-globals analysis Summary: Fixes a bug where all calls are treated as intrinsics in used-globals analysis, since exec_intrinsic is invoked at _all_ calls to determine which are intrinsic, not only at call sites known to target intrinsics. Reviewed By: jberdine Differential Revision: D17499406 fbshipit-source-id: 41f7621f2	5 years ago
Benno Stein	6592eb609f	[sledge] Add option to skip recursive calls at depth bound Summary: While the symbolic heap analysis ends its search upon hitting the bound on recursion depth, the used-globals analysis should instead simply skip recursive calls beyond the depth. Note that this is unsound for arbitrary abstract domains, however, and the flag controlling this feature should be used with caution. Note that procedure calls are still not handled correctly, since Used_globals.exec_intrinsic does not properly check whether callees are intrinsic. A forthcoming commit will fix that, as well. Reviewed By: jberdine Differential Revision: D17479753 fbshipit-source-id: aa92e0ef3	5 years ago
Benno Stein	00a5d3dd64	[sledge] Account for callees in used-globals analysis Summary: Include global variables used in function callees in used globals analysis. Also adds support for arbitrary changes to symbolic state while resolving callees in other analyses. Reviewed By: jberdine Differential Revision: D17479352 fbshipit-source-id: e3cd9f179	5 years ago
Josh Berdine	c131e2e669	[sledge] Use dune's Build_info for version reporting Summary: Replace custom version reporting support using a shell script with code using dune's Build_info API. Note that after this diff, the executables under _build/<context> are not version-stamped, but those under _build/_install are. The symlinks in bin point to the latter, stamped, exes. Reviewed By: bennostein Differential Revision: D16985446 fbshipit-source-id: 7afac87be	5 years ago
Benno Stein	47f314c00e	[sledge] Add used-globals abstract domain and transfer functions Summary: Adds an abstract domain to track global variable usages, as well as supporting changes to the frontend, IR and CLI. This analysis will support optimizations to the main symbolic-heap analysis, but for now can be invoked independently through the `-domain` flag on `analyze` targets of the Sledge executable. Reviewed By: jberdine Differential Revision: D17422212 fbshipit-source-id: 74bed0a76	5 years ago
Benno Stein	3dc0c5938f	[sledge] Extract relational logic from Sh_domain, create "domain" module Summary: Generalize the lifting from State_domain (i.e. symbolic heaps) to Sh_domain (i.e. relations over symbolic heaps). Also, extract abstract-domain-related code into its own module/directory. Reviewed By: jberdine Differential Revision: D17319007 fbshipit-source-id: cefbd1393	5 years ago
Benno Stein	2acb1c3dee	[sledge] Functorize worklist, separate out domain-specific logic Summary: Add support for future development of new abstract domains by eliminating hard-wired dependencies from the worklist into the symbolic heap domain. Also includes an implementation of a trivial unit domain and a CLI flag to enable its use, for debugging purposes. Reviewed By: jberdine Differential Revision: D17281681 fbshipit-source-id: 5858fd420	5 years ago
Scott Owens	f298d728c5	[sledge sem] Start sketching translation correctness Summary: This includes a few changes and corrections to the semantics, to support the translation. This initial attempt to reason about LLVM -> llair showed three things that needed repair in the semantics, in addition to various bugs. We address them as follows. Refactor llair semantics to have only a single kind of flat value: integers that fit into specified bit widths. Operations on size values (e.g., offsets, indices and the like) can just take an integer and ignore its number of bits. Pointers can just be considered integers that fit into a certain size given by the constant pointer_size. Later on we can consider making this a parameter to the model. Change the generic memory model interface to use numbers rather than words as the generic encoding of a large value. This makes it more useful for llair where words are not used. Pay more careful attention to signed/unsigned issues. Neither LLVM nor llair has a concept of signed vs unsigned value. Instead individual operations interpret bit patterns in various ways, some of which are ambiguous in the LLVM manual. For example, since getelementptr's indices are explicitly said to be interpreted as signed 2's complement, we should probably do the same for insertvalue and extractvalue. However it is not clear how the argument to alloca is to be interpreted. For now we assume signed. Reviewed By: jberdine Differential Revision: D17164133 fbshipit-source-id: 31a8af635	5 years ago
Josh Berdine	72946c3be3	[sledge] Update dependencies Reviewed By: jvillard Differential Revision: D17132472 fbshipit-source-id: 9f4c9421e	5 years ago
Scott Owens	d864fb2c89	[sledge semantics] Add a rough draft llair semantics Summary: Not everything is here yet, and there is some confusion on what to do about the size values. However, the semantics has the right general shape and will be a nice starting point for thinking about the details. Reviewed By: jberdine Differential Revision: D17111041 fbshipit-source-id: cc75651c6	5 years ago
Scott Owens	32983e129b	[sledge semantics] Update expr transl. for cross-block Summary: The translation from LLVM to llair now builds expressions up across blocks, following the implementation. This is easy to do because of the dominance restrictions in SSA, but might be difficult to reason about. Reviewed By: jberdine Differential Revision: D17111040 fbshipit-source-id: a8e99147d	5 years ago
Scott Owens	9f44bbc264	[sledge semantics] Refactor the memory model Summary: LLVM and llair have similar memory models, and we don't want to duplicate any definitions or theorems. This adds a new memory model theory which should be understandable in its own right. A heap is a mapping from addresses to bytes, alongside a set of valid addresses, and intervals that have been allocated already. Primitives are defined for allocating and de-allocating as well as reading and writing chunks of bytes. There is also a generic type of structured values, and functions for converting them to/from byte arrays. Reviewed By: jberdine Differential Revision: D17074470 fbshipit-source-id: bdab6089f	5 years ago
Josh Berdine	13fb57ec62	[sledge] Revise llvm to llair translation to avoid code duplication Summary: In some cases inlining pure expressions into their use sites causes code blowup. This diff changes the frontend to inline expressions only if there is a single use, and otherwise adds a move instruction. Reviewed By: ngorogiannis Differential Revision: D17071770 fbshipit-source-id: d866a0622	5 years ago
Josh Berdine	ed4aac4f66	[sledge] Update stale comment Summary: This has been out of date since arithmetic was changed from a purely uninterpreted treatment to having a solver. Reviewed By: jvillard Differential Revision: D16985159 fbshipit-source-id: 39e42069c	5 years ago
Josh Berdine	0667edf418	[sledge] Remove unused Llair.ignore_result Summary: No longer needed due to blocks not taking parameters. Reviewed By: jvillard Differential Revision: D16914858 fbshipit-source-id: 24b1106ac	5 years ago
Josh Berdine	3f8d5ace6e	[sledge] Eliminate SSA Summary: While SSA can be useful for code transformation purposes, it offers little for semantic static analyses. Essentially, such analyses explore the dynamic semantics of code, and the static single assignment property does not buy much. For example, once an execution visits a loop body that assigns a variable, there are multiple assignments that the analysis must deal with. This leads to the need to treat blocks as if they assign all their local variables, renaming to avoid name clashes a la Floyd's assignment axiom. That is fine, but it makes it much more involved to implement a version that is economical with respect to renaming only when necessary. Additionally the scoping constraints of SSA are cumbersome and significantly complicate interprocedural analysis (where there is a long history of incorrect proof rules for procedures, and SSA pushes the interprocedural analysis away from being able to use known-good ones). So this diff changes Llair from a functional SSA form to a traditional imperative language. Reviewed By: jvillard Differential Revision: D16905898 fbshipit-source-id: 0fd835220	5 years ago
Josh Berdine	b6eab89504	[sledge] Remove dead from_call.actuals_to_formals field Reviewed By: jvillard Differential Revision: D16905894 fbshipit-source-id: ab4b34ba0	5 years ago
Josh Berdine	8d9b8962c7	[sledge] Add Move instruction Reviewed By: jvillard Differential Revision: D16905896 fbshipit-source-id: 3d8b9a88a	5 years ago
Josh Berdine	2c9fce0bf2	[sledge] Add Vector.unzip Reviewed By: jvillard Differential Revision: D16905895 fbshipit-source-id: 98891d4b0	5 years ago
Josh Berdine	0790a64763	[sledge] Change symbolic execution of instructions to not rely on SSA Summary: Before this diff symbolic execution of instructions assumed that assigned variables were unconstrained in the precondition. This is ensured by symbolic execution of control flow, which renames all local variables of a block when it is entered. This diff changes symbolic execution of instructions to rename modified variables that appear in the precondition when necessary, and accounts for the modified variable occurrence condition on the frame rule. This will enable more economically renaming variables, as most of the time it is not needed. Reviewed By: jvillard Differential Revision: D16905893 fbshipit-source-id: 3a53525d7	5 years ago
Scott Owens	808a61623f	Add types to the variable syntax in llair Summary: Each variable now contains its type, alongside its name. This is more uniform than in LLVM, where the name is usually paired with a type, but not always, for example, the register type of the result of an extractvalue is left implicit. Reviewed By: jberdine Differential Revision: D16984630 fbshipit-source-id: 1c3bc4985	5 years ago
Scott Owens	85243ada62	Update for improved HOL syntax for Datatypes Summary: HOL now lets us omit quotations on Datatypes and make them look more like the other new-style HOL definitions. Reviewed By: jberdine Differential Revision: D16983934 fbshipit-source-id: f8ef3abb5	5 years ago
Scott Owens	84883127af	Add a skeleton of an approach to llvm->llair Summary: This sketches out how translation can be approached. It is partially based on the Sledge code. For basic blocks, it isn't based on the Sledge code, but just on my own thoughts as a starting point. Essentially, we are trying to build up larger expressions, and so not assigning to temporary registers that don't live past the end of the block. This does remove sharing, so a fancier approach could check for multiple uses of end-of-block dead registers, or look at the sizes of expressions. The approach should be flexible enough to accommodate such changes. Fix icmp syntax. Using finite maps is elegant in the semantics, but awkward for writing the translation function. Refactor the mappings from labels to functions and from labels to blocks to use association lists instead. To remove phi nodes, the translation takes every edge in the control flow graph and makes a new basic block that contains a single parallel move instruction that corresponds to the action of the phi node of the target block. Reviewed By: jberdine Differential Revision: D16831051 fbshipit-source-id: 005663e26	5 years ago
Scott Owens	6eab69d0d1	Define a prelim. AST for llair's semantics Summary: The AST is not complete on expressions, but it should have most of the important features. The representation is in some ways very different from the OCaml implementation, because the OCaml code uses mutation to build the CFG as an actual pointer graph in memory, and also because the expression representation is optimised for the backend. For the former, it should be easy to see that the AST here is isomorphic, representing the CFG with finite maps from block labels. The correspondence is less clear in the latter case, but the point here is not to model or verify implementation optimisations, but to give a semantics to llair as a language. Reviewed By: jberdine Differential Revision: D16807132 fbshipit-source-id: b0f64b3ec	5 years ago
Josh Berdine	7efc9285cb	[sledge] Fix type of Exp.rename Reviewed By: ngorogiannis Differential Revision: D16905897 fbshipit-source-id: 2f6740b52	5 years ago
Josh Berdine	0895246e4f	[sledge] Remove label on ~opts args in Control Reviewed By: ngorogiannis Differential Revision: D16905899 fbshipit-source-id: 205df2489	5 years ago
Scott Owens	742ab9089d	Change a type name Summary: Change loc_var (for local variable) to reg (for register) because loc_var looks too much like a location tagged variable. Reviewed By: jberdine Differential Revision: D16827920 fbshipit-source-id: 5b11f1065	5 years ago
Scott Owens	a635aff1bc	Finish proving sanity checking property Summary: There could very well still be bugs in the semantics, since the invariant here doesn't say all that much, and it completely ignores local registers. But most trivial things and typos are probably fixed. Reviewed By: jberdine Differential Revision: D16803281 fbshipit-source-id: 48ba2523b	5 years ago
Scott Owens	89c3da4510	Prove that Ret preserves the invariant Summary: Made progress on the sanity checking lemma (that the step relation preserves some simple invariants on the state). Proved the Ret instruction case of the state invariant lemma. To do this, I fixed a few bugs in the definition, and strengthened the invariants. Reviewed By: jberdine Differential Revision: D16786900 fbshipit-source-id: 6fa8cb170	5 years ago
Scott Owens	df5f20956f	Define a simple initial state that inits the globals Summary: Global variables need allocating and initialising before the machine can start. The definition here shouldn't constrain how and where they are allocated. For example, they don't all need to have separate allocations. We also tag allocated blocks so that the allocation for a global can never be deallocated. Start working on a sanity checking invariant on states. Reviewed By: jberdine Differential Revision: D16735068 fbshipit-source-id: 0d5e60e7a	5 years ago
Scott Owens	97eb280cb5	Add initial mini-LLVM semantics written in HOL4 Summary: Start working on a simple model of LLVM with the ultimate goal of handling relevant and/or tricky aspects of LLVM and LLAIR and then formalising the translation from LLVM to LLAIR. This is a complete initial model of everything that we are interested in except for exceptions, which should be tricky. Also no thought has gone into the treatment of poison and the undefined value, so the treatment is naive, which is at least partially justified because we are interested in the semantics of LLVM IR after the optimisation passes have run. Include some sanity checking theorems. Reviewed By: jberdine Differential Revision: D16731885 fbshipit-source-id: fd53949fe	5 years ago
Timotej Kapus	afb6a4fd11	[sledge] Fix internalization Summary: Currently bitcode produced with `sledge buck link` can have missing symbols that are clearly defined in the source. For example consider a symbol `awesome_function` that is defined in the libraries linked in but not in the produced binary (despite being reachable from main). `llvm-nm` of the bitcode produced by `llvm-link` might look like: ``` U awesome_function t awesome_function.1892 ``` So our `awesome_function` is undefined, and its definition is called `awesome_function.1892` for some reason and is local. I think this is because symbols get internalized too early and then get renamed and somehow lost. Not sure why `llvm-link` behaves this way sometimes. This patch removes internalization from `llvm-link` and puts it into `opt`, where it doesn't cause problems. Reviewed By: jvillard Differential Revision: D16494153 fbshipit-source-id: aad9053a4	5 years ago
Timotej Kapus	c8d1da1e0d	[sledge] Fix __llair_alloc Summary: `__llair_alloc` is meant to be a drop-in non-failing replacement for `malloc`. Currently `__llair_alloc(1)` allocates 8 bytes instead of 1 as `malloc(1)` would. This is because handling of `__llair_alloc` was merged with handling of `new`. This patch reverts changes to handling of `new` in D15778817 and adds a new case for `__llair_alloc`. Reviewed By: jvillard Differential Revision: D16356865 fbshipit-source-id: 3878d87c3	5 years ago
Timotej Kapus	6c9e4e52c6	[sledge][summaries] Fix unsoundness due to missing frame Summary: When using summaries we first garbage collect the precondition and then ask the solver to infer the frame of the precondition with respect to the garbage-collected footprint. Currently if the solver fails to show the frame, we just give it an empty frame. This is bad, because if garbage collection removed some segments, they don't get added back on. This patch throws an exception instead to be very explicit when the solver cannot show the frame in this case. Reviewed By: ngorogiannis Differential Revision: D16339587 fbshipit-source-id: b88d0689c	5 years ago
Josh Berdine	7f423f7fa1	[sledge] Model `folly::usingJEMalloc()` Summary: The actual implementation of folly::usingJEMalloc() tests if malloc is jemalloc using internal knowledge of the jemalloc implementation of malloc. This internal behavior is not reflected in the analyzer's spec, so the detection fails. Additionally, folly::usingJEMalloc is implemented using mallctl to query internal state of jemalloc. Depending on the key string passed to mallctl, it might return a pointer to jemalloc internal state, or a scalar, which means that the spec needs to essentially allocate that state in those cases. Since the jemalloc detection fails, and the analyzer is not always able to reason precisely about string equality, this diff models folly::usingJEMalloc directly (as nondet). Reviewed By: kren1 Differential Revision: D16059776 fbshipit-source-id: 7e7156d7d	5 years ago
Josh Berdine	4bbe05698e	[sledge] Remove `.<int>` suffix when looking up modeled function names Summary: It seems that functions internalized by llvm no longer have valid mangled names, and instead have a `.<int>` suffix. This diff removes these unpredictable suffixes when checking if a called function is a specified/modeled intrinsic. Reviewed By: kren1 Differential Revision: D16059781 fbshipit-source-id: a4b9f6c73	5 years ago
Josh Berdine	0126b64d16	[sledge] Explicate output flag of disassemble command Summary: This one was overlooked before Reviewed By: kren1 Differential Revision: D16269729 fbshipit-source-id: 0aa86ca9a	5 years ago
Josh Berdine	9865bc0f74	[sledge] [solver] Strengthen handling of existential subtrahends Summary: A frame inference query `Minuend ⊢ ∃xs. Subtrahend` returns a `∃zs. Remainder` formula such that `Minuend ⊢ ∃xs. Subtrahend * ∃zs. Remainder` when successful. Currently if the subtrahend is itself existentially quantified, its existentials are treated trivially: they must witness themselves. This diff allows the solver to find witnesses as the `xs`. They are still existentially quantified in the remainder, so clients that need to constrain them should still name them before calling the solver. Reviewed By: kren1 Differential Revision: D16269630 fbshipit-source-id: 65136edd1	5 years ago
Timotej Kapus	b5dea36c5e	[sledge] Add global merge pass Summary: Add a global merge pass that merges globals into a single big global. It replaces uses of the merged globals with offsets into the big global. Function summarisation is a big beneficiary of this, as it greatly reduces the number of implicit formals (i.e. globals). Reviewed By: jvillard Differential Revision: D16260098 fbshipit-source-id: 1b936f02f	5 years ago
Timotej Kapus	5882c49d7d	[sledge] Disable creating of summaries when summaries disabled Summary: Fix a bug where summaries would be created even if summarisation option is disabled. Reviewed By: jvillard Differential Revision: D16259761 fbshipit-source-id: f7319ef03	5 years ago
Timotej Kapus	ba6e6bf369	[sledge] Actually use function summaries Summary: If function summaries are enabled, calling a function first tries to apply a summary; if successful, it jumps directly to the return site of the call. Otherwise it proceeds as before. Reviewed By: jvillard Differential Revision: D16201251 fbshipit-source-id: cec52e0e5	5 years ago
Timotej Kapus	c0c6d65d45	[sledge] Generate and apply summaries Summary: Define a new function summary type and compute it on function return. As an intermediary step also apply the just computed summary to function pre so it can be compared to what was actually computed. Reviewed By: jvillard Differential Revision: D16149833 fbshipit-source-id: b826c17e8	5 years ago

288 Commits (1468dcc1d90d84293c82f8f7f5bbc8fe6518006c)