infer_clone

Commit Graph

Author	SHA1	Message	Date
Ezgi Çiçek	5a2b285fff	[pulse] Distinguish exit state at top level Summary: This diff lifts the `PulseAbductiveDomain.t` in `PulseExecutionState` by tracking whether the program continues the analysis normally or exits unusually (e.g. by calling `exit` or `throw`): ``` type exec_state = | ContinueProgram of PulseAbductiveDomain.t (** represents the state at the program point *) | ExitProgram of PulseAbductiveDomain.t (** represents the state originating at exit/divergence. *) ``` Now, Pulse's actual domain is tracked by `PulseExecutionState` and as soon as we try to analyze an instruction at `ExitProgram`, we simply return its state. The aim is to recover the state at the time of the exit, rather than simply ignoring them (i.e. returning empty disjuncts). This allows us to get rid of some FNs that we were not able to detect before. Moreover, it also allows the impurity analysis to be more precise since we will know how the state changed up to exit. TODO: - Impurity analysis needs to be improved to consider functions that simply exit as impure. - The next goal is to handle error state similarly so that when pulse finds an error, we recover the state at the error location (and potentially continue to analyze?). Disclaimer: currently, we handle throw statements like exit (as was the case before). However, this is not correct. Ideally, control flow from throw nodes follows catch nodes rather than exiting the program entirely. Reviewed By: jvillard Differential Revision: D20791747 fbshipit-source-id: df9e5445a	5 years ago
Ezgi Çiçek	d97e1c8fdb	[pulse][impurity] Add model for System.exit() Summary: - Model `System.exit()` as early_exit and add a test - Tweak message of methods that are impure due to having no pulse summary (and add a test) Reviewed By: skcho Differential Revision: D20668979 fbshipit-source-id: 6b5589aae	5 years ago
Ezgi Çiçek	b64ed0bbf2	[impurity] Consider functions with empty pulse summary as impure Summary: As exemplified by added tests, pulse computes an empty summary (with 0 disjuncts) whenever it discovers a contradiction which might be caused by: - discovering aliasing in memory - widening a limited number of times in loops and concluding that loop exit conditions are never taken However, AFAIU, it is not possible to have a function with 0 disjuncts apart from such anomalies. Even a function which does nothing like `void foo(){}` has 1 disjunct: ``` Pulse: 1 pre/post(s) #0: PRE: { roots={ }; mem ={ }; attrs={ };} POST: { roots={ }; mem ={ }; attrs={ };} SKIPPED_CALLS: { } ``` The aim of this diff is to consider functions with 0 disjuncts as impure because most often such cases are impure, rather than actually pure. Reviewed By: skcho Differential Revision: D20619504 fbshipit-source-id: 3a8502c90	5 years ago
Ezgi Çiçek	cc815f5d20	[pulse] Only propagate existing WrittenTo attributes at function calls Summary: Previously, at each function call, we added a `WrittenTo` attribute for applying the address of the actuals. However, this results in mistakenly considering each function application that inspects its argument as impure. Instead, we should only propagate `WrittenTo` if the actuals already have `WrittenTo` attributes. For instance, for the following functions ``` public static boolean is_null(Byte a) { return a == null; } public static boolean call_is_null(Byte a) { return is_null(a); } ``` We used to get the following pulse summary for `call_is_null` (showing only one of the disjuncts): ``` #0: PRE: { roots={ &a=v1 }; mem ={ v1 -> { * -> v2 } }; attrs={ v1 -> { MustBeValid }, v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) } };} POST: { roots={ &a=v1, &return=v8 }; mem ={ v1 -> { * -> v2 }, v8 -> { * -> v4 } }; attrs={ v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]), WrittenTo-----------WRONG }, v4 -> { Arith =1, BoItv (1), Invalid ConstantDereference(is the constant 1), WrittenTo-----------WRONG }, v8 -> { WrittenTo } };} SKIPPED_CALLS: { } ``` where we mistakenly recorded a `WrittenTo` for `v2` (what `a` points to). As a result, we considered `call_is_null` as impure :( This diff fixes that since the callee `is_null` doesn't have any `WrittenTo` attributes for its parameter `a`. So, we don't propagate `WrittenTo` and get the following summary ``` #0: PRE: { roots={ &a=v1 }; mem ={ v1 -> { * -> v2 } }; attrs={ v1 -> { MustBeValid }, v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) } };} POST: { roots={ &a=v1, &return=v8 }; mem ={ v1 -> { * -> v2 }, v8 -> { * -> v4 } }; attrs={ v2 -> { Arith =null, BoItv ([max(0, v2), min(0, v2)]) }, v4 -> { Arith =1, BoItv (1), Invalid ConstantDereference(is the constant 1) }, v8 -> { WrittenTo } };} SKIPPED_CALLS: { } ``` Reviewed By: skcho Differential Revision: D20490102 fbshipit-source-id: 253d8ef64	5 years ago
Ezgi Çiçek	a4c3925d9a	[impurity] Track unique accesses Summary: Impurity domain was tracking all changes to variables (with a list of traces containing all write/invalid accesses). This results in having long traces with multiple access events for the same variable. For instance, ``` void swap_impure(int[] array, int i, int j) { int tmp = array[i]; array[i] = array[j]; // included in the trace array[j] = tmp; // included in the trace } ``` here we recorded both array accesses. This diff changes the domain to include accesses so that we only keep track of a single trace per access. Array accesses are only recorded once. Note that we want to record all unique accesses, not just the first one, because impurity will be used for hoisting/cost where we will invalidate impure arguments and consider all the rest as not changing. Reviewed By: jvillard Differential Revision: D20385745 fbshipit-source-id: d3647dad3	5 years ago
Ezgi Çiçek	e3c89b1f10	[impurity] Fix include_value_history Summary: D20362149 missed - to pass the optional argument `include_value_history` to the recursive call in `PulseTrace.add_to_errlog`. - to set `include_value_history=false` for skipped calls. This diff fixes these issues. Reviewed By: skcho Differential Revision: D20385604 fbshipit-source-id: 176e4d010	5 years ago
Ezgi Çiçek	b90d7c42d3	[impurity] Do not add value history in impurity traces Summary: Impurity traces are quite big due to recording value histories. Let's simplify the traces by removing pulse's value histories. Reviewed By: skcho Differential Revision: D20362149 fbshipit-source-id: 8a2a6115e	5 years ago
Ezgi Çiçek	a0fd5a0e6a	[pulse] Refactor attributes into domain Summary: Let's move attributes into Pulse's domain. Reviewed By: jvillard Differential Revision: D19533915 fbshipit-source-id: 995fd12da	5 years ago
Ezgi Çiçek	dd59a141f0	[impurity] Rely on set of skipped functions to determine impurity Summary: Currently, impurity analysis is oblivious to skipped functions which might e.g. return a non-deterministic value, write to memory or have some other side-effect. This diff fixes that by relying on Pulse's skipped functions to determine impurity. Any unknown function which is not modeled to be pure is assumed to be impure. This is a heuristic. We could have assumed them to be pure by default as well. Reviewed By: jvillard Differential Revision: D19428514 fbshipit-source-id: 82efe04f9	5 years ago
Jules Villard	df49f318f6	[pulse] havoc formals passed by reference to unknown procedures Summary: This gets rid of false positives when something invalid (eg null) is passed by reference to an initialisation function. Havoc'ing the contents that the pointer points to results in being optimistic about said contents in the future. Also surprisingly gets rid of some FNs (which means it can also introduce FPs) in the `std::atomic` tests because a path condition becomes feasible with havoc'ing. There's a slight refinement possible where we don't havoc pointers to const but that's more involved and left as future work. Reviewed By: skcho Differential Revision: D18726203 fbshipit-source-id: 264b5daeb	5 years ago
Ezgi Çiçek	6781ba36d3	[impurity] Start checking equivalence at materialized addresses in pre Summary: Previously, we considered a function which modifies its parameters to be impure even though it might not be modifying the underlying value. This resulted in FPs like the following program in Java: ``` void fresh_pure(int[] a) { a = new int[1]; } ``` Similarly, in C++, we considered the following program as impure because it was writing to `s`: ``` Simple* reassign_pure(Simple* s) { s = new Simple{2}; return s; } ``` This diff fixes that issue by starting the check for address equivalence in pre-post not directly from the addresses of the stack variables, but from the addresses pointed to by these stack variables. That means, we only consider things to be impure if the actual values pointed to by the parameters change. Reviewed By: skcho Differential Revision: D18113846 fbshipit-source-id: 3d7c712f3	5 years ago
Jules Villard	6a738045fd	[pulse] interprocedural histories and traces Summary: bigmacro_bender There are 3 ways pulse tracks history. This is at least one too many. So far, we have: 1. "histories": a humble list of "events" like "assigned here", "returned from call", ... 2. "interproc actions": a structured nesting of calls with a final "action", eg "f calls g calls h which does blah" 3. "traces", which combine one history with one interproc action This diff gets rid of interproc actions and makes histories include "nested" callee histories too. This allows pulse to track and display how a value got assigned across function calls. Traces are now more powerful and interleave histories and interproc actions. This allows pulse to track how a value is fed into an action, for instance performed in callee, which itself creates some more (potentially now interprocedural) history before going to the next step of the action (either another call or the action itself). This gives much better traces, and some examples are added to showcase this. There are a lot of changes when applying summaries to keep track of histories more accurately than was done before, but also a few simplifications that give additional evidence that this is the right concept. Reviewed By: skcho Differential Revision: D17908942 fbshipit-source-id: 3b62eaf78	5 years ago
Jules Villard	8182514f35	[impurity] clarify string parameter of `ImpurityDomain.add_to_errlog` Summary: Instead of a string argument named `~str` pass `Formal | Global` and let `add_to_errlog` figure out how to print it. Reviewed By: ezgicicek Differential Revision: D17907657 fbshipit-source-id: ed09aab72	5 years ago
Ezgi Çiçek	557e2bfa3f	[impurity] Consider functions with no pulse summary as impure Summary: If we have no pulse summary (most likely caused by pulse finding a legit issue with the code), let's consider the function as impure. Reviewed By: jvillard Differential Revision: D17906016 fbshipit-source-id: 671d3e0ba	5 years ago
Jules Villard	362e9cc622	[pulse] do not print `()` after functions Summary: Unfortunately it is very hard to predict when `Typ.Procname.describe` will add `()` after the function name, so we cannot make sure it is always there. Right now we report clowny stuff like "error while calling `foo()()`", which this change fixes. Reviewed By: ezgicicek Differential Revision: D17665470 fbshipit-source-id: ef290d9c0	5 years ago
Ezgi Çiçek	c5ca4db8d0	[pulse][impurity] Use pulse for detecting impurity Summary: Introduce a new experimental checker (`--impurity`) that detects impurity information, tracking which parameters and global variables of a function are modified. The checker relies on Pulse to detect how the state changes: it traverses the pre and post pairs starting from the parameter/global variable and finds where the pre and post heaps diverge. At divergence points, we expect to see WrittenTo/Invalid attributes containing a trace of how the address was modified. We use these to construct the trace of impurity. This checker is a complement to the purity checker that exists mainly for Java (and is used for cost and loop-hoisting analyses). The aim of this new experimental checker is to rely on Pulse's precise memory treatment and come up with a more precise (im)purity analysis. To distinguish the two checkers, we introduce a new issue type `IMPURE_FUNCTION` that reports when a function is impure, rather than when it is pure (as in the purity checker). TODO: - improve the analysis to rely on impurity information of external library calls. Currently, all library calls are assumed to be nops, hence pure. - disentangle Pulse reporting from analysis. Reviewed By: skcho Differential Revision: D17051567 fbshipit-source-id: 5e10afb4f	5 years ago

16 Commits (b29d1a2f5faa6dc69cd46a4340e28c70630ee21b)