* lazy tracing
- define a [Trace.t], move global [fs] into it, and thread through code
- add a parent-pointing tree/dag of printing thunks to [Trace.t]
- use "event" and "history" terminology
- change from immediately printing to creating a closure that prints when called, and add it to the dag
- add [fork] and [join] operations on [Trace.t]
- use [Trace.fork] in [Control.exec_term], and [Trace.join] in sync with [Domain.join] (in [Control.Work.run] or wherever)
- add a form of "terminal" trace events, which print all the ancestor events
- change [Report] (and elsewhere?) to use Trace.terminal
- support ex post facto trace exploration
  + add a global list of terminals
  + add to terminals list instead of eagerly printing ancestors of terminals
  + dump/Marshal trace state at exit
  + add subcommand for querying dumped traces
    - list terminals
    - print ancestors of given terminal
  + support changing enabled status ex post facto
    - record module and function names with printing thunks
    - when printing, recheck [enabled]
- support incrementally writing trace data to file
- support incrementally printing history as requested, in reverse
- ? support more advanced queries
* parallelize frontend
- make a scan_types pass over all types to populate anon_struct_name, and change struct_name to only find, not add
  see http://llvm.org/doxygen/ValueEnumerator_8cpp_source.html#l00321
- [Trace.fork] a trace for each function
- replace calls to fold_left_globals and fold_left_functions with calls to parmap
- memo_type and memo_value could be put in shared memory instead
  + better sharing (as much as with sequential translation)
  + all their contents will live forever anyway
  + would need to handle concurrent accesses
  + maybe better to put entire Llair.t into shared memory
  + ? shared memory = reancient + locks
* parallelize backend
- change exec_* functions to return the new jobs instead of transforming the worklist (each job is an edge, depth(s?), and state)
  + also, change tracing so that they return new events rather than transform the whole event dag
- adapt infer's ProcessPool
  + When a worker finishes its task, it writes to the "up" pipe a message indicating that it is done, which includes the worker's id and a list of discovered jobs. Then it reads another task from its "down" pipe, which might block. Maybe it should do a slice of gc before reading.
  + The orc sits in a select waiting for the "up" pipe to be non-empty. Once it receives a message that a worker has finished, it reads responses from the "up" pipe, adding the jobs sent by the workers to the queue and adding the now-idle workers to the back of the queue. When the "up" pipe is empty, it iterates through the idle workers, popping the next task from the queue and writing it to the worker's "down" pipe. Then the orc loops back to waiting on the "up" pipe. If the queue empties while there are still idle workers, keep the queue and add to it on the next finish message. Maybe the orc should check the "up" pipe between writes to worker "down" pipes.
  + Actually, repeatedly pop all the jobs for the same block from the queue, and send the list of states to the worker to join and execute from.
  + Currently in infer the operation of selecting the task to send to the child is trivial, but IIUC it does not have to be, and the list of tasks does not need to be computed beforehand. So, leaving the basic communication structure the same, it does not seem like a big change to extend the messages from worker to orc to also include a list of tasks to add to the queue, and to have the orc receive them, add them to a priority queue, pop the highest priority task from the queue and send it to the worker. Plus some check to see if there was an idle worker that could be given one of the tasks just returned to the orc.
- initial inefficient version
  + communicate blocks
    - by forking workers after frontend finishes, thereby giving each worker a copy of the program
    - then passing block parent name/index and block index
  + but could instead, with some manual serialization code, pass blocks to/from workers over pipes
    - receiver must perform a lookup to find their local copy
  + communicate states using Marshal
    - likely to be slow
    - will proactively lose sharing of the representation
  + communicate trace events by forcing printing thunks to strings
- optimize by storing program in shared memory (reancient?)
  + don't need to finish translation before starting analysis
  + pass block address in reancient heap instead of indices
  + receiver no longer needs to perform a lookup
  + saves memory, and time to copy it, and time to futilely GC it in all workers
- optimize by communicating states without Marshal
  + could store them in a reancient heap and then communicate their index
    - probably fast, but leaky
  + could use a reancient heap for each worker, where it would store its jobs, until there is not enough space, at which point it would delete the heap and allocate a new one, passing the heap to the orc over the pipe
    - this would need to make a deep copy of every entry, or else deleting the heap is unsafe since there could be sharing between entries
  + could perhaps have immortal heap of states appearing in function specs, try to keep sharing between communicated states and immortal ones, and take advantage of how Marshal won't follow pointers out of the GC heap to make communicated states small
  + really ought to have a global hash-cons structure which workers add states to in order to communicate them
  + check what flow/hack/zonc do
    see fbcode/hphp/hack/src/heap/hh_shared.c
  + store trace events in shared memory
    - to avoid forcing them eagerly
    - need a way to Marshal them from shared memory to write to file
      + perhaps serially at exit: copy to GC heap and Marshal as normal
      + perhaps incrementally copy oldest events from shared memory and Marshal to file
* relax global topological ordering
:PROPERTIES:
:ID: 6D6A0AF5-F68F-4726-95E5-178145A4CB9B
:END:
- needed for lazy translation and bottom-up analysis
- compute call graph (perhaps from ThinLTO info)
- topsort call graph (callee smaller number than caller)
  + possible alternative might be to translate functions leaving their sort_index unset
  + then set it when first encountered during analysis
  + this relies on the assumption that the analysis will perform an appropriately ordered search
  + this assumption needs to be checked
  + this is probably only applicable for top-down analysis
- add sort_index field to func like block
- change to topsort blocks intraprocedurally
- change priority queue to use lexicographically sorted pair of func and block indices, that is, (block.parent.sort_index, block.sort_index)
- if intraprocedural top orders are insufficient
  + change use of block sort_index for priority in queue
  + instead of choosing a total order (represented by ints), represent the partial order itself
  + build a graph with blocks as vertices and edges for non-retreating jumps
  + then a < b iff there is a path from a to b
  + perhaps keep the graph transitively-closed, and then a < b iff b is a successor of a
  + extending such a graph can only add new ordering relationships, never change existing ones; the partial order is stable under extension, so translating code while analyzing will not break the queue
  + is Fheap compatible with a partial order, rather than a total order?
  + when adding just-translated code, need to add edges for all existing (non-retreating?) Call sites of added functions: will need to index them
* lazy translation
- need to [[id:6D6A0AF5-F68F-4726-95E5-178145A4CB9B][generalize to partial weak topological order]] to enable adding code during analysis without breaking the priority queue
- translate function when analyzing a Call to a declared but untranslated function
- if in ThinLTO mode, will need to worry about finding/loading bitcode: will need an index from function names to bitcode modules where they are defined (ThinLTO should have this info)
* summarization
- ? standard over-approximation, or something more in tune with refutation
- ? procedures
- ? code segments between function entry and call sites
- common points:
  + summary includes
    - precondition
    - postcondition
    - depth for which summary is "sound" assuming every worklist item has higher depth
  + a summary for a given pre and depth may be incomplete (if there is an item in the worklist)
  + a summary for a pre and depth may be extended with another for the same pre and depth, by disjoining the posts
* differential analysis
* start-anywhere/bottom-up analysis
* non-dnf solver
* arithmetic constraints
* overflow analysis
- Currently expressions are shared between the IR for code and the analyzer's formulas. In order to strengthen the analyzer to detect overflow, it will likely be necessary to distinguish these, letting the analyzer normalize with polynomials, while keeping the same order of operations as the source for the IR. Plus another level of transformers for the IR expressions where their abstract values could be checked for being in range, etc.
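A minimal sketch of why the two expression forms likely need to be distinguished. Everything here is hypothetical and much smaller than the real IR: the [exp] and [poly] types, the linear-only normalizer, and the signed 8-bit range check are assumptions made for illustration. Polynomial normalization cancels [b] in [(a + b) - b], while evaluating in source order still sees the intermediate [a + b], which may be out of range.

```ocaml
(* Hypothetical miniature IR; not the existing expression type. *)
type exp =
  | Var of string
  | Int of int
  | Add of exp * exp
  | Sub of exp * exp

(* Analyzer view: a linear polynomial, i.e. a constant plus a map from
   variables to non-zero coefficients. *)
module M = Map.Make (String)

type poly = { const : int; coeffs : int M.t }

let add_poly p q =
  let merge _ a b = if a + b = 0 then None else Some (a + b) in
  { const = p.const + q.const; coeffs = M.union merge p.coeffs q.coeffs }

let neg_poly p = { const = -p.const; coeffs = M.map (fun c -> -c) p.coeffs }

let rec normalize = function
  | Var x -> { const = 0; coeffs = M.singleton x 1 }
  | Int n -> { const = n; coeffs = M.empty }
  | Add (a, b) -> add_poly (normalize a) (normalize b)
  | Sub (a, b) -> add_poly (normalize a) (neg_poly (normalize b))

let same p q = p.const = q.const && M.equal ( = ) p.coeffs q.coeffs

(* IR view: evaluate in source order and check every intermediate result
   against an assumed signed 8-bit range. *)
exception Overflow

let in_range n = -128 <= n && n <= 127

let rec eval env = function
  | Var x -> List.assoc x env
  | Int n -> n
  | Add (a, b) -> check (eval env a + eval env b)
  | Sub (a, b) -> check (eval env a - eval env b)

and check n = if in_range n then n else raise Overflow

(* (a + b) - b normalizes to the same polynomial as a, but with
   a = b = 100 the intermediate sum 200 is out of range. *)
let e = Sub (Add (Var "a", Var "b"), Var "b")
let env = [ ("a", 100); ("b", 100) ]
```

With these definitions, [same (normalize e) (normalize (Var "a"))] holds, yet [eval env e] raises [Overflow] while [eval env (Var "a")] does not, so the normalized form alone cannot be used to detect the overflow.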
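* sketch: lazy tracing
A minimal sketch of the parent-pointing dag of printing thunks with [fork], [join], and terminal events described under lazy tracing. All names and signatures are hypothetical and are not the existing Trace API; in particular, printing into a [Buffer.t] and giving each event an integer id are assumptions made to keep the example self-contained.

```ocaml
(* Hypothetical types; not the existing Trace module. *)
type event = {
  id : int;
  parents : event list; (* parent-pointing edges of the dag *)
  print : Buffer.t -> unit; (* printing thunk, run only on demand *)
}

(* A history is identified by its most recent event, if any. *)
type t = { last : event option }

let counter = ref 0

let mk parents print =
  incr counter;
  { id = !counter; parents; print }

let empty = { last = None }

(* Record an event lazily: store the closure instead of printing now. *)
let event t print = { last = Some (mk (Option.to_list t.last) print) }

(* Forking shares the history so far between both branches. *)
let fork t = (t, t)

(* Joining adds a silent event whose parents are the ends of both histories. *)
let join t u =
  let parents = Option.to_list t.last @ Option.to_list u.last in
  { last = Some (mk parents (fun _ -> ())) }

(* A terminal forces the thunks of all ancestors, parents before children,
   visiting each event once even if it is reachable along several paths. *)
let terminal t buf =
  let seen = Hashtbl.create 16 in
  let rec visit e =
    if not (Hashtbl.mem seen e.id) then begin
      Hashtbl.add seen e.id ();
      List.iter visit e.parents;
      e.print buf
    end
  in
  Option.iter visit t.last

(* Example history: one event, a fork with one event per branch, a join,
   and a final event. Nothing is printed until [terminal] is called. *)
let say s buf = Buffer.add_string buf s
let t0 = event empty (say "a;")
let l, r = fork t0
let l = event l (say "b;")
let r = event r (say "c;")
let j = event (join l r) (say "d;")
```

Keeping only a pointer to the latest event makes [fork] free, and the shared prefix of forked histories is stored once. The [seen] table is what keeps a terminal from printing that shared prefix twice after a [join].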
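* sketch: lexicographic queue priority
A minimal sketch of the priority proposed under relaxing the global topological ordering: order jobs by the pair (block.parent.sort_index, block.sort_index). The record types are hypothetical stand-ins with only the fields needed here, and a sorted list stands in for the real priority queue.

```ocaml
(* Hypothetical stand-ins for the real func and block records. *)
type func = { name : string; sort_index : int }
type block = { parent : func; sort_index : int; lbl : string }

(* Priority of a block: function index first, then intraprocedural index. *)
let priority (b : block) = (b.parent.sort_index, b.sort_index)

(* Polymorphic compare on int pairs is lexicographic, so blocks of a
   function with a smaller index (a callee) come before those of its callers. *)
let compare_blocks a b = compare (priority a) (priority b)

let f : func = { name = "f"; sort_index = 0 }
let g : func = { name = "g"; sort_index = 1 }
let f0 : block = { parent = f; sort_index = 0; lbl = "f0" }
let f1 : block = { parent = f; sort_index = 1; lbl = "f1" }
let g0 : block = { parent = g; sort_index = 0; lbl = "g0" }
let ordered = List.sort compare_blocks [ g0; f1; f0 ]
```

Because block indices are only compared within the same function, each function can be topsorted on its own, which is the point of moving away from a single global order.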