Summary:
This avoids wasting potentially large amount of work in some
pathological situations. Suppose `foo()` has D specs/disjuncts in its
Pulse summary, and consider a node that calls `foo()` N times, starting
with just one disjunct:
```
[x]
foo()
[x1, ..., xD]
foo()
[y1, ..., yD^2]
foo()
...
```
At the end we get `D^N` disjuncts. Except, currently (before this diff),
we prune the number of disjuncts to the limit L at each step, so really
what happens is that we (very) quickly reach the L limit, then perform
`L*D` work at each step (where we take "apply one pre/post pair to the
current state" to be one unit of work), thus doing `O(L*D*n)` amount of
work.
Instead, this diff counts how many disjuncts we get after each
instruction is executed, and we already reched the limit L then doesn't
bother accumulating more disjuncts (as they'd get discarded any way),
and crucially also doesn't bother executing the current instruction on
these extra disjuncts. This means we only do `O(L*n)` work now, which is
exactly how we expect pulse to scale: execute each instruction (in
loop-free code) at most L times.
This helps with new arithmetic domains that are more expensive and
exhibit huge slowdowns without this diff but no slowdown with this diff
(at least on a few examples).
Reviewed By: skcho
Differential Revision: D22815241
fbshipit-source-id: ce9928e7c