optimizations casp has that penguin doesn't and probably should:

- cutting down to smaller bitvectors for an initial round

- doing aggressive simplification of known equalities, and explicit
substitutions of things like counterexamples and models (even though
this is the solver's job, it seems that doing it up front is a lot
more effective)

- incrementally pushing material to the solver (and opening separate
solver contexts for guess and check to avoid having to ever pop it)

- using boolector for guessing

- sorting the choice lists of things (instructions, registers) since
apparently the solver favors the last ones and tries them first, which
is probably why it always seems to begin by trying xor. Crystal is
still working on figuring out what the best sort method is for casp.
(Update: it doesn't necessarily help much and can hurt)

- trying the last failed smaller-size guess as a prefix or subprogram
("accumulation"; this was crystal's idea and it works pretty well for
casp)
note: this should be done by generating plans for the various things
to try and scoring them to choose what looks the most promising. the
scoring method is an open question.
the ultimate conclusion for casp was that while accumulation helps a
lot occasionally, on average it's slower. I am not sure whether this
is because as implemented there it only does prefixes or because it
really isn't good or what. given that it has no right to work, it's
hard to reason about in the abstract, so tinkering will be
interesting.

- another way to mine for instructions that should be present in the
result, besides accumulation, is to try synthesizing parts of the
postcondition. not to mention just looking for operators in the
postcondition.

- for both of these we need a robust system for scheduling and testing
candidate sketches; the current sketch support is only sort of hacked
in.

- also should rip out the failed experiments

- handling framing properly and not destroying or considering
destroying random registers (this is also a correctness issue)

- also only reading registers mentioned in the precondition (or
control registers, or maybe something more specific about control
registers)

- also at each step each instruction should only read registers from
the precondition that have not been overwritten, or registers that
were written by a previous instruction.

- also the dependency stuff like in casp which is probably the single
best optimization in basic casp (see the casp paper, currently section
5.5)

- and the read/write sets, which is not the same as the dependency
analysis

- add templates for common operations that do not have native short
sequences (e.g. shift and mask)

- try asserting that if the output value is equal to any input value,
the opcode is MOVE (or LW, SW, etc.); or maybe on each instruction
that isn't MOVE (or LW, SW, etc.) assert that the output is not equal
to either input

- try abolishing the zero register and instead using templates/extra
instructions for things where the zero register would be used
explicitly

- for commutative operations like add allow only one order of the
input registers

- also, for commutative operations like add canonicalize the
precondition and postcondition before starting real work

- do deductive stuff like eric's work, but this is a lot of work and a
big deal

- simple deductive rule we can probably do: if the postcondition has
multiple things with the same value, pick one as the leader, leave it
in the postcondition, and generate move ops for the others.
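
rough sketch of the leader rule from the last item above, assuming the
postcondition is just a map from register name to a value expression
and that "same value" means syntactically equal (a real version would
presumably ask the solver). split_leaders and the ("MOVE", dst, src)
tuple shape are made up for illustration, not anything in penguin or
casp:

```python
from collections import defaultdict


def split_leaders(postcondition):
    """Split a postcondition into a reduced postcondition plus move ops.

    postcondition: dict mapping register name -> value expression
    (any hashable; equality here is purely syntactic, which is an
    assumption of this sketch).
    Returns (reduced, moves) where reduced keeps one leader register
    per distinct value and moves is a list of ("MOVE", dst, src).
    """
    # Group registers by the value they must hold afterwards.
    groups = defaultdict(list)
    for reg, value in postcondition.items():
        groups[value].append(reg)

    reduced = {}
    moves = []
    for value, regs in groups.items():
        # Leader choice is arbitrary; smallest name keeps it deterministic.
        leader = min(regs)
        reduced[leader] = value
        for reg in sorted(regs):
            if reg != leader:
                moves.append(("MOVE", reg, leader))
    return reduced, moves


post = {"r1": "a + b", "r2": "a + b", "r3": "a", "r4": "a + b"}
reduced, moves = split_leaders(post)
```

the synthesizer would then only have to hit `reduced`, with `moves`
appended afterwards. whether the moves count against the size budget
is left open here.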


------------------------------------------------------------
also, things casp doesn't have yet:

- combining bits of state that move as a unit and aren't changed, so
that they become single bits at the solver level

- should go through the prestate (including memory when we have
memory) and test every element for being a pointer. for ones where we
can't tell from a simple analysis, ask the solver by asserting it is
and check-sat, then asserting it isn't and check-sat. if only one is
sat, then we know. if both are sat, that's an error. if neither is
sat, well...

- hints in the spec (the partial sketches that are now possible are
one kind of hint, but also stuff like hinting away registers that
aren't needed), although much of that is probably unnecessary when the
state gating is fixed.

- right now golem variables are special-cased, which is messy; but
they're readonly and that special case is worth handling. so they
should be un-special-cased and instead we should treat all readonly
registers with the same logic (that is, don't need a separate version
of the value for each program point) ... and note that after loading
the spec we can tag all registers readonly that aren't modified by the
postcondition.

- since we can eval over all the start states in the guess phase, we
can have constraints on instructions of the form "forall op1, P(op1, op2)",
which will let us express a way to avoid annoying things that appear
in casp output. e.g. andi should assert that (forall *rs, rs[i] == 0)
implies imm[i] == 0. (That is, we shouldn't randomly turn on bits in
the immediate that don't affect the output; this helps canonicalize
the immediates, which cuts the search space, reduces the number of
possible correct outputs to keep track of, and avoids incomprehensible
trash constants in the output code too.)
(Note that the example quantifies over all possible values of the
operand register. Quantifying over all possible operand registers is
different and doesn't currently seem useful.)
unanswered question so far: what do we do with these during the check
phase? "nothing" is correct, but puts us in a position where there are
things that'll verify but that we won't guess. could go looking for
bits we aren't covering in the counterexamples, but that's probably
not helpful.
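
rough sketch of the andi immediate constraint from the last item
above, evaluated concretely over the start states available in the
guess phase rather than as a solver assertion. WIDTH, live_bits,
imm_is_canonical and canonical_imm are all made-up names, and the
16-bit immediate width is an assumption:

```python
WIDTH = 16  # assumed immediate width; hypothetical parameter


def live_bits(rs_values):
    """Bits that are 1 in rs for at least one known start state."""
    mask = 0
    for value in rs_values:
        mask |= value
    return mask & ((1 << WIDTH) - 1)


def imm_is_canonical(imm, rs_values):
    """The constraint: if rs[i] == 0 in every start state, imm[i] == 0."""
    return imm & ~live_bits(rs_values) == 0


def canonical_imm(imm, rs_values):
    """Clear the immediate bits that cannot affect any known output."""
    return imm & live_bits(rs_values)


rs_values = [0x0003, 0x0005]   # rs in each start state seen so far
imm = 0xFF07                   # a "trash constant" style immediate
canon = canonical_imm(imm, rs_values)

# Both immediates give the same andi result on every known start state.
same = all((rs & imm) == (rs & canon) for rs in rs_values)
```

note this only says the two immediates agree on the start states seen
so far, which is the guess-phase view; it says nothing about the check
phase question above.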
