need memory support already.

golems should be less magic.
currently at the logic level they are (IIUC) functions that are
applied to an input value rather than the state driving the function;
currently that function is spit out during I think smt translation.
makes more sense if the golem objects are just machine state and the
functions are applied in the symbolic execution or in the spec as
needed. then they can be readonly machine state and don't need so many
special cases.

should support readonly machine state, and should fix the symbolic
instruction generation to not try to touch readonly registers.

and when fixing the state gating should have it mark registers we
aren't allowed to modify as readonly so we don't materialize more than
one value each for them.

also somewhat relatedly the symbolic instruction generation should not
try to read registers with no live value in them.


the && and || operators in the logic should be defined to be shortcut
(because the logic includes expressions that can fail, it needs to be
possible to shortcut away subexpressions that sometimes fail) and this
needs to be reflected in the translation to SMT.
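
a minimal sketch of what shortcut semantics means for failure, in
python rather than the real logic type or the real SMT translation
(Const, Fail, And, Or and translate are all made-up names): each
expression yields a pair (ok, value), and the right operand's failure
only counts when the left operand doesn't already decide the result.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical mini-logic; not the real expression type.
@dataclass(frozen=True)
class Const:
    value: bool

@dataclass(frozen=True)
class Fail:
    """An expression that always fails (e.g. a bad read)."""

@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Fail, And, Or]

def translate(e):
    """Return (ok, value); ok is False if evaluating e would fail."""
    if isinstance(e, Const):
        return (True, e.value)
    if isinstance(e, Fail):
        return (False, False)
    lok, lval = translate(e.left)
    rok, rval = translate(e.right)
    if isinstance(e, And):
        # the right side only has to be ok when the left side is true
        ok = lok and ((not lval) or rok)
        return (ok, ok and lval and rval)
    if isinstance(e, Or):
        # the right side only has to be ok when the left side is false
        ok = lok and (lval or rok)
        return (ok, ok and (lval or rval))
    raise TypeError(e)
```

in an SMT encoding the same two "ok" formulas would presumably become
side conditions on the translated term; that part is not shown here.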


should add logic for issuing PIC that works like what I put in casp.


should pull in... not the casp use cases exactly but equivalent ones.


for add (and a few others) assert that rs <= rt so we don't consider
e.g. both addu $4, $4, $5 and addu $4, $5, $4. (not rs < rt, which
would also rule out addu $4, $5, $5.)
it isn't clear what the best way to express this really is, and nobody
ever did this in casp, so should set up to be able to try different
ones.
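
one possible way to express it, sketched as a filter over enumerated
candidates (python; COMMUTATIVE and candidates are made-up names, and
the real thing would more likely be an SMT constraint than an
enumeration). it uses rs <= rt so that addu $4, $5, $5 survives.

```python
from itertools import product

# Hypothetical: opcodes treated as commutative in their source operands.
COMMUTATIVE = {"addu", "and", "or", "xor", "nor"}

def candidates(opcodes, regs):
    """Enumerate (op, rd, rs, rt), skipping mirrored commutative operands."""
    for op, rd, rs, rt in product(opcodes, regs, regs, regs):
        if op in COMMUTATIVE and rs > rt:
            continue  # the twin with the operands swapped covers this one
        yield (op, rd, rs, rt)
```

swapping in a different rule (e.g. ordering by some other key) only
changes the one condition, which is the point of keeping it separate.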


bug: it is important for the golem variables in the cache model to
always change the addressing function output (not just sometimes).
The addressing functions should xor in the golem state; I wrote the
casp model with and, but this is wrong. This has been fixed in the
cachelib material in goldfish and the casp examples. I'm pretty sure
the version coded into penguin has the same bug, and it needs to be
fixed too. This is not critical since the functions for mips actually
are different... but it's hazardous to have it wrong.
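
to see why "and" is wrong, a small brute-force check (python; the
function shapes here are assumptions for illustration, not the actual
addressing functions in penguin, casp, or cachelib):

```python
def addr_and(addr, golem, mask=0xFF):
    """Buggy variant: 'and' lets some golem changes go unnoticed."""
    return (addr & golem) & mask

def addr_xor(addr, golem, mask=0xFF):
    """Fixed variant: xor makes every golem change visible."""
    return (addr ^ golem) & mask

def golem_always_matters(fn, bits=4):
    """True if, for every addr, distinct golem values give distinct outputs."""
    size = 1 << bits
    for addr in range(size):
        outputs = {fn(addr, g, size - 1) for g in range(size)}
        if len(outputs) != size:
            return False
    return True
```

with "and", address 0 maps to 0 no matter what the golem is, so the
golem only changes the output sometimes; with xor it always does.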

leftovers from the READSTATE changes (30f67237632e5babd7bd859dcf0358701781450e)
- should change the names of WHICHCACHE, WHICHCACHEENTRYID, WHICHCACHEENTRY,
WHICHCACHEVALUE, etc. These are way too long and while they're more or less
unambiguous they aren't especially great semantically. now that we're clear
on what they are and mean it should be possible to do better.
- review the things that match on the contents of WHICHCACHEENTRYVALUE
and READSTATE (which are now expressions) -- ideally this should not
ever be necessary. they are:
   - mk'readcacheentry in build.ml
   - mk'readstate in build.ml
   - mk'expr in build.ml


Producing logic.v engendered the following suggestions, not worth
following up on right now but maybe later if this code base remains
valuable:
   - having four different kinds of registers should get out of the
     logic expressions and into the state handle enumeration.
   - some of the other simplifications in logic.v's expression type
     should be pulled in.



- running sum4.pen starting with count 1 currently (immediately after
the .pen file changes) gives, when it gets to 3 instructions,
Instruction 0: Immediate out of range: 4294936702 > 32767
This is actually a valid value for a signed immediate (it's 0xffff887e)
but it's not being converted back from bitvector with the right sign
(I guess) and this obviously needs to be fixed.
That or the type constraints are wrong, but it's more likely
bitvector-related.
I've changed sum4.pen to start with count 3 for now so it runs...
(no longer happens and I think the underlying problem's been fixed)
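
for reference, the conversion that was presumably missing, as a python
sketch (to_signed is a made-up name, not a function in the code base):

```python
def to_signed(value, width):
    """Reinterpret an unsigned width-bit value as two's complement."""
    value &= (1 << width) - 1
    if value >= 1 << (width - 1):
        value -= 1 << width
    return value
```

the value from the error message comes out as -30594 either way, as a
32-bit value or truncated to the 16-bit immediate field, which is
inside the signed immediate range.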


- you can't currently bind additional variables in .pen files to use
as symbolic values or whatever in the spec. (You can let-bind but those
are limited to the scope of a single expression.)


- materialize whichcacheentry/whichcachegolem in build.ml as bool
instead of an int constrained to two values. Or at least consider
it... it may not be worth the trouble.

- nothing actually sets anything in enabled.ml so use_cache is wired
to true for now. the correct thing to do is to allow the input spec to
refer to any state and have that turn on the state it refers to in
enabled.ml. (right now the lists of things the input spec can refer to
are derived from enabled.ml, so if you don't set use_cache you get a
bizarre error if you run the cache test.)


- it ought to check that the precondition doesn't refer to the
post-state. (XXX: does this make sense any more? isn't the
precondition now defined as the assertions that don't refer to the
post-state?)

- it is also probably a good idea to check that the postcondition is
satisfiable. what this really means is:
     forall s0, exists s1, pre s0 -> post s0 s1
but we can't readily do that, so I guess settle for:
     an assignment of the start state and the end state where the
     precondition and postcondition are true
that is,
     exists s0 s1, pre s0 /\ post s0 s1
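
the difference between the two checks, brute-forced over a tiny finite
state space (python sketch; weak_check and strong_check are made-up
names, and the real check would be an SMT query, not an enumeration):

```python
from itertools import product

def weak_check(states, pre, post):
    # exists s0 s1, pre s0 /\ post s0 s1
    return any(pre(s0) and post(s0, s1)
               for s0, s1 in product(states, states))

def strong_check(states, pre, post):
    # forall s0, exists s1, pre s0 -> post s0 s1
    return all((not pre(s0)) or any(post(s0, s1) for s1 in states)
               for s0 in states)
```

e.g. with states 0..3, pre s0 = (s0 < 3) and post s0 s1 = (s1 == s0 + 2),
the weak check passes (s0 = 0, s1 = 2) but the strong one does not
(s0 = 2 has no valid s1), so the weak check can miss specs that are
unsatisfiable for some allowed start states.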


look into whether an unsat core from verify can be used to build more
general start states/families of start states

try running quickcheck on the postcondition to get start states?

the spec parser doesn't seem to handle positions correctly (or maybe at
all) -- if you write a syntax error it always prints ":0:0: Parse error".

consider whether it makes sense to remove $0 from the general register
set entirely and instead create special instructions where it's an
implicit operand for cases where that makes sense. which is kinda
limited:
   - move is traditionally addu rd, rs, $0, but addiu works fine.
   - do need to be able to addiu constants to $0.
   - with non-addu add you can test for overflow and write the result
     to $0; but we don't support non-addu add.
   - SUBU rd, $0, rt is useful; can provide it as NEG. same for NOR/NOT.
   - various forms of SLT* and B* are useful, e.g. SLTU rd, $0, rt is
     C logical not. but we don't have the complex branches, so it's not
     that many forms.
   - storing $0 is maybe useful.
   - MT* with $0 might be useful, maybe.
   - loading and storing from offsets of $0 is maybe useful for putting
     kernel stuff on the last page of memory, but we don't actually do
     that, so meh.
this seems to suggest:
	SET rd, imm -> ADDIU rd, $0, imm
	NEG rd, rt -> SUBU rd, $0, rt
	NOT rd, rt -> NOR rd, $0, rt
	SLTZ rd, rt -> SLT rd, $0, rt
	SLTZU rd, rt -> SLTU rd, $0, rt
	BEQZ rd, imm -> BEQ rd, $0, imm
	BNEZ rd, imm -> BNE rd, $0, imm
	CZTC1 fcs -> CTC1 $0, fcs
	MZTC0 cs -> MTC0 $0, cs
	MZTC1 fs -> MTC1 $0, fs
	MZTHI -> MTHI $0
	MZTLO -> MTLO $0
	SZB mem -> SB $0, mem
	SZH mem -> SH $0, mem
	SZW mem -> SW $0, mem
and the move ones are pretty marginal.
so this seems like it might actually be worthwhile as it will get rid
of the special-casing of $0.
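
a sketch of how the expansion could be table-driven, for the
three-operand forms only (python; EXPANSIONS and expand are made-up
names and say nothing about how instructions are actually represented):

```python
# Hypothetical table: pseudo-op -> (real opcode, operand template).
# "$0" marks where the implicit zero register is spliced in.
EXPANSIONS = {
    "SET":   ("ADDIU", ["rd", "$0", "imm"]),
    "NEG":   ("SUBU",  ["rd", "$0", "rt"]),
    "NOT":   ("NOR",   ["rd", "$0", "rt"]),
    "SLTZ":  ("SLT",   ["rd", "$0", "rt"]),
    "SLTZU": ("SLTU",  ["rd", "$0", "rt"]),
    "BEQZ":  ("BEQ",   ["rd", "$0", "imm"]),
    "BNEZ":  ("BNE",   ["rd", "$0", "imm"]),
}

def expand(op, **operands):
    """Turn a pseudo-op and its explicit operands into the real insn text."""
    real, template = EXPANSIONS[op]
    args = [a if a == "$0" else str(operands[a]) for a in template]
    return f"{real} {', '.join(args)}"
```

the synthesis side would then only ever see the left-hand names, and
$0 would appear nowhere except in this table.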

ought to move the settings for the number of startstates and
counterexamples per cegis loop to modes.ml.

should get a copy of the perturbation stuff eric put into casp (both
the solver seed control and prefixing identifiers with random letters
and anything else) for benchmarking

I am increasingly thinking it's feasible to do almost all the stuff we
do from an axiomatic machine description language instead of from
inlined open-coded axiomatic instruction specifications.

should experiment with inhibiting immediates, or allowing only
immediates related to constants in the problem. (on 20200503 eric ran
into a problem in casp where it got into a useless state where it was
doing a stupid guess with an immediate and each counterexample was
with a corresponding input value increased by one and it took 20
minutes to generate one instruction...)

should have a superoptimizer mode, which can afford to know more
instructions, or more instruction variants, or enable more registers,
or set noreorder. maybe always invoke it after generating an initial
program? note that linking the superoptimizer closely to synthesis
will allow it to produce outputs that meet the spec but aren't
necessarily exactly equivalent to the original version, e.g. differing
in the dead values left in scratch registers, which will probably let
it generate better code.

should also have a variant superoptimizer or something that runs after
everything else and prefers canonical encodings of common operations
(so it would e.g. replace xor v0, t0, t0 with li v0, 0, and xori v0,
t0, 0 with move v0, t0)... have to think about how to do this.
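
the crudest version would be a peephole rewrite over single
instructions, sketched here in python with instructions as plain
tuples (canonicalize is a made-up name; the first pattern is read as
the register-register xor, since xori takes an immediate):

```python
def canonicalize(insn):
    """insn is a tuple like ("xor", rd, rs, rt) or ("xori", rd, rs, imm)."""
    op = insn[0]
    if op == "xor" and insn[2] == insn[3]:
        return ("li", insn[1], 0)          # x ^ x is always 0
    if op == "xori" and insn[3] == 0:
        return ("move", insn[1], insn[2])  # x ^ 0 is x
    return insn
```

a pattern table like this doesn't scale, which is probably why a
solver-backed pass that prefers canonical encodings is the better
idea, but it pins down what "canonical" means for these two cases.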

also if mucking with set noreorder it needs to learn about hazards.

after adding memory support maybe add a simple set of deduction rules
using the sketch stuff, as in: if a value is in memory in the
prestate, and something in a register or in a different memory slot
depends on it in the poststate, add a load instruction to the sketch,
and similarly for store. if there are multiple such might want to not
nail down the order, so maybe generate a sketch that has N loads where
the loadees are a specific range of places and it's restricted so no
two are the same. (also make sure that a sketch that says "insn_a; _;
insn_b" handles the case where the "_" is empty; or at least that this
is expressible.)

also we really need to enforce the framing rules (both for modify and
access) -- need to think about whether we just allow all the control
registers to always be accessed or if we can do something more subtle
with the machine-dependent knowledge we're allowed to bake in.

would be neat if after everything we can generalize to riscv.

also should add support for mips versions and learn to generate code
that compiles for all, or something like that.

idea: try adding a constraint that says something like
(post.rd == pre.rs || post.rd == pre.rt) -> opcode == MOVE
probably won't speed things up much (though it might!) but
it will help to prevent bizarro outputs.

if we need havoc, do it not by introducing additional SMT inputs (one
per runtime havoc occurrence) but with havoc values in the symbolic
execution: any register that contains havoc should just not be read.
this has no smt-level footprint and is therefore faster and better...
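
roughly what that could look like in the symbolic state, as a python
sketch (SymState and HAVOC are made-up names; this is not the real
symbolic execution code):

```python
HAVOC = object()  # marker: the register holds an unknown value

class SymState:
    """Toy symbolic register state that tracks which registers are readable."""

    def __init__(self, regs):
        self.regs = dict(regs)

    def havoc(self, reg):
        self.regs[reg] = HAVOC

    def write(self, reg, value):
        self.regs[reg] = value  # a write makes the register live again

    def readable(self, reg):
        # registers with no value, or with havoc in them, must not be read
        return reg in self.regs and self.regs[reg] is not HAVOC
```

instruction generation would consult readable() when picking source
operands, which is the same gate needed for registers with no live
value in them.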

on 20210311 I ripped out the support for modeling registers with
arrays (either equationally or via array updates) because it makes a
mess of the internals and the internals need to be reorganized. may
want to put it back later -- the idea was to use an smt-level array
for the register file, or in fact, one each for the five different
main register files things are organized into currently (general regs,
control regs, fp regs, fp control regs, golems) instead of individual
variables, then use either equations (like with individual variables)
to manage updates, or smt-level array updates. it was not faster;
array updates in particular were a lot slower. it might be nice to
have the ability back later in order to be able to rerun the
experiments when desired, but right now I need to reorg the internals
more. this means I ripped out not just the setting in modes.ml and
references to it, but also the READALLSTATE and UPDATEMAP cases of the
logic expr type and references to them.

