dippy (adjective): silly and eccentric or scatterbrained
The idea for Dippy originated on hackaday.io but now this is the main page.
A really wacky computer built using DIP switches for ROM, a few shift registers, no RAM, running a Forth-like instruction set and having a minimal transistor count.
Q: Why DIP switches for ROM? A: So that you can visibly see all of the instructions. There will be one LED per bank of DIP switches so that you can see which one is being read and the instruction bus will also have LEDs on it so that you can see that the values on the switch have made it to the processor.
Q: Why shift registers? A: To keep the component count down - registers/RAM and ALU are most of the count for a PISC 16 bit processor. Ideally the shift registers would be physical, needing components only to read or write, more on this below.
Q: Why no RAM? A: The idea is that the shift registers have enough storage to run really basic programs. External RAM violates the completeness of the solution and it wouldn't be visible.
Q: Why Forth-like instruction set? Because it is compact, I've got to squeeeze in as much as possible here.
Q: Why minimal transistor count? A: Partly because of the challange, partly because it is easier for others to replicate and build on this work but mostly because I'm terrible at soldering so unless it's minimial it will never get finished.
10 bit instructions in a 9 bit read-only address space built from 512 10-way dip switches (e.g. eBay).
16 shift registers, holding at least 16 bits each, hopefully 32 bits. These form an ascending Forth data stack and a descending Forth address stack (even though addresses are 10 bits). Two 4 bit registers act as the data and address stack pointers. If not physical shift registers but four components per memory cell (may be 5), then 16 registers at 16 bits would be 1024 components (512 transistors). At 32 bits per register that would be 2048 components (1024 transistors).
The goal is a reasonably inexpensive program storage where it is clear how it operates.
A simple (linear) design would take the 9 bit address bus and invert each line to give 18 signals. Any and all addresses can now be decoded using 9 diodes and one transistor per address. Each diode is wired to either the corresponding data line or its inverse, so that current flows in all cases except the required address. This then feeds a transistor whose output is high when the required address is present. A LED can be used as the load so that there is a visual indication of the address used. The output feeds all the top pins of the DIP switches via diodes, with all lower pins connected to the instruction bus. There's a huge number of DIP switches and an enoumous number of diodes…
An improved design would first get rid of all of the diodes to the instruction bus, then work on better decoding.
I'd really like the registers to be 32 bits as, unusually, with shift registers the component count is independent of the register size, i.e. the register bits come for free (okay, that's assuming no acoustic dissipation and even then the instruction clock does run proportionally slower)
I'd also like to see all the bits. This is difficult with most hardware shift registers. One solution might be to take the input and power a scanning laser, the persistance of vision may show all the bits.
Also ideally it would have a variable clock rate between one cycle every few seconds (so that you can see exactly how everything works) up to hundreds of kiloHertz (the max switching speed of the transistors).
b10 | b9 | b8-b6 | b5-b1 |
---|---|---|---|
0 | 0 | JSR to 9 bit address ending in 0 | |
0 | 1 | LOAD - 8 bit immediate load | |
1 | 0 | condition | branch relative: -16 to +15 |
1 | 1 | condition | basic instruction (5 bit) |
If top two bits are clear, then JSR to the remaining address (even addresses only). Thus this is subroutine linked Forth but without the overhead of the JSR instruction.
If next bit clear, load immediate the lower 8 bits (possibly sign extended - TBD)
Everything else is 3 bit conditional (1, lt0, le0, eq0, ne0, ge0, gt0, 0). Half of the space is for relative branching, of 5 bits (-16 to +15). The remaining is the basic instruction space:
If can keep the basic instructions to 16 then I can rejig the instruction space so that JSR doesn't have to end in zero. But if I use DIP switches to decode the instructions then it would be nice to allow others to add instructions just by setting these switches. Here is full 9 bit JSR addressing:
b10 | b9 | b8-b6 | b5 | b4-b1 |
---|---|---|---|---|
0 | JSR to full 9 bit address | |||
1 | 0 | 8 bit immediate load | ||
1 | 1 | condition | 0 | branch relative -8 to +7 |
1 | 1 | condition | 1 | basic instruction (4 bit) |
I find a C like notation very convenient:
INSTRUCTION | Register movement |
---|---|
JSR | *R– = P ; P = I |
RET | P = *R++ |
LOAD(X) | *D++ = X |
B(OFFSET) | IF condition THEN P += OFFSET |
I've not yet written an emulator, or even fixed the instruction set, so none of this is final. Nevertheless, it's useful to write some code to see what is missing.
LOAD(0) :loop INC DUP OUT B(:loop)
Learning: DUP is a very common instruction and it may well be worth having a DUP-OUT as well as a DUP instruction. On the other hand, DUP OUT RET is only 3 or 4 words. Is 4x slower and 4x the memory worth it? Maybe it depends on what the microcode decode looks like and how much instruction space there is. AND/OR/XOR/ADD/SUB/D2R are all candicaes for an extra DUP or two (e.g. DUP2ADD which is a non-destructive ADD).
IN IN ADD OUT
Learning: Input has to be buffered, that is the processor should stop if input is not yet available. Perhaps input is done with a 0/1 toggle switch and an add to buffer. Once it's full then the processor can continue. Another add-to-buffer switch which adds 8 copies may well be useful as a 32 bit input is probably all 1s or all 0s in the top bits.
It would be nice to have more than one register as output. Maybe an RPi will feed the input and store all output?
There is only a few registers and no carry bit, so this is just 32bit by 32bit giving a 32bit result. With no LSB its hard to peel off the low bits and stop when the result is zero, which is a shame as most invocations won't be full width.
Simple first pass with LSR - return stack stores accumulator, works best with last arg +ve (can test and switch):
def MUL LOAD(0) D2R # set accumulator to zero :loop DUP LOAD(1) AND BZ(:skip) # test low bit and skip hard work if not set DUPDUPADD R2D ADD D2R :skip D2R DUP ADD R2D # double the first arguement LSR # halve the second argument BNZ(:loop) # loop if not zero DROP DROP # get rid of both arguments R2D # retrive result RET # and exit happy
Where:
def DUPDUPADD OVER OVER ADD RET # this would be much better with a non-consuming ADD def OVER SWAP DUP D2R SWAP R2D RET
Learning: Really need both LSR and non-destructive ALU operations. What is a good naming convention for ALU operations that implicitly encodes the data stack changes?