The VAX was a nice, clean 32-bit architecture that was remarkably stable across its life from 1977 to the late 90's.  The machine code was compact, it was easy to assemble and disassemble (even by hand), and it felt very natural to people who were used to the PDP-11.  The VAX and the PDP-11 could largely share peripherals, across more than a decade.

It was in many ways like the x86 we still work with today: it had a long life, compatibility was very important, the machines became ever faster, etc.

VAX instruction format

VAX instructions are very simple.  They are variable-length (from 1 to 50+ bytes!) and very byte-oriented by nature.  Parsing/execution is naturally sequential and lends itself very nicely to interpretation by a byte-oriented microcode program.

The first one or two bytes are the opcode.  All two-byte opcodes begin with FDₕ.  This means there are 512 different opcodes available, of which slightly more than 300 were ever allocated -- not counting the vector instructions, which were only implemented on two microarchitectures and never used much.

Each opcode has a specific set of operands that follow -- there is literally a table with 512 entries where each entry describes which and what kind of operands.  ADDB3 ("add byte, 3 operands") always takes two byte operands for reading and one byte operand for writing, for example.  Whether an operand value comes from a register, is an immediate value, or whether it comes from memory, is something that is described locally in the operand bytes.
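The opcode fetch and table lookup described above can be sketched in a few lines of Python.  This is only an illustration, not code from the emulator: the names OPCODE_TABLE and lookup are made up, the table holds just four of the real entries, and mapping two-byte opcodes to index 0x100 + second byte is one assumed way to lay out a 512-entry table.  The operand notation ("r" = read, "w" = write, "m" = modify; "b" = byte, "l" = longword, "o" = octaword) is likewise an assumption.

```python
# Hypothetical excerpt of a 512-entry opcode table.
# Each entry: (mnemonic, [(access, data type), ...]).
OPCODE_TABLE = {
    0x001: ("NOP", []),
    0x080: ("ADDB2", [("r", "b"), ("m", "b")]),
    0x081: ("ADDB3", [("r", "b"), ("r", "b"), ("w", "b")]),
    0x0D0: ("MOVL", [("r", "l"), ("w", "l")]),
    0x17D: ("MOVO", [("r", "o"), ("w", "o")]),  # two-byte opcode FD 7D
}


def lookup(code, pc):
    """Decode the opcode at code[pc].

    Returns (mnemonic, operand list, offset of the first operand).
    """
    first = code[pc]
    if first == 0xFD:
        # Two-byte opcode: the second byte selects the entry
        # in the upper half of the table.
        index, length = 0x100 + code[pc + 1], 2
    else:
        index, length = first, 1
    name, operands = OPCODE_TABLE[index]
    return name, operands, pc + length
```

For the byte sequence 81 01 52 53 this returns ADDB3 with its three byte operands and says that the first operand starts at offset 1.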

Each operand is encoded as at least one byte, the operand descriptor.  The high nibble contains the addressing mode (6-bit immediate, register, register deferred, autodecrement, autoincrement, autoincrement deferred, byte displacement, longword displacement, etc.).  The low nibble contains either part of a 6-bit immediate value or a register number.  If the high nibble is 4 it is an indexed addressing mode and another byte follows that describes the addressing mode in more detail.  After that, there might be a displacement of 1, 2, or 4 bytes, or an immediate value ranging in size from 1 byte to 16 (for really big integer/floating-point values).
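As a small illustration of the nibble split, here is a Python sketch that classifies a single operand descriptor byte.  The function name describe_specifier and the MODE_NAMES table are hypothetical; the mode numbers follow the standard VAX encoding, where high nibbles 0 through 3 all mean "6-bit immediate" because the top two bits of the value spill into the mode field.

```python
# Addressing mode by high nibble (4 and up; 0..3 are the 6-bit immediate).
MODE_NAMES = {
    0x4: "indexed",
    0x5: "register",
    0x6: "register deferred",
    0x7: "autodecrement",
    0x8: "autoincrement",
    0x9: "autoincrement deferred",
    0xA: "byte displacement",
    0xB: "byte displacement deferred",
    0xC: "word displacement",
    0xD: "word displacement deferred",
    0xE: "longword displacement",
    0xF: "longword displacement deferred",
}


def describe_specifier(byte):
    """Split one operand descriptor byte into (mode name, value).

    The value is the 6-bit immediate for modes 0..3 and the
    register number otherwise.
    """
    mode, low = byte >> 4, byte & 0x0F
    if mode <= 3:
        return ("6-bit immediate", byte & 0x3F)
    return (MODE_NAMES[mode], low)
```

For example, describe_specifier(0x51) gives ("register", 1) and describe_specifier(0x3F) gives ("6-bit immediate", 63).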

You don't know where the first operand starts (or even if there is one!) until you have decoded the opcode.  You don't know how many immediate bytes are in the operand until you have decoded the operand descriptor.  You don't know if it is a two-byte descriptor until you have decoded the first byte.  You don't know where the second operand starts until you have decoded the first one, etc.
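The serial dependency shows up clearly when you try to compute the length of an instruction.  The following Python sketch does that for a handful of one-byte opcodes.  It is a simplified illustration under stated assumptions: OPERAND_SIZES is a made-up three-entry excerpt of the opcode table, the function names are hypothetical, and branch displacements and two-byte opcodes are left out.

```python
# Displacement bytes that follow the descriptor byte, by addressing mode.
DISPLACEMENT_BYTES = {0xA: 1, 0xB: 1, 0xC: 2, 0xD: 2, 0xE: 4, 0xF: 4}

# Hypothetical excerpt of the opcode table: operand sizes in bytes.
OPERAND_SIZES = {
    0x01: [],         # NOP
    0x81: [1, 1, 1],  # ADDB3
    0xD0: [4, 4],     # MOVL
}


def specifier_length(code, pos, size):
    """Length of the operand starting at code[pos]; size is its data size."""
    spec = code[pos]
    mode, reg = spec >> 4, spec & 0x0F
    if mode <= 3:                      # 6-bit immediate, nothing follows
        return 1
    if mode == 4:                      # indexed: a base descriptor follows
        return 1 + specifier_length(code, pos + 1, size)
    if mode in DISPLACEMENT_BYTES:     # byte/word/longword displacement
        return 1 + DISPLACEMENT_BYTES[mode]
    if reg == 15 and mode == 8:        # autoincrement on PC: immediate data
        return 1 + size
    if reg == 15 and mode == 9:        # autoincrement deferred on PC: 32-bit address
        return 1 + 4
    return 1                           # plain register modes


def instruction_length(code, pc):
    """Walk the operands one by one; each start depends on the previous one."""
    pos = pc + 1                       # one-byte opcodes only in this sketch
    for size in OPERAND_SIZES[code[pc]]:
        pos += specifier_length(code, pos, size)
    return pos - pc
```

With this, the bytes 81 01 52 53 (ADDB3 with a 6-bit immediate and two registers) come out as 4 bytes long, and D0 8F 78 56 34 12 50 (MOVL with a 32-bit immediate and a register) as 7.  Note that no step in the loop can start before the previous one has finished.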

All in all, it is very clean and simple but the variable length, the potentially very long instructions, and the serial dependency make it harder to decode instructions fast and in parallel.  Thankfully, most instructions have only a few operands using only simple addressing modes.  Complicated instructions are actually very rare in practice.


I wrote a VAX emulator.  It is simpler and smaller than the three existing open-source VAX emulators (SimH, ts10, evax).

It is the only one to look like a modern(ish) CPU: instead of writing the whole thing in C, it has a small and generic part in C and the rest is driven by tables and "microcode".  Both the instructions and the operands are handled by "microcode", as are exceptions and interrupts.


My VAX assembler isn't completed yet.  It is a traditional line-based assembler which uses a "transactional combinator parsing" technique to parse the operands.  It is probably my nicest hand-written parser code yet (and I've been writing parsers for a long time).


The disassembler is a great help when reverse engineering code.  Not only can it be controlled in detail by a text file, it is also capable of tracing through a binary and following jumps, branches, and calls to figure out what is code and what is not.
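The general idea of tracing can be sketched with a worklist in Python.  This is not the disassembler's actual code, just a minimal illustration of the technique under simplifying assumptions: it only knows six real VAX opcodes (NOP, RSB, BSBB, BRB, BNEQ, BEQL), all branches use a signed byte displacement, and the names TOY_OPCODES and trace_code are made up.

```python
from collections import deque

# opcode -> (mnemonic, length in bytes, kind of control flow)
TOY_OPCODES = {
    0x01: ("NOP", 1, "fall"),
    0x05: ("RSB", 1, "stop"),    # return: nothing follows
    0x10: ("BSBB", 2, "call"),   # call: target and fall-through are code
    0x11: ("BRB", 2, "jump"),    # unconditional: only the target is code
    0x12: ("BNEQ", 2, "cond"),   # conditional: target and fall-through
    0x13: ("BEQL", 2, "cond"),
}


def trace_code(image, entry):
    """Return the sorted addresses of all instructions reachable from entry."""
    code = set()
    work = deque([entry])
    while work:
        pc = work.popleft()
        # Follow straight-line code until it ends, leaves the image,
        # reaches an unknown opcode, or reaches something already seen.
        while 0 <= pc < len(image) and pc not in code and image[pc] in TOY_OPCODES:
            name, length, kind = TOY_OPCODES[image[pc]]
            if pc + length > len(image):
                break
            code.add(pc)
            nxt = pc + length
            if kind in ("jump", "cond", "call"):
                # The displacement is relative to the next instruction.
                disp = int.from_bytes(image[pc + 1:pc + 2], "little", signed=True)
                work.append(nxt + disp)
            if kind in ("stop", "jump"):
                break
            pc = nxt
    return sorted(code)
```

Everything that is never reached this way is treated as data.  A real tracer also has to deal with indirect jumps, jump tables, and calls that never return, which is where control by a text file becomes valuable.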

It was inspired by my experiences with Sourcer by V Communications, a disassembler for DOS that I encountered in the early 90's.