8086-rs is a Rust-based toolchain for analyzing and interpreting binaries compiled for the Intel 16-bit 8086 family, built primarily to interpret binaries compiled for MINIX 1.x.
Features:
- A parser for the a.out format, to parse legacy MINIX 1.x executables
- A disassembler that parses the 16-bit instructions into an IR
- Disassembly output in an objdump(1)-style fashion
- Interpretation of instructions
- MINIX 1.x interrupts and memory layout
- Segment register indirection (CS, SS, DS, ES)
- Full 20-bit memory bus
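As a rough illustration of what parsing the a.out format involves, the following sketch reads the 32-byte header of a MINIX 1.x executable. It assumes the field layout and constants of MINIX's <a.out.h>:

- little-endian fields
- magic bytes 0x01 0x03
- CPU id 0x04 for the 8086
- flag 0x20 for separate instruction and data spaces

The AoutHeader struct and parse_header function are hypothetical and not part of 8086-rs.

```rust
/// Header of a MINIX 1.x a.out executable, assuming the field layout of
/// MINIX's <a.out.h>. Hypothetical type, not 8086-rs's own.
#[derive(Debug, PartialEq)]
struct AoutHeader {
    flags: u8,
    cpu: u8,
    hdrlen: u8,
    text: u32,
    data: u32,
    bss: u32,
    entry: u32,
    total: u32,
    syms: u32,
}

const A_MAGIC: [u8; 2] = [0x01, 0x03];
const A_I8086: u8 = 0x04; // CPU id for the 8086
const A_SEP: u8 = 0x20; // separate instruction and data spaces

/// Read a little-endian u32 starting at `off`.
fn read_u32_le(buf: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([buf[off], buf[off + 1], buf[off + 2], buf[off + 3]])
}

/// Parse the 32-byte short header; bytes 5..8 (unused, version) are skipped.
fn parse_header(buf: &[u8]) -> Result<AoutHeader, String> {
    if buf.len() < 32 {
        return Err(format!("header too short: {} bytes", buf.len()));
    }
    if buf[..2] != A_MAGIC[..] {
        return Err("bad magic".to_string());
    }
    let hdr = AoutHeader {
        flags: buf[2],
        cpu: buf[3],
        hdrlen: buf[4],
        text: read_u32_le(buf, 8),
        data: read_u32_le(buf, 12),
        bss: read_u32_le(buf, 16),
        entry: read_u32_le(buf, 20),
        total: read_u32_le(buf, 24),
        syms: read_u32_le(buf, 28),
    };
    if hdr.cpu != A_I8086 {
        return Err(format!("unsupported cpu id {:#04x}", hdr.cpu));
    }
    Ok(hdr)
}

fn main() {
    // Synthetic header: magic, A_SEP flag, 8086 cpu id, hdrlen 32,
    // 0x100 bytes of text and 0x40 bytes of data.
    let mut buf = [0u8; 32];
    buf[0..2].copy_from_slice(&A_MAGIC);
    buf[2] = A_SEP;
    buf[3] = A_I8086;
    buf[4] = 32;
    buf[8..12].copy_from_slice(&0x0100u32.to_le_bytes());
    buf[12..16].copy_from_slice(&0x0040u32.to_le_bytes());
    match parse_header(&buf) {
        Ok(h) => println!("{:?}, separate I&D: {}", h, h.flags & A_SEP != 0),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

Under the same assumptions, the text segment would presumably start hdrlen bytes into the file, followed by the data segment.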
To compile and run the tool, use Cargo:
cargo build --release
Or run it directly:
cargo run -- --help
Run with log output:
RUST_LOG=debug cargo run -- interpret -p ./a.out 2>&1 | less
The info level shows information such as the register state and calls to interrupts. The debug level additionally shows disassembly and interpretation internals.
CLI Options:
$ cargo run -- --help
Simple program to disassemble and interpret 8086 a.out compilates, e.g. such for MINIX
Usage: i8086-rs [OPTIONS] [ARGV]... <COMMAND>
Commands:
disassemble Disassemble the binary into 8086 instructions [aliases: d]
interpret Interpret the 8086 instructions [aliases: i]
help Print this message or the help of the given subcommand(s)
Arguments:
[ARGV]... argv passed to the program, which will be interpreted
Options:
-p, --path <PATH> Path of the binary
-d, --dump Dump progress of disassembly, in case of encountering an error
-h, --help Print help
-V, --version Print version
Example:
$ cat 1.c
main() {
write(1, "hello\n", 6);
}
$ ./target/release/i8086-rs interpret -p ./a.out
hello
$ RUST_LOG=info ./target/release/i8086-rs interpret -p ./a.out
INFO: Initializing stack...
INFO: Initializing static data...
INFO: (0000) xor %bp, %bp 0000 0000 0000 0000 ffb4 0000 0000 0000 ---------
INFO: (0002) mov %bx, %sp 0000 0000 0000 0000 ffb4 0000 0000 0000 -----Z---
INFO: (0004) mov %ax, [%bx] 0000 ffb4 0000 0000 ffb4 0000 0000 0000 -----Z---
...
This project is under active development. I primarily use it to explore Intel disassembly and to learn more Rust. Expect bugs and missing features. I mainly test with 'official' binaries from the MINIX source tree.
Currently, everything lives in the binary, but I want to move some parts into a library. That would make it much easier to leave out the MINIX 1.x specifics (e.g. the currently hardcoded interrupt handler). It would also allow for more generic use of this 8086 implementation (e.g. implementing your own simple BIOS or OS). Before that, though, I want to implement all features correctly and add tests for all of them.
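One possible shape for such a library split is sketched below, assuming OS and BIOS specifics hide behind a trait. The generic core only knows how to execute int and delegates to a pluggable handler. All names here (Registers, InterruptHandler, ToyOs, Core) are hypothetical. The toy interrupt 0x42 is arbitrary and is not a real MINIX system call.

```rust
/// Register state an interrupt handler may read or modify (trimmed down to
/// two registers for this sketch).
#[derive(Debug)]
struct Registers {
    ax: u16,
    bx: u16,
}

/// Hypothetical trait that keeps the generic 8086 core unaware of any
/// particular OS or BIOS.
trait InterruptHandler {
    fn handle(&mut self, vector: u8, regs: &mut Registers) -> Result<(), String>;
}

/// Toy handler standing in for OS-specific code: it records every vector and
/// implements a single made-up service on vector 0x42.
struct ToyOs {
    log: Vec<u8>,
}

impl InterruptHandler for ToyOs {
    fn handle(&mut self, vector: u8, regs: &mut Registers) -> Result<(), String> {
        self.log.push(vector);
        match vector {
            0x42 => {
                // Made-up service: return BX + 1 in AX.
                regs.ax = regs.bx.wrapping_add(1);
                Ok(())
            }
            _ => Err(format!("unhandled interrupt {:#04x}", vector)),
        }
    }
}

/// Generic core: executing `int imm8` simply delegates to the plugged-in handler.
struct Core<H: InterruptHandler> {
    regs: Registers,
    handler: H,
}

impl<H: InterruptHandler> Core<H> {
    fn int(&mut self, vector: u8) -> Result<(), String> {
        self.handler.handle(vector, &mut self.regs)
    }
}

fn main() {
    let mut core = Core {
        regs: Registers { ax: 0, bx: 41 },
        handler: ToyOs { log: Vec::new() },
    };
    core.int(0x42).unwrap();
    println!("ax = {}", core.regs.ax); // ax = 42
    println!("{:?}", core.int(0x10)); // Err("unhandled interrupt 0x10")
    println!("log = {:?}", core.handler.log); // log = [66, 16]
}
```

With this split, a MINIX 1.x handler and a BIOS handler would just be two implementations of the same trait, and the core would not need to change.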
Code is currently not fetched from memory but from a separate vector stored inside the Disassembler struct, which fetches and parses the next instruction at the instruction pointer.
The CS:IP addressing scheme is still used to allow for 20-bit access, but self-modifying code is currently not supported.
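For reference, the 8086 forms a 20-bit physical address by shifting the 16-bit segment left by four bits and adding the 16-bit offset. The sketch below shows that computation. Truncating the result to 20 bits reproduces the 8086's wrap-around at 1 MiB. Whether 8086-rs handles this overflow case the same way is an assumption.

```rust
/// Physical address of a segment:offset pair as formed by the 8086:
/// segment * 16 + offset, truncated to 20 bits.
fn physical_address(segment: u16, offset: u16) -> u32 {
    ((u32::from(segment) << 4) + u32::from(offset)) & 0xF_FFFF
}

fn main() {
    // CS:IP = 0x1234:0x0010 -> 0x12350
    println!("{:#07x}", physical_address(0x1234, 0x0010));
    // 0xFFFF:0x0010 exceeds 20 bits and wraps around to 0x00000.
    println!("{:#07x}", physical_address(0xFFFF, 0x0010));
}
```

Note that many segment:offset pairs alias the same physical address; for example, 0x1235:0x0000 also yields 0x12350.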
Also, the disassembler only performs an initial sweep over the code, which is likely to be inaccurate compared to what happens at runtime. For example, the interpreter may jump to an address that the disassembler never identified as the start of an instruction.
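The following toy example illustrates why a single sweep can disagree with runtime behavior. The same bytes decode into different instruction boundaries depending on where decoding starts. It models only a tiny, hypothetical subset of opcodes (nop, mov r16, imm16 and jmp rel8) and is not the crate's disassembler.

```rust
/// Instruction length for a tiny, hypothetical subset of 8086 opcodes,
/// or None for any byte this toy decoder does not know.
fn insn_len(opcode: u8) -> Option<usize> {
    match opcode {
        0x90 => Some(1),        // nop
        0xB8..=0xBF => Some(3), // mov r16, imm16
        0xEB => Some(2),        // jmp rel8
        _ => None,
    }
}

/// Linear sweep: record the offset of each instruction, decoding one after
/// another from `start` until an unknown or truncated instruction is hit.
fn sweep(code: &[u8], start: usize) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut pc = start;
    while pc < code.len() {
        match insn_len(code[pc]) {
            Some(len) if pc + len <= code.len() => {
                offsets.push(pc);
                pc += len;
            }
            _ => break,
        }
    }
    offsets
}

fn main() {
    // Decoded from offset 0: mov ax, 0x9090 ; nop
    // Decoded from offset 1 (e.g. the target of a jump): nop ; nop ; nop
    let code: [u8; 4] = [0xB8, 0x90, 0x90, 0x90];
    println!("sweep from 0: {:?}", sweep(&code, 0)); // [0, 3]
    println!("sweep from 1: {:?}", sweep(&code, 1)); // [1, 2, 3]
}
```

A sweep that only starts at offset 0 never sees offsets 1 and 2 as instruction starts, even though execution could reach them via a jump.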
The documentation of the project itself can be generated with cargo doc and then opened in a browser:
$ cargo doc
$ firefox target/doc/i8086_rs/index.html
For the disassembler, I used Intel's "8086 16-BIT HMOS MICROPROCESSOR" spec, together with this overview of all opcode variants and this decoding matrix.
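Much of the decoding work revolves around the ModRM byte that follows most 8086 opcodes with register or memory operands. The sketch below shows how its mod, reg and r/m fields select 16-bit operands. The function and table names are illustrative and not taken from 8086-rs. For example, the bytes 8B 07 encode mov ax, [bx].

```rust
/// 16-bit register names, indexed by the 3-bit reg (or r/m, if mod == 11) field.
const REG16: [&str; 8] = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"];

/// Base/index combinations of 8086 16-bit memory operands, indexed by r/m.
const RM16: [&str; 8] = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"];

/// Split a ModRM byte into its (mod, reg, r/m) fields.
fn split_modrm(byte: u8) -> (u8, u8, u8) {
    (byte >> 6, (byte >> 3) & 0b111, byte & 0b111)
}

/// Describe the r/m operand for a 16-bit operand size. Displacement bytes
/// are not read here; only their size is noted.
fn rm_operand(modrm: u8) -> String {
    let (md, _, rm) = split_modrm(modrm);
    match (md, rm) {
        (0b11, _) => REG16[rm as usize].to_string(),   // register operand
        (0b00, 0b110) => "[disp16]".to_string(),       // direct address
        (0b00, _) => format!("[{}]", RM16[rm as usize]),
        (0b01, _) => format!("[{}+disp8]", RM16[rm as usize]),
        _ => format!("[{}+disp16]", RM16[rm as usize]), // mod == 10
    }
}

fn main() {
    // 8B /r is MOV r16, r/m16; ModRM 0x07 = mod 00, reg 000 (ax), r/m 111 ([bx]).
    let modrm = 0x07;
    let (_, reg, _) = split_modrm(modrm);
    println!("mov {}, {}", REG16[reg as usize], rm_operand(modrm)); // mov ax, [bx]
    // 0x46 = mod 01, reg 000, r/m 110 -> [bp+disp8]
    println!("{}", rm_operand(0x46));
}
```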
For the interpreter, I used the "Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z".
Planned features:
- Map instructions into actual memory for interpretation
- Implement all MINIX interrupts
- Allow execution of 'raw' instructions, not only a.out
- Don't hardcode MINIX
- Implement BIOS interrupts
First of all, this project stemmed from a university exercise on the 8086 instruction set and disassembly. An interpreter for these instructions was the logical (?) next step. Maybe I will add raw 8086 emulation some day.
As for why the parsing is written by hand: there is no real reason. I just wanted to implement most parts myself, even if it meant more boilerplate code.
I have used nom extensively in the past and wanted to see what it would be like without that crate.
In hindsight, using nom would have been the cleaner option, but that is something I only learned by not using it for once.