An a.out parser, 8086 disassembler and interpreter with Minix 1.x interrupt support, written in Rust

marcothms/8086-rs

8086-rs

8086-rs is a Rust-based toolchain for analyzing and interpreting binaries compiled for the Intel 16-bit 8086 family, built primarily to interpret binaries compiled for MINIX 1.x.

Features:

  • A parser for the a.out format, to parse legacy MINIX 1.x executables
  • A disassembler to parse the 16-bit instructions into an IR
  • Disassembly output in an objdump(1)-style fashion
  • Interpretation of instructions
  • MINIX 1.x interrupts and memory layout
  • Obeying of segment register indirection (CS, SS, DS, ES)
  • Full 20-bit memory bus
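
To illustrate the a.out parsing step, here is a minimal sketch of reading a MINIX-style header. It is not taken from this repository: the `AoutHeader` struct, the `parse_header` function, and the field offsets (a 32-byte header starting with the magic bytes `0x01 0x03`, followed by little-endian 32-bit segment sizes) are assumptions based on the classic MINIX `a.out.h` layout, so check them against the real header before relying on them.

```rust
/// Hypothetical, simplified view of a MINIX-style a.out header.
/// Field offsets are an assumption, not copied from 8086-rs.
#[derive(Debug)]
struct AoutHeader {
    flags: u8,
    cpu: u8,
    hdrlen: u8,
    text: u32,  // size of the text segment in bytes
    data: u32,  // size of the initialized data segment in bytes
    bss: u32,   // size of the zero-initialized data in bytes
    entry: u32, // entry point
}

/// Read a little-endian u32 at `offset`, or None if out of bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let s = bytes.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Parse the assumed 32-byte header; None on short input or bad magic.
fn parse_header(bytes: &[u8]) -> Option<AoutHeader> {
    if bytes.len() < 32 || bytes[0..2] != [0x01, 0x03] {
        return None;
    }
    Some(AoutHeader {
        flags: bytes[2],
        cpu: bytes[3],
        hdrlen: bytes[4],
        text: read_u32_le(bytes, 8)?,
        data: read_u32_le(bytes, 12)?,
        bss: read_u32_le(bytes, 16)?,
        entry: read_u32_le(bytes, 20)?,
    })
}

fn main() {
    // Hand-built header for demonstration: 0x120 bytes of text, no data.
    let mut raw = [0u8; 32];
    raw[0] = 0x01;
    raw[1] = 0x03;
    raw[4] = 32;
    raw[8..12].copy_from_slice(&0x120u32.to_le_bytes());
    println!("{:?}", parse_header(&raw));
}
```

Returning `Option` keeps the sketch short; a real parser would more likely report which check failed.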

Usage

To compile and run the tool, use Cargo:

cargo build --release

Or run it directly:

cargo run -- --help

Run with log output:

RUST_LOG=debug cargo run -- interpret -p ./a.out 2>&1 | less

info shows things such as register state and calls to interrupts; debug additionally shows disassembly and interpretation internals.

CLI Options:

$ cargo run -- --help
Simple program to disassemble and interpret 8086 a.out compilates, e.g. such for MINIX

Usage: i8086-rs [OPTIONS] [ARGV]... <COMMAND>

Commands:
  disassemble  Disassemble the binary into 8086 instructions [aliases: d]
  interpret    Interpret the 8086 instructions [aliases: i]
  help         Print this message or the help of the given subcommand(s)

Arguments:
  [ARGV]...  argv passed to the program, which will be interpreted

Options:
  -p, --path <PATH>  Path of the binary
  -d, --dump         Dump progress of disassembly, in case of encountering an error
  -h, --help         Print help
  -V, --version      Print version

Example

$ cat 1.c
main() {
    write(1, "hello\n", 6);
}

$ ./target/release/i8086-rs interpret -p ./a.out
hello                                                                 

$ RUST_LOG=info ./target/release/i8086-rs interpret -p ./a.out
INFO: Initializing stack...                                                                    
INFO: Initializing static data...                                                              
INFO: (0000) xor %bp, %bp                     0000 0000 0000 0000 ffb4 0000 0000 0000 ---------
INFO: (0002) mov %bx, %sp                     0000 0000 0000 0000 ffb4 0000 0000 0000 -----Z---
INFO: (0004) mov %ax, [%bx]                   0000 ffb4 0000 0000 ffb4 0000 0000 0000 -----Z---
...

Status

This project is under active development and primarily used by me to explore some Intel disassembly and learn some more Rust. Expect bugs and some missing features. I mainly test with 'official' binaries from the MINIX source tree.

Currently, everything is in the binary, but I want to move some parts to a lib, which would make it much easier to ignore the Minix 1.x specifics (e.g. the currently hardcoded interrupt handler) and would allow for more generic usage of this 8086 interpreter (e.g. implementing a simple custom BIOS or OS). But first I want to implement all features correctly and add tests for all of them before moving on to that.

Caveats

Code is currently not fetched from memory, but from a separate vector stored inside the Disassembler struct, which fetches and parses the next instruction at the instruction pointer. The CS:IP addressing scheme is still used to allow for 20-bit access, but this design does not currently allow for self-modifying code.
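
As a reminder of what CS:IP addressing means on the 8086, the sketch below computes a 20-bit physical address from a segment and an offset. The function name `physical_address` and the constant `ADDRESS_MASK` are made up for this example and are not the names used in 8086-rs.

```rust
/// The 8086 has 20 address lines, so physical addresses wrap at 1 MiB.
const ADDRESS_MASK: u32 = 0xF_FFFF;

/// Standard 8086 rule: physical = segment * 16 + offset, truncated to 20 bits.
/// (Hypothetical helper, not the function used by 8086-rs.)
fn physical_address(segment: u16, offset: u16) -> u32 {
    ((u32::from(segment) << 4) + u32::from(offset)) & ADDRESS_MASK
}

fn main() {
    // CS = 0x1234, IP = 0x0010 -> 0x12340 + 0x10
    println!("{:#x}", physical_address(0x1234, 0x0010)); // prints 0x12350
    // The sum 0x100000 does not fit into 20 bits and wraps around to 0.
    println!("{:#x}", physical_address(0xFFFF, 0x0010)); // prints 0x0
}
```

The same calculation applies to the other segment registers (SS, DS, ES); only the choice of segment changes.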

Also, the disassembler uses only an initial sweep for disassembly, which has a high probability of being inaccurate compared to what happens at runtime. For example, interpretation may jump to a memory address that the disassembler did not identify as the start of an instruction.
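
The toy example below shows how a single sweep and the runtime control flow can disagree. It is only a sketch: the decoder knows just four real 8086 opcodes (`nop`, `ret`, `jmp short`, `mov ax, imm16`), and the names `instruction_len`, `linear_sweep` and `follow_control_flow` are invented for illustration and do not mirror the 8086-rs code.

```rust
use std::collections::BTreeSet;

/// Length in bytes of the few opcodes this toy decoder knows.
fn instruction_len(opcode: u8) -> Option<usize> {
    match opcode {
        0x90 => Some(1), // nop
        0xC3 => Some(1), // ret
        0xEB => Some(2), // jmp short rel8
        0xB8 => Some(3), // mov ax, imm16
        _ => None,
    }
}

/// Decode front to back, treating every byte after an instruction as code.
fn linear_sweep(code: &[u8]) -> BTreeSet<usize> {
    let mut starts = BTreeSet::new();
    let mut offset = 0;
    while offset < code.len() {
        let Some(len) = instruction_len(code[offset]) else { break };
        starts.insert(offset);
        offset += len;
    }
    starts
}

/// Follow jumps the way an interpreter would and record the visited offsets.
fn follow_control_flow(code: &[u8]) -> BTreeSet<usize> {
    let mut visited = BTreeSet::new();
    let mut ip = 0usize;
    while ip < code.len() && !visited.contains(&ip) {
        let opcode = code[ip];
        let Some(len) = instruction_len(opcode) else { break };
        visited.insert(ip);
        match opcode {
            0xC3 => break, // ret ends the toy program
            0xEB => {
                // The target is relative to the end of the 2-byte jump.
                let Some(&rel) = code.get(ip + 1) else { break };
                let next = ip as i64 + 2 + i64::from(rel as i8);
                if next < 0 {
                    break;
                }
                ip = next as usize;
            }
            _ => ip += len,
        }
    }
    visited
}

fn main() {
    // jmp +1 ; (skipped byte 0xB8) ; nop ; ret
    let code = [0xEB, 0x01, 0xB8, 0x90, 0xC3];
    println!("sweep:   {:?}", linear_sweep(&code)); // prints sweep:   {0, 2}
    println!("runtime: {:?}", follow_control_flow(&code)); // prints runtime: {0, 3, 4}
}
```

The sweep decodes the skipped byte `0xB8` as the start of a `mov` and swallows the `nop` and `ret` as its immediate, so offsets 3 and 4, which are actually executed, never show up as instruction starts.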

Documentation

The documentation of the project itself can be accessed by using cargo doc.

$ cargo doc
$ firefox target/doc/i8086_rs/index.html 

For the implementation of the disassembly, I used the Intel "8086 16-BIT HMOS MICROPROCESSOR" Spec, as well as this overview of all Opcode variants used in conjunction with this decoding matrix.

For the implementation of the interpreter, I used the "Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z".

TODOs

  • Map instructions into actual memory for interpretation
  • Implement all Minix Interrupts
  • Allow execution of 'raw' instructions, not only a.out
  • Don't hardcode Minix
  • Implement BIOS Interrupts

FAQ

Why hassle with interpretation and not just emulate 8086?

For one, this project stemmed from a university exercise about the 8086 instruction set and disassembly. An interpreter for these assembly instructions was the logical (?) next step. Maybe I will add raw 8086 emulation some day.

Why no nom?

There is no real reason; I just wanted to try to implement most parts myself, even if it meant more boilerplate code. I have used nom extensively in the past and wanted to see what it would be like without that crate. In hindsight, using nom would have been the cleaner option, but hey, that is something I only learned by not using nom for once.
