Compiler is a from-scratch CPU simulator paired with a simple assembler that can translate custom assembly language into binary code.
This project was originally started as a Java implementation to learn the basics of CPU simulation and assembly.
That version has now been archived and preserved in the java-archive tag.
Active development now focuses on the Rust port, since Rust aligns more closely with systems programming concepts.
Note: You'll need Rust installed to build and run the tools from source.
- The CPU executes basic instructions like data movement, arithmetic, conditional jumps, input/output, and halting.
- The assembler converts human-readable assembly into a .bin file, which the CPU can then run.
This project is being built to learn system software and understand how CPUs work at a low level.
- Quick Start
- Components
- How It Works
- Examples
- Verification
- Current Limitations
- Future Improvements
- Motivation
Quick Start
You have two options to run the tools:
Option 1: Download pre-built binaries
You can download the latest pre-compiled executables for Windows, macOS, and Linux from the Latest Release page.
Download the .zip archive for your operating system, unzip it, and you're ready to go!
Example assembly programs are available in the repository's examples folder.
Option 2: Build from source
1. Clone the repository:
   git clone https://github.com/Varun-Chakraborty/compiler
2. Navigate to the project directory:
   cd compiler
3. Optionally, build the project in release mode if you want to run the binaries directly; otherwise, use cargo run as described in step 4.
   cargo build --workspace --release --verbose
4. Run the assembler:
   Using the binary:
   ./target/release/assembler examples/fact.asm --bin
   Using cargo run:
   cargo run -p assembler examples/fact.asm
5. Run the CPU:
   Using the binary:
   ./target/release/cpu output.bin
   Using cargo run:
   cargo run -p cpu output.bin
Components
ISA
Opcode mapping (mnemonic table):
| Opcode | Mnemonic | Expected Count of Arguments | Description |
|---|---|---|---|
| 0 | HALT | 0 | Halts the CPU. |
| 1 | MOVER | 2 (R, M) | Moves data from memory to a register. |
| 2 | MOVERI | 2 (R, V) | Moves a constant (immediate value) to a register. |
| 3 | MOVEM | 2 (R, M) | Moves data from a register to memory. |
| 4 | MOVEMI | 2 (M, V) | Moves a constant (immediate value) to memory. |
| 5 | IN | 1 (R) | Reads a value from the user into a register. |
| 6 | OUT | 1 (R) | Writes the value of a register to the user. |
| 7 | ADD | 3 (R, R, M) | Adds register and memory and stores the result in a register specified in the first operand. |
| 8 | ADDI | 3 (R, R, V) | Adds a register and a constant (immediate value) and stores the result in a register specified in the first operand. |
| 10 | SUB | 3 (R, R, M) | Subtracts memory from a register and stores the result in a register specified in the first operand. |
| 11 | SUBI | 3 (R, R, V) | Subtracts a constant (immediate value) from a register and stores the result in a register specified in the first operand. |
| 12 | MULT | 3 (R, R, M) | Multiplies a register by memory and stores the result in a register specified in the first operand. |
| 13 | MULTI | 3 (R, R, V) | Multiplies a register by a constant (immediate value) and stores the result in a register specified in the first operand. |
| 14 | DIV | 3 (R, R, M) | Divides a register by memory and stores the quotient in a register specified in the first operand. |
| 15 | DIVI | 3 (R, R, V) | Divides a register by a constant (immediate value) and stores the quotient in a register specified in the first operand. |
| 16 | MOD | 3 (R, R, M) | Divides a register by memory and stores the remainder in a register specified in the first operand. |
| 17 | MODI | 3 (R, R, V) | Divides a register by a constant (immediate value) and stores the remainder in a register specified in the first operand. |
| 18 | JMP | 1 (M) | Jumps to a memory address. |
| 19 | JZ | 1 (M) | Jumps to a memory address if the zero flag is set. |
| 20 | JNZ | 1 (M) | Jumps to a memory address if the zero flag is not set. |
| 21 | AND | 3 (R, R, M) | ANDs register and memory and stores the result in a register specified in the first operand. |
| 22 | OR | 3 (R, R, M) | ORs register and memory and stores the result in a register specified in the first operand. |
| 23 | XOR | 3 (R, R, M) | XORs register and memory and stores the result in a register specified in the first operand. |
| 24 | NOT | 1 (R) | NOTs a register and stores the result in the same register. |
| 25 | SHL | 1 (R) | Shifts a register left by 1 bit and stores the result in the same register. |
| 26 | SHR | 1 (R) | Shifts a register right by 1 bit and stores the result in the same register. |
| 27 | ROL | 1 (R) | Rotates a register left by 1 bit and stores the result in the same register. |
| 28 | ROR | 1 (R) | Rotates a register right by 1 bit and stores the result in the same register. |
| 29 | CMP | 2 (R, R) | Compares two registers and sets flags. |
| 30 | CMPI | 2 (R, V) | Compares a register with a constant (immediate value) and sets flags. |
| 31 | PUSH | 1 (R) | Pushes a register onto the stack. |
| 32 | POP | 1 (R) | Pops the top of the stack into a register. |
| 33 | CALL | 1 (M) | Pushes the program counter onto the stack and jumps to a memory address. |
| 34 | RET | 0 | Pops the program counter from the stack and jumps to it. |
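- NOTE: Some instructions that accept 3 operands can also be written with 2, and the assembler expands them automatically. For example, the factorial program's MULT R0, 0 multiplies R0 by the value at memory location 0 and stores the result back in R0.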
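A minimal sketch of how that expansion could work, assuming the two-operand form reuses the first register as both destination and first source (which is what the factorial example's MULT R0, 0 suggests). The function expand_operands and the list THREE_OPERAND are hypothetical illustrations, not the assembler's actual code, and which mnemonics really accept the shorthand is an assumption.

```rust
/// Three-operand mnemonics from the opcode table. Which of them actually accept
/// the two-operand shorthand is an assumption in this sketch.
const THREE_OPERAND: &[&str] = &[
    "ADD", "ADDI", "SUB", "SUBI", "MULT", "MULTI", "DIV", "DIVI", "MOD", "MODI", "AND", "OR", "XOR",
];

/// Hypothetical expansion: `OP Rd, X` becomes `OP Rd, Rd, X`, so the
/// destination register also serves as the first source operand.
/// Any other instruction is returned unchanged.
fn expand_operands(mnemonic: &str, operands: &[&str]) -> Vec<String> {
    let mut expanded: Vec<String> = operands.iter().map(|s| s.to_string()).collect();
    let is_three_operand = THREE_OPERAND.iter().any(|&m| m == mnemonic);
    if is_three_operand && operands.len() == 2 {
        // Duplicate the destination register into the first source slot.
        expanded.insert(1, operands[0].to_string());
    }
    expanded
}
```

With this rule, expand_operands("MULT", &["R0", "0"]) yields R0, R0, 0, while an already complete ADD R0, R1, 0 passes through untouched.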
Operands:
- R: Register
- M: Memory address (data memory or program memory, depending on the instruction)
- V: Constant (immediate value)
For more details, refer to the isa crate.
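To show how the table can serve both directions (assembler: mnemonic to opcode; CPU: opcode to mnemonic), here is a lookup-table sketch. The pairs are transcribed from the table above (opcode 9 is not listed there, so it has no entry). The names OPCODES, opcode_of, and mnemonic_of are hypothetical; the isa crate's real representation may differ.

```rust
/// Opcode/mnemonic pairs transcribed from the opcode table.
const OPCODES: &[(u8, &str)] = &[
    (0, "HALT"), (1, "MOVER"), (2, "MOVERI"), (3, "MOVEM"), (4, "MOVEMI"),
    (5, "IN"), (6, "OUT"), (7, "ADD"), (8, "ADDI"), (10, "SUB"), (11, "SUBI"),
    (12, "MULT"), (13, "MULTI"), (14, "DIV"), (15, "DIVI"), (16, "MOD"), (17, "MODI"),
    (18, "JMP"), (19, "JZ"), (20, "JNZ"), (21, "AND"), (22, "OR"), (23, "XOR"),
    (24, "NOT"), (25, "SHL"), (26, "SHR"), (27, "ROL"), (28, "ROR"), (29, "CMP"),
    (30, "CMPI"), (31, "PUSH"), (32, "POP"), (33, "CALL"), (34, "RET"),
];

/// Assembler direction: mnemonic -> opcode (exact, case-sensitive match).
fn opcode_of(mnemonic: &str) -> Option<u8> {
    OPCODES
        .iter()
        .find(|(_, m)| *m == mnemonic)
        .map(|&(op, _)| op)
}

/// CPU/disassembler direction: opcode -> mnemonic.
fn mnemonic_of(opcode: u8) -> Option<&'static str> {
    OPCODES
        .iter()
        .find(|&&(op, _)| op == opcode)
        .map(|&(_, m)| m)
}
```

A linear scan over 34 entries is cheap and keeps the table in one place; a HashMap or a Rust enum would be natural alternatives in a larger ISA.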
CPU
- Executes a custom instruction set.
- Supports the opcodes defined in the ISA.
- Keeps track of:
  - Registers (R0, R1, R2, R3)
  - Data memory
  - Program memory
  - Program counter (PC)
- Supports several flags:
  - Basic usage (required arguments only):
    cargo run -p cpu output.bin
  - Debug mode: prints detailed execution steps.
    cargo run -p cpu output.bin --debug
  - Log: writes the execution log to a file or the console.
    cargo run -p cpu output.bin --log=file
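As an illustration of the command-line interface described in the flag list, the sketch below parses a .bin path plus the --debug and --log=<target> flags using only the standard library. CpuOptions and parse_cpu_args are hypothetical names; the real cpu crate may parse its arguments differently (for example with a CLI library), and rejecting unknown flags is a choice made here, not documented behavior.

```rust
/// Options for the CPU binary, following the flag list above.
#[derive(Debug, PartialEq)]
struct CpuOptions {
    binary_path: String,
    debug: bool,
    /// Value after `--log=`, e.g. "file".
    log: Option<String>,
}

/// Parses arguments (excluding the program name), e.g. from std::env::args().skip(1).
/// Hypothetical sketch: the actual crate's parsing rules may differ.
fn parse_cpu_args<I: IntoIterator<Item = String>>(args: I) -> Result<CpuOptions, String> {
    let mut binary_path = None;
    let mut debug = false;
    let mut log = None;
    for arg in args {
        if arg == "--debug" {
            debug = true;
        } else if let Some(target) = arg.strip_prefix("--log=") {
            log = Some(target.to_string());
        } else if arg.starts_with("--") {
            return Err(format!("unknown flag: {arg}"));
        } else if binary_path.is_none() {
            binary_path = Some(arg); // first positional argument: the .bin file
        } else {
            return Err(format!("unexpected argument: {arg}"));
        }
    }
    let binary_path = binary_path.ok_or("missing path to the .bin file")?;
    Ok(CpuOptions { binary_path, debug, log })
}
```

In a main function, this would be called as parse_cpu_args(std::env::args().skip(1)).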
Assembler (one-pass assembler)
- Converts .asm source files into binary.
- Basic instruction format:
  [label:] <4-bit opcode> [<2-bit register> <4-bit operand> [<4-bit operand3>] [<8-bit program memory address (in case of labels)>]]
  Here, [] marks optional parts and <> marks required parts of the instruction.
- Uses a symbol table to resolve labels.
- Uses a table of incomplete instructions to resolve forward references.
- Field widths:
  - Opcode: 4 bits (0-15)
  - Register: 2 bits (R0 = 00, R1 = 01, R2 = 10, R3 = 11)
  - Data memory address: 4 bits (0-15)
  - Program memory address: 8 bits (0-255)
- Supports several flags:
  - Basic usage (required arguments only):
    cargo run -p assembler examples/fact.asm
  - Debug mode: outputs detailed assembly-to-binary conversion steps and creates a debug.txt file containing an ASCII representation of the binary.
    cargo run -p assembler examples/fact.asm --debug
  - Pretty debug mode: prettifies the ASCII representation of the binary in debug.txt.
    NOTE: --pretty must be accompanied by --debug; otherwise it is ignored.
    cargo run -p assembler examples/fact.asm --debug --pretty
  - Log: writes the assembly log to a file or the console.
    cargo run -p assembler examples/fact.asm --log=file
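The symbol table and the table of incomplete instructions work together as follows in a one-pass assembler: a label's address is recorded when it is defined; a jump to an already known label is resolved immediately; a jump to a not-yet-seen label is recorded as incomplete and patched once the pass ends. The sketch below assumes one instruction per line, one program-memory slot per instruction, and that only JMP, JZ, JNZ, and CALL take label operands. Assembled and assemble are hypothetical names, and binary encoding is left out.

```rust
use std::collections::HashMap;

/// One assembled instruction; `target` holds a program-memory address for jumps.
#[derive(Debug, PartialEq)]
struct Assembled {
    mnemonic: String,
    target: Option<u8>,
}

/// Single pass with backpatching (hypothetical sketch, not the real assembler).
fn assemble(source: &str) -> Result<Vec<Assembled>, String> {
    let mut program = Vec::new();
    let mut symbols: HashMap<String, u8> = HashMap::new(); // symbol table: label -> address
    let mut incomplete: Vec<(usize, String)> = Vec::new(); // incomplete: (instr index, label)

    for raw in source.lines() {
        let mut line = raw.split(';').next().unwrap_or("").trim(); // drop comments
        if let Some((label, rest)) = line.split_once(':') {
            // A label names the address of the next instruction emitted.
            let addr = u8::try_from(program.len()).map_err(|_| "program exceeds 256 slots")?;
            symbols.insert(label.trim().to_string(), addr);
            line = rest.trim();
        }
        if line.is_empty() {
            continue;
        }
        let mut parts = line.split_whitespace();
        let mnemonic = parts.next().unwrap().to_string(); // non-empty line, so this exists
        let mut target = None;
        if matches!(mnemonic.as_str(), "JMP" | "JZ" | "JNZ" | "CALL") {
            let label = parts.next().ok_or("jump without a label")?.to_string();
            match symbols.get(&label) {
                Some(&addr) => target = Some(addr), // backward reference: already known
                None => incomplete.push((program.len(), label)), // forward reference
            }
        }
        program.push(Assembled { mnemonic, target });
    }

    // Backpatch every forward reference now that all labels are known.
    for (index, label) in incomplete {
        let addr = symbols.get(&label).ok_or(format!("undefined label: {label}"))?;
        program[index].target = Some(*addr);
    }
    Ok(program)
}
```

For the input JMP END / LOOP: SUB R1, 1 / JNZ LOOP / END: HALT, the backward jump to LOOP resolves to address 1 during the pass, and the forward JMP END is patched to address 3 afterward.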
How It Works
1. Write assembly
   Example:
   ADD R0, R1, 0
   This means: add the value at memory location 0 to the value in register R1 and store the result in register R0.
   An example assembly program is included in this repository as index.asm.
2. Assemble
   Run the assembler to convert your .asm file into a .bin file:
   cargo run -p assembler examples/fact.asm
   This writes the raw binary to output.bin.
   Note:
   - In debug mode, the assembler also generates a .txt file of ASCII 0 and 1 characters: a human-readable copy of the binary, written to debug.txt alongside the raw binary.
     cargo run -p assembler examples/fact.asm --debug --pretty
   - A Python script in the repository root can be used to check whether the raw binary matches the ASCII representation generated in debug mode. Run it as follows to print the ASCII representation of the binary to the console:
     python3 convertBinToASCIIBin.py output.bin
3. Run on CPU
   Pass output.bin to the CPU simulator:
   cargo run -p cpu output.bin
   The CPU will:
   - Load the program into program memory.
   - Fetch, decode, and execute each instruction.
   - Print output as the instructions direct, prompting for input (IN) or displaying the value of a register (OUT).
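The fetch-decode-execute cycle described above can be sketched with an already decoded subset of the ISA. Everything here is a hypothetical illustration: the Instr enum, the Cpu struct, and run are not the cpu crate's types, the data memory is sized from the assembler's 4-bit data addresses, MoverI stands in for IN (so the input is fixed), and SubI replaces the example's SUB with a constant in memory. The assumption that SUBI updates the zero flag comes from the factorial program relying on SUB before JNZ.

```rust
/// A decoded subset of the ISA (hypothetical representation).
#[derive(Clone, Copy)]
enum Instr {
    Halt,
    MoverI { r: usize, v: i32 },             // R <- V
    MoveM { r: usize, m: usize },            // M <- R
    Mult { rd: usize, rs: usize, m: usize }, // Rd <- Rs * M
    SubI { rd: usize, rs: usize, v: i32 },   // Rd <- Rs - V, updates the zero flag
    Jnz { addr: usize },                     // jump if the zero flag is clear
    Out { r: usize },                        // print a register
}

struct Cpu {
    regs: [i32; 4],  // R0..R3
    data: [i32; 16], // data memory (4-bit addresses)
    pc: usize,       // program counter
    zero: bool,      // zero flag
}

/// Runs `program` until HALT (it must contain one) and returns the values OUT printed.
fn run(program: &[Instr]) -> Vec<i32> {
    let mut cpu = Cpu { regs: [0; 4], data: [0; 16], pc: 0, zero: false };
    let mut output = Vec::new();
    loop {
        let instr = program[cpu.pc]; // fetch
        cpu.pc += 1; // advance first so a jump can overwrite it
        match instr { // decode and execute
            Instr::Halt => break,
            Instr::MoverI { r, v } => cpu.regs[r] = v,
            Instr::MoveM { r, m } => cpu.data[m] = cpu.regs[r],
            Instr::Mult { rd, rs, m } => cpu.regs[rd] = cpu.regs[rs] * cpu.data[m],
            Instr::SubI { rd, rs, v } => {
                cpu.regs[rd] = cpu.regs[rs] - v;
                cpu.zero = cpu.regs[rd] == 0;
            }
            Instr::Jnz { addr } => {
                if !cpu.zero {
                    cpu.pc = addr;
                }
            }
            Instr::Out { r } => output.push(cpu.regs[r]),
        }
    }
    output
}

/// The factorial loop from the example program, in this subset. Assumes n >= 1.
fn factorial_program(n: i32) -> Vec<Instr> {
    vec![
        Instr::MoverI { r: 1, v: n },       // 0: R1 <- n (stands in for IN)
        Instr::MoverI { r: 0, v: 1 },       // 1: R0 <- 1
        Instr::MoveM { r: 1, m: 0 },        // 2: LOOP: mem[0] <- R1
        Instr::Mult { rd: 0, rs: 0, m: 0 }, // 3: R0 <- R0 * mem[0]
        Instr::SubI { rd: 1, rs: 1, v: 1 }, // 4: R1 <- R1 - 1
        Instr::Jnz { addr: 2 },             // 5: back to LOOP while R1 != 0
        Instr::Out { r: 0 },                // 6: print R0
        Instr::Halt,                        // 7
    ]
}
```

Running run(&factorial_program(5)) walks through the loop five times and returns a single output, 120, matching the sample session in the README.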
Examples
program.asm
IN R0 ; Input the number
MOVEM R0, 1 ; Move input to memory location 1
MOVER R1, 1 ; Move input value at memory location 1 to R1
DC 1, 1 ; Constants; declare a constant of value 1 at memory location 1
MOVER R0, 1 ; Move value at memory location 1 i.e. 1 to R0
LOOP: MOVEM R1, 0 ; Label LOOP; move the current counter (R1) to memory location 0
MULT R0, 0 ; Multiply R0 (1 on the first iteration) by the counter at memory location 0
SUB R1, 1 ; Subtract 1 (the value at memory location 1) from the counter
JNZ LOOP ; Jump back to LOOP while the counter is not 0
OUT R0 ; Output the result
HALT ; END of program
As you might have guessed, the above program calculates the factorial of the input number.
Output (Normal Mode)
Loading binary file: output.bin
Binary file loaded successfully.
Starting execution...
Enter value for register 0: 5
Output from register 0: 120
End of Execution.
You can use the --debug flag described in the CPU section to run the CPU in debug mode.
Verification
The Python script convertBinToASCIIBin.py can be used to verify the binary output by converting it to ASCII 0 and 1 characters.
Run it as follows:
python3 convertBinToASCIIBin.py output.bin
This prints the ASCII representation of the binary to the console, which can be compared with the expected output (for example, the debug.txt file the assembler writes in debug mode).
This step is optional and mainly for debugging or cross-checking the assembler’s output.
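The same cross-check can be sketched in Rust. This assumes the raw binary is read byte by byte and each byte is shown as 8 ASCII bits, most significant bit first; the exact layout the Python script and debug.txt use (bit order, separators, line breaks) is an assumption here. bytes_to_ascii_bits and ascii_bits_to_bytes are hypothetical helpers, and the second one packs bits back into bytes so a round trip can be checked.

```rust
/// Renders raw bytes as ASCII '0'/'1' characters, 8 per byte, most significant bit first.
fn bytes_to_ascii_bits(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:08b}")).collect()
}

/// Packs an ASCII bit string into bytes (MSB first). Whitespace is ignored and a
/// trailing partial byte is zero-padded in its low bits.
fn ascii_bits_to_bytes(bits: &str) -> Result<Vec<u8>, String> {
    let bits: Vec<char> = bits.chars().filter(|c| !c.is_whitespace()).collect();
    bits.chunks(8)
        .map(|chunk| {
            let mut byte = 0u8;
            for (i, &c) in chunk.iter().enumerate() {
                let bit: u8 = match c {
                    '0' => 0,
                    '1' => 1,
                    other => return Err(format!("invalid bit character: {other:?}")),
                };
                byte |= bit << (7 - i); // first character becomes the highest bit
            }
            Ok(byte)
        })
        .collect()
}
```

For example, passing the bytes returned by std::fs::read("output.bin") to bytes_to_ascii_bits gives text that can be compared with a non-pretty debug.txt, provided both use this bit order.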
Current Limitations
- Input/output is basic (manual IN and OUT instructions only).
Future Improvements
- Create a REPL for live assembly and execution.
- Support for more registers and larger memory space.
- Support for floating point operations and more instructions.
Motivation
"Feels good to write 0s and 1s and see them do something."
This project is a practical step toward learning system software by building a CPU from scratch, understanding the fetch-decode-execute cycle, and bridging theory with a working implementation.
The project is released under the MIT License.
Contributions are welcome! Please fork the repository and create a pull request.