Compiler - Rust Version

Compiler is a from-scratch CPU simulator paired with a simple assembler that can translate custom assembly language into binary code.

Archived Java Version

This project was originally started as a Java implementation to learn the basics of CPU simulation and assembly.
That version has now been archived and preserved in the java-archive tag.
Active development now focuses on the Rust port, which aligns more closely with systems programming concepts.

Note: You’ll need Rust installed to run these Rust-based tools.

Overview

  • The CPU executes basic instructions like data movement, arithmetic, conditional jumps, input/output, and halting.
  • The assembler converts human-readable assembly into a .bin file, which the CPU can then run.

This project is being built to learn system software and understand how CPUs work at a low level.


Quick Start

You have two options to run the tool:

Installation

You can download the latest pre-compiled executables for Windows, macOS, and Linux from the Latest Release page.

Just download the .zip archive for your operating system, unzip the file, and you're ready to go!

From Source

Example assembly programs are available in the examples folder of the repository.

  1. Clone the repository:

    git clone https://github.com/Varun-Chakraborty/compiler
    
  2. Navigate to the project directory:

    cd compiler
    
  3. Optionally, build the project in release mode if you want to run the binaries directly. Otherwise, skip this step and use cargo run as described in steps 4 and 5.

    cargo build --workspace --release --verbose
    
  4. Run the assembler:

    Using binary:

    ./target/release/assembler examples/fact.asm --bin
    

    Using cargo run:

    cargo run -p assembler examples/fact.asm
    
  5. Run the CPU:

    Using binary:

    ./target/release/cpu output.bin
    

    Using cargo run:

    cargo run -p cpu output.bin
    

Components

ISA

Symbol table mapping for opcodes:

Opcode | Mnemonic | Operands     | Description
0      | HALT     | 0            | Halts the CPU.
1      | MOVER    | 2 (R, M)     | Moves data from memory to a register.
2      | MOVERI   | 2 (R, V)     | Moves a constant (immediate value) to a register.
3      | MOVEM    | 2 (R, M)     | Moves data from a register to memory.
4      | MOVEMI   | 2 (M, V)     | Moves a constant (immediate value) to memory.
5      | IN       | 1 (R)        | Reads data from the user into a register.
6      | OUT      | 1 (R)        | Writes the value of a register to the user.
7      | ADD      | 3 (R, R, M)  | Adds a register and a memory value and stores the result in the register given as the first operand.
8      | ADDI     | 3 (R, R, V)  | Adds a register and a constant (immediate value) and stores the result in the register given as the first operand.
10     | SUB      | 3 (R, R, M)  | Subtracts a memory value from a register and stores the result in the register given as the first operand.
11     | SUBI     | 3 (R, R, V)  | Subtracts a constant (immediate value) from a register and stores the result in the register given as the first operand.
12     | MULT     | 3 (R, R, M)  | Multiplies a register by a memory value and stores the result in the register given as the first operand.
13     | MULTI    | 3 (R, R, V)  | Multiplies a register by a constant (immediate value) and stores the result in the register given as the first operand.
14     | DIV      | 3 (R, R, M)  | Divides a register by a memory value and stores the quotient in the register given as the first operand.
15     | DIVI     | 3 (R, R, V)  | Divides a register by a constant (immediate value) and stores the quotient in the register given as the first operand.
16     | MOD      | 3 (R, R, M)  | Divides a register by a memory value and stores the remainder in the register given as the first operand.
17     | MODI     | 3 (R, R, V)  | Divides a register by a constant (immediate value) and stores the remainder in the register given as the first operand.
18     | JMP      | 1 (M)        | Jumps to a memory address.
19     | JZ       | 1 (M)        | Jumps to a memory address if the zero flag is set.
20     | JNZ      | 1 (M)        | Jumps to a memory address if the zero flag is not set.
21     | AND      | 3 (R, R, M)  | ANDs a register with a memory value and stores the result in the register given as the first operand.
22     | OR       | 3 (R, R, M)  | ORs a register with a memory value and stores the result in the register given as the first operand.
23     | XOR      | 3 (R, R, M)  | XORs a register with a memory value and stores the result in the register given as the first operand.
24     | NOT      | 1 (R)        | NOTs a register and stores the result in the same register.
25     | SHL      | 1 (R)        | Shifts a register left by 1 bit and stores the result in the same register.
26     | SHR      | 1 (R)        | Shifts a register right by 1 bit and stores the result in the same register.
27     | ROL      | 1 (R)        | Rotates a register left by 1 bit and stores the result in the same register.
28     | ROR      | 1 (R)        | Rotates a register right by 1 bit and stores the result in the same register.
29     | CMP      | 2 (R, R)     | Compares two registers and sets flags.
30     | CMPI     | 2 (R, V)     | Compares a register with a constant (immediate value) and sets flags.
31     | PUSH     | 1 (R)        | Pushes a register onto the stack.
32     | POP      | 1 (R)        | Pops a value from the stack into a register.
33     | CALL     | 1 (M)        | Pushes the program counter onto the stack and jumps to a memory address.
34     | RET      | 0            | Pops the program counter from the stack and jumps to it.

  • NOTE: Some instructions that accept 3 operands can also be written with 2. The assembler automatically expands them.
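The expansion mentioned in the note could look roughly like the sketch below. It assumes that the missing operand is the source register and that it defaults to the destination register, so `MULT R0, 0` becomes `MULT R0, R0, 0`. The function name `expand_operands` is hypothetical and is not taken from the assembler crate.

```rust
/// Expands a two-operand instruction into its three-operand form.
/// ASSUMPTION: the first register doubles as destination and left operand;
/// the real assembler may expand instructions differently.
fn expand_operands(mnemonic: &str, operands: &[&str]) -> Vec<String> {
    // Mnemonics that the ISA table lists with three operands.
    const THREE_OPERAND: [&str; 13] = [
        "ADD", "ADDI", "SUB", "SUBI", "MULT", "MULTI", "DIV", "DIVI",
        "MOD", "MODI", "AND", "OR", "XOR",
    ];
    let is_three_operand = THREE_OPERAND.iter().any(|m| *m == mnemonic);
    if is_three_operand && operands.len() == 2 {
        // Repeat the destination register as the source register.
        vec![
            operands[0].to_string(),
            operands[0].to_string(),
            operands[1].to_string(),
        ]
    } else {
        // Already complete, or not a three-operand instruction: keep as written.
        operands.iter().map(|s| s.to_string()).collect()
    }
}
```

With this sketch, `expand_operands("SUB", &["R1", "1"])` yields `R1, R1, 1`, while an instruction that already has three operands is returned unchanged.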

Operands

  • R: Register
  • M: Memory Address [Data Memory or Program Memory (as per the context)]
  • V: Constant

For more details, refer to the isa crate
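As an illustration of the opcode mapping, here is a minimal lookup sketch covering a subset of the table. The function name `lookup` and the tuple layout are assumptions made for this example; the isa crate may represent the table differently.

```rust
/// Returns (opcode, expected operand count) for a mnemonic.
/// Only a subset of the ISA table is included; values are copied from it.
/// ASSUMPTION: this shape is illustrative, not the isa crate's actual API.
fn lookup(mnemonic: &str) -> Option<(u8, usize)> {
    let entry = match mnemonic {
        "HALT" => (0, 0),
        "MOVER" => (1, 2),
        "MOVERI" => (2, 2),
        "MOVEM" => (3, 2),
        "IN" => (5, 1),
        "OUT" => (6, 1),
        "ADD" => (7, 3),
        "SUB" => (10, 3),
        "MULT" => (12, 3),
        "JMP" => (18, 1),
        "JZ" => (19, 1),
        "JNZ" => (20, 1),
        // Unknown (or not yet listed) mnemonics are reported as None.
        _ => return None,
    };
    Some(entry)
}
```

Returning `Option` lets the caller turn an unknown mnemonic into a clear assembly error instead of panicking.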

CPU

  • Executes a custom instruction set.

  • Supports various opcodes as defined in the ISA.

  • Keeps track of:

    • Registers (R0, R1, R2, R3)
    • Data memory
    • Program memory
    • Program counter (PC)
  • Supports several flags:

    • Basic invocation: Only the required arguments.
      cargo run -p cpu output.bin
      
    • Debug mode: Prints detailed execution steps.
      cargo run -p cpu output.bin --debug
      
    • Log: Writes execution log to a file/console.
      cargo run -p cpu output.bin --log=file
      

Assembler

(One-pass assembler)

  • Converts .asm source files into binary.

  • Basic Instruction format:
    [label:] <4-bit opcode> [<2-bit register> <4-bit operand> [<4-bit operand3>] [<8-bit program memory address (in case of labels)>]]

    • Here, [] are optional and <> are required parts of the instruction.
  • Uses a Symbol Table to resolve labels.

  • Uses a Table of Incomplete Instructions to resolve forward references.

  • Operand format:

    • Opcode: 4 bits (0-15)
    • Register: 2 bits (R0 = 00, R1 = 01, R2 = 10, R3 = 11)
    • Data Memory Address: 4 bits (0-15)
    • Program Memory Address: 8 bits (0-255)
  • Supports several flags:

    • Basic invocation: Only the required arguments.
      cargo run -p assembler examples/fact.asm
      
    • Debug mode: Outputs detailed assembly-to-binary conversion steps and creates a debug.txt file containing ASCII representation of the binary.
      cargo run -p assembler examples/fact.asm --debug
      
    • Pretty Debug mode: ASCII representation of the binary in debug.txt is prettified.
      cargo run -p assembler examples/fact.asm --debug --pretty
      
      NOTE: The --pretty flag must be combined with --debug; otherwise it is ignored.
    • Log: Writes assembly log to a file/console.
      cargo run -p assembler examples/fact.asm --log=file
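The interplay between the Symbol Table and the Table of Incomplete Instructions can be sketched as follows. This is a simplified model, not the assembler's actual code: each instruction is reduced to an optional label and an optional jump target, addresses are plain instruction indices, and the name `resolve_labels` is hypothetical.

```rust
use std::collections::HashMap;

/// One-pass label resolution with backpatching.
/// Each item is (label defined on this line, label this line jumps to).
/// ASSUMPTION: addresses are instruction indices, not real program memory offsets.
fn resolve_labels(
    program: &[(Option<&str>, Option<&str>)],
) -> Result<Vec<Option<usize>>, String> {
    // Symbol table: label name -> address where it was defined.
    let mut symbols: HashMap<&str, usize> = HashMap::new();
    // Table of incomplete instructions: (instruction address, unresolved label).
    let mut incomplete: Vec<(usize, &str)> = Vec::new();
    let mut targets = vec![None; program.len()];

    for (addr, &(label, target)) in program.iter().enumerate() {
        if let Some(name) = label {
            symbols.insert(name, addr);
        }
        if let Some(name) = target {
            match symbols.get(name) {
                // Backward reference: the label is already known.
                Some(&known) => targets[addr] = Some(known),
                // Forward reference: remember it and patch it later.
                None => incomplete.push((addr, name)),
            }
        }
    }

    // Backpatch every forward reference now that all labels have been seen.
    for (addr, name) in incomplete {
        let known = symbols
            .get(name)
            .ok_or_else(|| format!("undefined label: {name}"))?;
        targets[addr] = Some(*known);
    }
    Ok(targets)
}
```

Recording unresolved references and patching them at the end is what lets the assembler read the source only once while still supporting jumps to labels defined later in the file.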
      

How It Works

  1. Write Assembly

    Example: ADD R0, R1, 0

    • This means: add the value at memory location 0 to the value in register R1 and store the result in register R0.

    An example assembly program is present in this repository as index.asm.

  2. Assemble

    Run the assembler to convert your .asm file into a .bin file:

    cargo run -p assembler examples/fact.asm

    This produces raw binary in output.bin.

    Note:

    1. The assembler also generates a .txt file with ASCII 0 and 1 bits if run in debug mode.
      cargo run -p assembler examples/fact.asm --debug --pretty
      This produces a human-readable binary alongside the raw binary in debug.txt.
    2. A Python script in the root of the repository lets you verify that the raw binary matches the ASCII representation (generated in debug mode). You can run it as:
      python3 convertBinToASCIIBin.py output.bin
      This will print the ASCII representation of the binary in the console.
  3. Run on CPU

    Pass output.bin to the CPU simulator:

    cargo run -p cpu output.bin
    

    The CPU will:

    • Load the program into instruction memory.
    • Fetch, decode, and execute each instruction.
    • Print output as per the instructions, asking for input or displaying the value of a register.
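The fetch-decode-execute cycle described above can be sketched as a small loop. To keep the example short, it works on already-decoded instructions rather than on the bits of output.bin, supports only a handful of immediate instructions, and assumes that arithmetic updates the zero flag. The `Instr` enum and `run` function are hypothetical names for this illustration.

```rust
/// A few already-decoded instructions (register indices are 0..=3).
#[derive(Clone, Copy)]
enum Instr {
    MoverI(usize, i32),      // MOVERI R, V
    AddI(usize, usize, i32), // ADDI R, R, V
    SubI(usize, usize, i32), // SUBI R, R, V
    Jnz(usize),              // JNZ <instruction index>
    Out(usize),              // OUT R
    Halt,                    // HALT
}

/// Runs a program and returns everything written by OUT.
/// ASSUMPTION: arithmetic instructions update the zero flag.
fn run(program: &[Instr]) -> Vec<i32> {
    let mut regs = [0i32; 4];
    let mut zero = false;
    let mut pc = 0;
    let mut output = Vec::new();

    while pc < program.len() {
        let instr = program[pc]; // fetch
        pc += 1;                 // advance to the next instruction by default
        match instr {            // decode and execute
            Instr::MoverI(r, v) => regs[r] = v,
            Instr::AddI(d, s, v) => {
                regs[d] = regs[s] + v;
                zero = regs[d] == 0;
            }
            Instr::SubI(d, s, v) => {
                regs[d] = regs[s] - v;
                zero = regs[d] == 0;
            }
            Instr::Jnz(target) => {
                if !zero {
                    pc = target;
                }
            }
            Instr::Out(r) => output.push(regs[r]),
            Instr::Halt => break,
        }
    }
    output
}
```

Incrementing the program counter before executing means a jump only has to overwrite `pc`, which keeps the loop body uniform for every instruction.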

Example

program.asm

      IN R0               ; Input the number
      MOVEM R0, 1         ; Move input to memory location 1
      MOVER R1, 1         ; Move input value at memory location 1 to R1
      DC 1, 1             ; Constants; declare a constant of value 1 at memory location 1
      MOVER R0, 1         ; Move value at memory location 1 i.e. 1 to R0
LOOP: MOVEM R1, 0         ; Label LOOP; move the counter in R1 to memory location 0
      MULT R0, 0          ; Multiply R0 (1 on the first iteration) by the value at memory location 0
      SUB R1, 1           ; Subtract the value at memory location 1 (i.e. 1) from the counter in R1
      JNZ LOOP            ; Jump to LOOP if the result is not 0
      OUT R0              ; Output the result
      HALT                ; END of program

As you might have guessed, the above program calculates the factorial of the input number.
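To make the data flow easier to follow, the same steps can be mirrored with plain Rust variables. This sketch assumes that DC takes effect in program order, that the two-operand forms expand to use the first register as both source and destination, and that SUB updates the zero flag tested by JNZ. The function name `factorial_trace` is made up for this example.

```rust
/// Mirrors program.asm line by line; each comment names the instruction modelled.
/// ASSUMPTIONS: DC runs in program order, and SUB sets the zero flag used by JNZ.
fn factorial_trace(input: i64) -> i64 {
    let mut mem = [0i64; 2];
    let mut r0 = input;       // IN R0
    mem[1] = r0;              // MOVEM R0, 1
    let mut r1 = mem[1];      // MOVER R1, 1
    mem[1] = 1;               // DC 1, 1
    r0 = mem[1];              // MOVER R0, 1
    loop {
        mem[0] = r1;          // LOOP: MOVEM R1, 0
        r0 *= mem[0];         // MULT R0, 0
        r1 -= mem[1];         // SUB R1, 1
        if r1 == 0 {
            break;            // JNZ LOOP falls through once the counter hits 0
        }
    }
    r0                        // OUT R0
}
```

Under these assumptions the loop only terminates for inputs of 1 or greater, because the counter must reach exactly 0 for JNZ to fall through.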

Output (Normal Mode)

Loading binary file: output.bin
Binary file loaded successfully.
Starting execution...
Enter value for register 0: 5
Output from register 0: 120
End of Execution.

To see each execution step, run the CPU with the --debug flag described in the CPU section.


Verification

The Python script convertBinToASCIIBin.py can be used to verify the binary output by converting it to ASCII 0 and 1 bits. Run it as follows:

python3 convertBinToASCIIBin.py output.bin

This will print the ASCII representation of the binary to the console, which can be compared with the expected output.

This step is optional and mainly for debugging or cross-checking the assembler’s output.
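For readers who prefer to stay in Rust, the same kind of conversion can be sketched in a few lines. This is not a port of convertBinToASCIIBin.py: the output layout (eight bits per byte, most significant bit first, bytes separated by spaces) is an assumption, and the function name is hypothetical.

```rust
/// Renders raw bytes as ASCII 0 and 1 characters.
/// ASSUMPTION: MSB-first, space-separated bytes; the Python script may format differently.
fn bytes_to_ascii_bits(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|byte| format!("{byte:08b}")) // zero-padded 8-bit binary
        .collect::<Vec<_>>()
        .join(" ")
}
```

The bytes themselves could be obtained with `std::fs::read("output.bin")` and passed to this function.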

Current Limitations

  • Input/Output is basic (manual IN and OUT instructions).

Future Improvements

  • Create a REPL for live assembly and execution.
  • Support for more registers and larger memory space.
  • Support for floating point operations and more instructions.

Motivation

"Feels good to write 0s and 1s and see them do something."

This project is a practical step toward learning system software by building a CPU from scratch, understanding the fetch-decode-execute cycle, and bridging theory with a working implementation.


License

The project is released under the MIT License.


Contributing

Contributions are welcome! Please fork the repository and create a pull request.