Skip to content

A simple imperative language. Features both a compiler with optional static analysis and optimization steps as well as an interpreter.

Notifications You must be signed in to change notification settings

Nargaruga/mini_imp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

64 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MiniImp

Usage

The project uses the Dune build system. To build and run tests, execute the following:

  • build with dune build;

  • run tests with dune test (requires OUnit2).

Interpreter

The mimpi executable for the MiniImp interpreter can be run as

dune exec -- mimpi <input_file> <input_value>

where input_file is the path to a file containing MiniImp code and input_value is the value to be loaded in the input variable.

Compiler

The mimpc executable for the Mini-IMP compiler can be run as

dune exec -- mimpc <input_file> [flags]

where input_file is the path to a file containing Mini-IMP code. To see the available flags, run as dune exec -- mimpc -help.

Project Structure

The bin directory contains modules for the interpreter and compiler executables, the lib directory contains the bulk of the implementation and the test directory contains OUnit2-based tests. The following is a short summary of the (main) modules making up the lib directory.

scanner.mll

Ocamllex input file for generating the lexer.

parser.mly

Menhir input file for generating the parser.

ast.ml

MiniImp abstract syntax tree definition.

env.ml

"String $\to$ value" environment definition.

input.ml

Methods for parsing MiniImp programs from files or plain strings.

interpreter.ml

Functions for evaluating MiniImp programs.

cfg.ml

Generic control flow graph definition.

mini_imp_cfg.ml

MiniImp control flow graph definition.

mini_risc_cfg.ml

MiniRISC control flow graph definition.

mini_risc.ml

MiniRISC definition and functions for translating MiniImp into MiniRISC.

defined_analysis.ml

Functions for checking that all variables are defined before use.

liveness_analysis.ml

Functions for generating the live ranges of registers.

register_allocation.ml

Functions for mapping virtual registers to physical registers.

register_merging.ml

Functions for merging redundant registers.

code_generation.ml

Functions for outputting MiniRISC to file.

Parsing

The parser definition resides in the lib/parser.mly file, which is fed into Menhir to generate the actual parser. Shift-reduce conflicts were mostly reduced by splitting the grammar in multiple levels and just the following precedence annotation was used:

%left SEMICOLON (* makes sequence left associative: a;b;c;d -> ((a;b);c);d *)

Conditionals and loops can be nested and concatenated without ambiguity, as both conditional branches are mandatory and both constructs are clearly closed by a DONE token. All binary operations are left-associative and the usual precedence of arithmetic expressions is respected. The not operator is right-associative and has a higher priority than and.

The grammar is split in the following nonterminal symbols:

%type <Ast.prog option> prog
%type <Ast.expr> expr
%type <Ast.expr> term
%type <Ast.expr> factor
%type <Ast.stmt> stmt

Interpretation

MiniImp programs are run by sequentially evaluating statements in a given environment, which is initialized with the value of the input variable. The interpreter performs basic type-checking while evaluating expressions, such as making sure that guards are booleans and that binary operations are applied to appropriate operands.

Compilation

The compiler for MiniImp generates instructions for a fictitious language, MiniRISC, developed for a likewise hypothetical register-based virtual machine. The defined variable analysis and register merging phases can be disabled by a command-line argument, although skipping the former will prevent the compiler from detecting the usage of undefined variables. An overview of the steps is given:

  1. Parse input file to generate the AST;

  2. build the (minimal-block) MiniImp CFG from the AST;

  3. transform the MiniImp CFG into a MiniRISC CFG by translating MiniImp statements into sequences of MiniRISC instructions;

  4. perform liveness analysis on the MiniRISC CFG to detect live ranges;

  5. (optional) perform defined variable analysis to make sure that no program using a variable before its definition is allowed to run;

  6. (optional) merge virtual registers that have non-overlapping live-ranges;

  7. perform global register allocation to map virtual registers to physical registers, adding spill code where necessary;

  8. generate the final code by translating CFG arcs to jump instructions and by outputting (labeled) MiniRISC instructions to a file.

Control Flow Graph

The cfg.ml module contains a functor, Make, parameterized with the Node module. The mini_imp_cfg.ml and mini_risc_cfg.ml use it to implement the CFG for MiniImp and MiniRISC, respectively. Sequences are dealt with by first generating all blocks for the first statement, then all blocks for the second statement, finally connecting the last block of the first statement with the first block of the second statement.

A CFG is organized as a hash table that maps IDs to values of type

type node = { contents : T.contents; data : T.data; next : jump }

where contents determine the contents of the block corresponding to the node, data is additional information to be used in later stages and jump defines the outgoing arcs of the node.

MiniImp and MiniRISC have differently organized nodes:

  • MiniImp:

    • contents: a single statement (nodes represent minimal blocks, both for ease of implementation and to help the defined-variable analysis);

    • data: def-in and def-out sets for defined variable analysis and an integer value, only used in conditional nodes, containing the ID of the dummy block placed after the conditional. This value is used in the code generation phase to generate instructions in the correct order (more details in the code generation section).

  • MiniRISC:

    • contents: a sequence of instructions implementing the corresponding MiniImp statement;

    • data: live_in and live_out sets for liveness analysis as well as the same integer value as above.

For both of them, jump is a variant defined as

type jump =
    | Target of T.id
    | Branch of T.guard * T.id * T.id
    | Halt

Defined Variable Analysis

This analysis is performed on the MiniImp CFG by recursively recomputing def_in and def_out sets until a fixpoint is reached. The analysis is simplified by the fact that the MiniImp CFG is generated as a minimal-block CFG: each block only contains a single statement, meaning we don't have to consider the fact that a variable used in a statement might be defined in another statement in the same block.

Liveness Analysis

Like the prevoius analysis, liveness analysis is performed by recursively recomputing sets (live_in and live_out) associated with each block until a fixpoint is reached. This time, the analysis is performed on the MiniRISC CFG, as the live ranges of registers are needed for later phases.

Register Merging

Virtual registers whose live ranges do not overlap can be safely merged. The algorithm to do so keeps track, for each register, of the set of CFG nodes for which the register itself is "live in". Any two registers for which the intersection of the respective sets is empty are merged into one.

Register Allocation

To map virtual registers to the k physical registers available on the target machine, we perform global register allocation using greedy graph coloring. First, we build the interference graph by using the live ranges of the virtual registers, then we perform greedy graph coloring to find a (k-2)-coloring for the graph (where the $-2$ accounts for the presence of registers dedicated to spills). Since nodes correspond to virtual registers and colors to physical ones, this coloring gives us the mapping we need.

Once the interference graph is built, we rewrite the code to reflect the allocation: we iterate through all instructions in a block, one block at a time, substituting registers and adding spill code in the form of LoadI, Load and Store operations. The memory is implemented as a hash table that maps each virtual register to an integer representing the address where its values must be spilled.

There are two registers dedicated to spilled values: TmpRegA and TmpRegB. These are used to store intermediate results and are not designed to carry values from one block to the next.

Code Generation

Code generation is the final step of the compiler and consists of turning the MiniRISC CFG into a sequence of labeled instructions. CFG block IDs are used as labels and arcs are translated as jump instructions, the resulting list of instructions is then formatted and printed to an output file. Note that the only jump instructions generated at this stage are the plain jumps that occur at the end of a loop body or of a conditional branch: sequential blocks are merged with no jump while conditional jumps corresponding to branching nodes are generated when building the MiniRISC CFG, since those have a guard that needs to go through the register allocation phase.

The ID stored in the data field during the CFG generation phase tells us which node is the immediate successor to the conditional branches. By generating code for the join node first, the code generation procedure has a clear stop condition when generating code for the branches, since the join node has already been marked as visited. This gives us three lists of instructions (one for the then branch, one for the else branch, and one for the rest of the program, which are then appended in the correct order). This was done to solve a problem where a recursive call of the code generation procedure on one of the branches would continue until the end of the program, as it would not spot the end of the branch.

About

A simple imperative language. Features both a compiler with optional static analysis and optimization steps as well as an interpreter.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published