A minimal compiled programming language that demonstrates how compilers work. Perfect for learning about language implementation and compilation.
A compiler is like a translator that converts your code into something your computer can understand. Here's the process:
- Lexer (Word Processor)
  - Reads your code character by character
  - Groups characters into meaningful tokens (like words in a sentence)
  - Handles whitespace and tracks line/column information
  - Example: `x = 42` becomes `[IDENTIFIER "x", ASSIGN "=", NUMBER "42"]`
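The lexer step can be sketched in a few lines. This is an illustrative Python sketch, not the project's actual source: the `Token` class, the `tokenize` function, and the `PLUS`/`STAR` token names are assumptions made for the example, while `IDENTIFIER`, `ASSIGN`, and `NUMBER` come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str    # e.g. "IDENTIFIER", "ASSIGN", "NUMBER"
    text: str    # the exact characters that were grouped
    line: int    # 1-based line of the first character
    column: int  # 1-based column of the first character


# Hypothetical single-character tokens for this sketch.
SINGLE_CHAR = {"=": "ASSIGN", "+": "PLUS", "*": "STAR"}


def is_word_char(ch: str) -> bool:
    return ch.isalnum() or ch == "_"


def tokenize(source: str) -> list[Token]:
    tokens = []
    i, line, col = 0, 1, 1
    while i < len(source):
        ch = source[i]
        if ch == "\n":                      # newline: next line, column 1
            i, line, col = i + 1, line + 1, 1
        elif ch.isspace():                  # other whitespace is skipped
            i, col = i + 1, col + 1
        elif ch.isdigit():                  # group a run of digits
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(Token("NUMBER", source[start:i], line, col))
            col += i - start
        elif ch.isalpha() or ch == "_":     # group a run of word characters
            start = i
            while i < len(source) and is_word_char(source[i]):
                i += 1
            tokens.append(Token("IDENTIFIER", source[start:i], line, col))
            col += i - start
        elif ch in SINGLE_CHAR:
            tokens.append(Token(SINGLE_CHAR[ch], ch, line, col))
            i, col = i + 1, col + 1
        else:
            raise SyntaxError(f"unexpected {ch!r} at {line}:{col}")
    return tokens
```

Calling `tokenize("x = 42")` yields three tokens whose kinds and texts match the example, each carrying the line and column where it started, which is what later stages need for useful error messages.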
- Parser (Grammar Checker)
  - Implements a recursive descent parser
  - Checks if your code follows the language's rules
  - Builds a tree structure (AST) showing how operations relate
  - Handles operator precedence (`*` before `+`)
  - Example: `x = 42` becomes:

        AssignmentStatement
        ├── identifier: "x"
        └── value: NumberLiteral(42)
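To make the recursive descent idea concrete, here is a minimal Python sketch with one method per grammar rule. It is an assumption-laden illustration rather than the project's parser: the `lex` helper, the `Parser` class, the tuple-shaped AST, and the `BinaryExpression` and `Identifier` node names are all hypothetical. Only `AssignmentStatement` and `NumberLiteral` come from the example above.

```python
import re

# Hypothetical helper so the sketch is self-contained. It returns
# (kind, text) pairs; whitespace and unknown characters are silently skipped.
def lex(src):
    pattern = r"(?P<NUMBER>\d+)|(?P<IDENTIFIER>[A-Za-z_]\w*)|(?P<OP>[=+*])"
    return [(m.lastgroup, m.group()) for m in re.finditer(pattern, src)]


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, kind, text=None):
        tok = self.advance()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"expected {text or kind}, got {tok!r}")
        return tok

    # assignment := IDENTIFIER "=" expression
    def assignment(self):
        name = self.expect("IDENTIFIER")[1]
        self.expect("OP", "=")
        node = ("AssignmentStatement", name, self.expression())
        self.expect("EOF")
        return node

    # expression := term ("+" term)*      (lowest precedence)
    def expression(self):
        node = self.term()
        while self.peek() == ("OP", "+"):
            self.advance()
            node = ("BinaryExpression", "+", node, self.term())
        return node

    # term := factor ("*" factor)*        (binds tighter than "+")
    def term(self):
        node = self.factor()
        while self.peek() == ("OP", "*"):
            self.advance()
            node = ("BinaryExpression", "*", node, self.factor())
        return node

    # factor := NUMBER | IDENTIFIER
    def factor(self):
        kind, text = self.advance()
        if kind == "NUMBER":
            return ("NumberLiteral", int(text))
        if kind == "IDENTIFIER":
            return ("Identifier", text)
        raise SyntaxError(f"expected a number or name, got {(kind, text)!r}")


def parse(src):
    return Parser(lex(src)).assignment()
```

Precedence falls out of the call structure: `expression` only ever combines whole `term`s, so in `x = 1 + 2 * 3` the multiplication is grouped into a single subtree before the addition sees it.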
- Code Generator (Translator)
  - Traverses the AST and generates x86-64 assembly
  - Manages variable allocation on the stack
  - Example: `x = 42` becomes:

        mov rax, 42
        mov QWORD PTR [rbp-8], rax
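The traversal and stack allocation can be sketched as follows. This Python sketch is illustrative only: the `CodeGenerator` class, the tuple-shaped AST, and the `BinaryExpression` and `Identifier` node names are assumptions, and the function prologue and epilogue that would reserve stack space are left out. The operand spelling mirrors the example above.

```python
class CodeGenerator:
    def __init__(self):
        self.offsets = {}  # variable name -> distance below rbp, in bytes
        self.lines = []    # generated assembly, one instruction per entry

    def slot(self, name):
        # Allocate the next 8-byte stack slot the first time a name is stored.
        if name not in self.offsets:
            self.offsets[name] = 8 * (len(self.offsets) + 1)
        return f"QWORD PTR [rbp-{self.offsets[name]}]"

    def expression(self, node):
        """Emit code that leaves the value of `node` in rax."""
        kind = node[0]
        if kind == "NumberLiteral":
            self.lines.append(f"mov rax, {node[1]}")
        elif kind == "Identifier":
            if node[1] not in self.offsets:
                raise NameError(f"undefined variable {node[1]!r}")
            self.lines.append(f"mov rax, {self.slot(node[1])}")
        elif kind == "BinaryExpression":
            _, op, left, right = node
            instructions = {"+": "add", "*": "imul"}
            if op not in instructions:
                raise ValueError(f"unsupported operator {op!r}")
            self.expression(left)           # left value -> rax
            self.lines.append("push rax")   # park it while computing the right
            self.expression(right)          # right value -> rax
            self.lines.append("pop rcx")    # left value -> rcx
            self.lines.append(f"{instructions[op]} rax, rcx")
        else:
            raise ValueError(f"unknown node {kind!r}")

    def statement(self, node):
        kind, name, value = node
        if kind != "AssignmentStatement":
            raise ValueError(f"unknown statement {kind!r}")
        self.expression(value)
        self.lines.append(f"mov {self.slot(name)}, rax")


def generate(statements):
    gen = CodeGenerator()
    for stmt in statements:
        gen.statement(stmt)
    return gen.lines
```

For the tree `("AssignmentStatement", "x", ("NumberLiteral", 42))` this produces the same two instructions as the example. A second variable would land at `[rbp-16]`, the third at `[rbp-24]`, and so on.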
- Assembler & Linker (Builder)
  - Assembler (NASM): Converts assembly to machine code
    - Creates binary object files (like a puzzle with missing pieces)
    - Keeps track of where things are (symbol tables)
    - Notes where things need to be connected (relocation info)
  - Linker (ld): Creates the final executable
    - Organizes the program's memory:
      - `.text`: Where the code lives
      - `.data`: Where variables start with values
      - `.bss`: Where variables start as zero
    - Connects everything together (like completing a puzzle)
    - Creates the program's startup instructions
    - Makes sure the operating system can run it
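The "puzzle with missing pieces" idea can be shown with a toy linker. This Python sketch is a heavy simplification under stated assumptions: `ObjectFile` and `link` are hypothetical names, there is a single section, every relocation is a 4-byte absolute address, and the load address is an arbitrary choice. Real object files and `ld` handle far more cases.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    code: bytes                                       # bytes with holes left as zeros
    symbols: dict = field(default_factory=dict)       # defined name -> offset in code
    relocations: list = field(default_factory=list)   # (hole offset, needed name)


def link(objects, base=0x401000):
    """Concatenate object files and patch every hole with a final address."""
    image = bytearray()
    addresses = {}   # global symbol table: name -> absolute address
    starts = []      # where each object's bytes begin inside the image

    # Pass 1: lay the pieces out back to back and record where symbols end up.
    for obj in objects:
        start = len(image)
        starts.append(start)
        for name, offset in obj.symbols.items():
            if name in addresses:
                raise ValueError(f"duplicate symbol {name!r}")
            addresses[name] = base + start + offset
        image += obj.code

    # Pass 2: fill each hole with the address of the symbol it refers to.
    for obj, start in zip(objects, starts):
        for offset, name in obj.relocations:
            if name not in addresses:
                raise ValueError(f"undefined symbol {name!r}")
            hole = start + offset
            image[hole:hole + 4] = addresses[name].to_bytes(4, "little")

    return bytes(image), addresses


# Usage: the first object needs the address of `message`, which only the
# second object defines. 0xB8 is `mov eax, imm32`; its 4-byte operand is the hole.
main_obj = ObjectFile(b"\xb8\x00\x00\x00\x00", {"_start": 0}, [(1, "message")])
data_obj = ObjectFile(b"hi", {"message": 0})
image, table = link([main_obj, data_obj], base=0x1000)
```

After linking, `table` maps `_start` to `0x1000` and `message` to `0x1005`, and the four zero bytes in the first object have been overwritten with that address.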
- Direct Machine Code (Modern)
  - Toolchains such as Rust's (via LLVM) and Go's, as well as JavaScript JIT engines, emit machine code without a separate textual assembly step
  - Benefits: Faster compilation and no dependency on an external assembler
  - Example: Rust's compiler produces optimized object code through LLVM without writing out assembly text by default
- Assembly Generation (Traditional)
  - This project uses this approach for clarity
  - Benefits: Easier to understand and debug
  - Example: Early C compilers worked this way
- Numbers: `42`
- Variables: `x = 42`
- Math: `x + y * 3` (with proper precedence)
- Print: `print x`
- While loops: `while x > 0 { ... }`
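A while loop is the one feature above that needs labels and jumps in the generated assembly. The Python sketch below shows one common way to lower `while x > 0 { ... }`. It is an assumption about how such a loop could be compiled, not a description of this project's output. The `lower_while` function and the label names are hypothetical.

```python
def lower_while(var_slot, body, label_id):
    """Return assembly lines for `while <var> > 0 { body }`.

    var_slot  memory operand holding the loop variable
    body      already-generated assembly lines for the loop body
    label_id  unique number so nested or repeated loops get distinct labels
    """
    start = f"while_start_{label_id}"
    end = f"while_end_{label_id}"
    return [
        f"{start}:",
        f"    cmp {var_slot}, 0",   # compare the variable with zero
        f"    jle {end}",           # leave the loop once var <= 0 (signed)
        *[f"    {line}" for line in body],
        f"    jmp {start}",         # go back and test the condition again
        f"{end}:",
    ]


# Usage: a loop that counts the variable at [rbp-8] down to zero.
lines = lower_while("QWORD PTR [rbp-8]", ["dec QWORD PTR [rbp-8]"], 0)
```

The condition is tested at the top, so a loop whose variable starts at zero or below never runs its body.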
- Install:

      git clone https://github.com/mattcookio/emojilang.git
      cd emojilang
      npm install

- Write code in `example.asm`:

      x = 42
      print x

- Compile and run:

      npm run compile example.asm
      ./example
emojilang/
├── src/ # Compiler source
│ ├── lexer/ # Token creation
│ ├── parser/ # Parsing logic
│ ├── ast/ # Abstract Syntax Tree
│ └── codegen/ # Assembly output
├── examples/ # Sample programs
└── emojilang-vscode/ # VS Code extension with syntax highlighting
The project includes a dedicated VS Code extension (emojilang-vscode) that provides:
- Syntax highlighting for all emoji operators:
  - Assignment: 🍌
  - Addition: 🤪
  - Subtraction: 🥴
  - Multiplication: 🤯
  - Print: 🦄
  - While loop: 🔄
  - Loop end: 🛑
  - Statement terminator: 💩
- Number and identifier highlighting
- Comment support
- Basic language features
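The operator list above maps naturally onto a lookup table, which is roughly what a highlighter or lexer needs. The Python sketch below is illustrative only: the token names, the `classify` function, and the assumption that tokens are separated by whitespace are not taken from the project, and the sample statement `x 🍌 42 💩` is a guess at what emoji source might look like.

```python
# Emoji operators as listed above, mapped to hypothetical token names.
EMOJI_TOKENS = {
    "🍌": "ASSIGN",
    "🤪": "PLUS",
    "🥴": "MINUS",
    "🤯": "STAR",
    "🦄": "PRINT",
    "🔄": "WHILE",
    "🛑": "LOOP_END",
    "💩": "TERMINATOR",
}


def classify(source):
    """Label each whitespace-separated word as an operator, number, or name."""
    result = []
    for word in source.split():
        if word in EMOJI_TOKENS:
            result.append((EMOJI_TOKENS[word], word))
        elif word.isdigit():
            result.append(("NUMBER", word))
        elif word.isidentifier():
            result.append(("IDENTIFIER", word))
        else:
            raise SyntaxError(f"unrecognized word {word!r}")
    return result


# Usage, assuming an assignment is written as: name, 🍌, value, 💩
tokens = classify("x 🍌 42 💩")
```

Each emoji here is a single Unicode code point, so a plain dictionary lookup is enough. Emoji built from several code points would need grapheme-aware splitting.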
To install the extension:
- Open VS Code
- Go to Extensions (Ctrl+Shift+X, or Cmd+Shift+X on macOS)
- Search for "emojilang"
- Click Install
The compiler could be extended with:
- More data types (strings, booleans, arrays)
- Control flow (if/else, additional loop forms)
- Functions and scope
- Type checking
- Optimization passes
MIT License - see LICENSE