Welcome to the comprehensive documentation for the BCC (Basic Compiler Collection) interpreter! This documentation covers our memory-efficient Lox-like interpreter implementation, designed to be educational, thorough, and accessible to developers at all levels.
A comprehensive, function-by-function overview of the lexical analysis phase. Learn how raw source code is transformed into meaningful tokens.
What you'll learn:
- How character-by-character scanning works
- Token recognition patterns and algorithms
- Error handling and position tracking
- Design decisions and trade-offs
- Unicode handling and keyword recognition
An in-depth exploration of how tokens are structured into Abstract Syntax Trees using recursive descent parsing.
What you'll learn:
- Grammar rules and precedence hierarchies
- Recursive descent parsing techniques
- AST construction and error recovery
- Statement vs expression parsing
- Operator precedence and associativity
βοΈ The Evaluator: A Deep Dive
A complete guide to tree-walking interpretation and how programs are executed.
What you'll learn:
- Tree-walking interpretation algorithms
- Environment chains and lexical scoping
- Value system and type operations
- Control flow execution
- Error handling during runtime
ποΈ Architecture Overview
A high-level understanding of the BCC interpreter's design philosophy, component interactions, and overall system architecture.
What you'll learn:
- System design principles and philosophy
- Memory-efficient parsing using string slices
- Component relationships and data flow
- Module organization and dependencies
- Error handling strategy
- Performance characteristics and design trade-offs
A practical guide for extending the interpreter with native functions implemented in Rust.
What you'll learn:
- How to implement built-in functions
- Type checking and argument validation
- Integration with the evaluator
- Best practices and common patterns
- Testing and debugging built-ins
A gentle, beginner-friendly introduction to compiler and interpreter concepts using BCC as a practical example.
What you'll learn:
- Fundamental compiler/interpreter concepts
- The compilation pipeline and phases
- Different implementation approaches
- Common algorithms and patterns
- How to build your own language
If you're new to BCC or compiler implementation:
- Start with: Compilers for Dummies for fundamental concepts
- Then read: Architecture Overview for the big picture
- Dive deep with: The component-specific deep dives (Lexer, Parser, Evaluator)
- Extend and experiment: Creating Built-in Functions
BCC implements a Lox-like dynamically typed scripting language with:
- Python-like variable assignment:
x = 42(novarkeyword) - Optional semicolons: Write code with or without them
- C-style control flow:
if,while,forstatements - Block scoping: Variables scoped to
{}blocks - Dynamic typing: Types determined at runtime
- Integers:
42,-17(64-bit signed) - Doubles:
3.14,-0.5(IEEE 754 floating point) - Strings:
"hello","world"(UTF-8) - Booleans:
true,false - Nil:
nil(absence of value)
- Arithmetic:
+,-,*,/with type promotion - Comparison:
<,<=,>,>=,==,!= - Logical:
and,or,!with short-circuit evaluation - String concatenation:
"hello" + " world"
- Conditionals:
if (condition) statement else statement - Loops:
while (condition) statement - For loops:
for (init; condition; increment) statement - Block statements:
{ statement1; statement2; }
BCC uses a memory-efficient implementation with these characteristics:
- String Slices: Uses
&strinstead of ownedStrings for identifiers and keywords - Reduced Allocations: Significantly fewer heap allocations during parsing
- Better Cache Performance: Improved memory locality for better cache utilization
- Scalability: Memory benefits increase with program size and complexity
Aspect Traditional BCC Implementation
Memory Allocations Many Minimal
Memory Usage Standard Reduced (~40-70% less)
Parse Complexity O(n) O(n) (same algorithmic complexity)
Cache Performance Good Better (due to locality)
- Memory: Significant reduction in allocations and peak usage
- Code Complexity: Moderate increase due to lifetime management
- Speed: Similar algorithmic performance with better cache characteristics
- Maintenance: Clean, well-documented codebase prioritizing clarity
// Variables and arithmetic
x = 10
y = 20
result = x + y * 2
print result // Prints: 50// Conditionals and loops
count = 0
while (count < 5) {
if (count % 2 == 0) {
print "Even: " + str(count)
} else {
print "Odd: " + str(count)
}
count = count + 1
}// Block scoping demonstration
x = "global"
{
x = "outer"
{
x = "inner"
print x // Prints: inner
}
print x // Prints: outer
}
print x // Prints: global- Clean, readable code: Every component is designed for understanding
- Comprehensive documentation: Detailed explanations of design decisions
- Simple algorithms: Straightforward implementations over clever optimizations
- Minimal dependencies: Only essential external crates (ariadne, clap)
- Ariadne integration: Beautiful colored error reports
- Precise source locations: Every token and AST node tracks its position
- Context-aware messages: Different error types provide appropriate guidance
- Helpful suggestions: Error messages guide users toward solutions
- Clear separation: Each phase in its own module
- Clean interfaces: Well-defined boundaries between components
- Extension points: Easy to add new features and functionality
- Test-friendly: Components can be tested in isolation
- Interactive development: Immediate feedback for expressions
- Persistent state: Variables maintain values between commands
- Smart output: Shows expression values but not assignment results
- Error recovery: Continue after errors in interactive mode
bcc/
βββ src/
β βββ main.rs # CLI interface and entry point
β βββ lexer.rs # Tokenization (characters β tokens)
β βββ parser.rs # Parsing (tokens β AST)
β βββ evaluator.rs # Interpretation (AST β execution)
β βββ ast.rs # AST node definitions
β βββ value.rs # Value type system
β βββ error.rs # Error types and reporting
β βββ runner.rs # File execution orchestration
β βββ repl.rs # Interactive shell
βββ docs/ # This documentation
βββ Cargo.toml # Project configuration
βββ test.bcc # Example program
BCC is designed to be educational and extensible. Common extensions include:
- Functions: User-defined functions with closures
- Arrays: Dynamic lists and indexing
- Objects: Hash tables or prototype-based objects
- Modules: Import/export systems
- Error handling: Try/catch mechanisms
- Bytecode compilation: Compile AST to intermediate representation
- Virtual machine: Execute bytecode instead of tree-walking
- Optimization passes: Constant folding, dead code elimination
- Just-in-time compilation: Compile hot code to native instructions
- Language server: IDE integration with hover, completion, errors
- Debugger: Breakpoints, variable inspection, call stacks
- Formatter: Automatic code formatting
- Linter: Style checking and best practice enforcement
- Read Compilers for Dummies
- Run BCC and experiment with the REPL
- Read Architecture Overview
- Trace through a simple example in the code
- Study the Lexer Deep Dive
- Implement a simple lexer for a mini-language
- Study the Parser Deep Dive
- Try extending BCC's grammar with new syntax
- Study the Evaluator Deep Dive
- Follow Creating Built-in Functions
- Implement new language features
- Consider alternative implementation strategies
- "Crafting Interpreters" by Bob Nystrom - Excellent companion to this project
- "Compilers: Principles, Techniques, and Tools" (Dragon Book) - Comprehensive theory
- "Language Implementation Patterns" by Terence Parr - Practical techniques
- ANTLR: Parser generator with great documentation
- LLVM Tutorial: Building a JIT compiler
- Rust Book: For understanding the implementation language
- Lox (from Crafting Interpreters) - Direct inspiration for BCC
- Monkey (from "Writing an Interpreter in Go") - Similar educational focus
- Wren - Simple, practical scripting language
If you have questions while studying the documentation or code:
- Check the documentation: Each component has extensive explanation
- Read the code comments: Key algorithms are explained inline
- Run examples: The best way to understand is to experiment
- Study the tests: Tests show expected behavior and edge cases
Remember: BCC is designed for learning. Don't hesitate to modify, experiment, and break things - that's how you learn!
BCC demonstrates that educational software doesn't have to be toy software. With careful design, a teaching-focused implementation can be both understandable and practical.
The techniques demonstrated in BCC - lexical analysis, recursive descent parsing, tree-walking interpretation, and environment-based scoping - are fundamental to language implementation. Master these concepts here, and you'll be prepared to understand and contribute to much more complex systems.
Happy learning, and enjoy exploring the fascinating world of programming language implementation!