seatedro · seatedro · Dec 31, 2025 · Dec 29, 2025 · Dec 13, 2025 · Dec 25, 2025
diff --git a/.envrc b/.envrc
@@ -0,0 +1 @@
+use flake
diff --git a/.gitignore b/.gitignore
@@ -1 +1,5 @@
 /target
+.direnv
+
+test/
+.todo.md
diff --git a/AGENTS.md b/AGENTS.md
@@ -1,209 +1,178 @@
-# AGENTS.md - AI Agent Guide for Glimpse
+# Glimpse Development Guide
 
-## Project Overview
+A blazingly fast tool for peeking at codebases. Perfect for loading your codebase into an LLM's context.
 
-Glimpse is a fast Rust CLI tool for extracting codebase content into LLM-friendly formats. It's designed to help users prepare source code for loading into Large Language Models with built-in token counting, tree visualization, and multiple output formats.
+## Task Tracking
 
-**Key capabilities:**
-- Fast parallel file processing using Rayon
-- Directory tree visualization
-- Source code content extraction
-- Token counting (tiktoken/HuggingFace backends)
-- Git repository cloning and processing
-- Web page scraping with Markdown conversion
-- Interactive TUI file picker
-- XML and PDF output formats
-- Per-repository configuration via `.glimpse` files
+Check `.todo.md` for current tasks and next steps. Keep it updated:
+- Mark items `[x]` when completed
+- Add new tasks as they're discovered
+- Reference it before asking "what's next?"
 
-## Codebase Structure
+## Commits
 
-```
-glimpse/
-├── src/
-│   ├── main.rs           # Entry point, CLI arg handling, routing
-│   ├── cli.rs            # CLI argument definitions using clap
-│   ├── config.rs         # Global and repo-level configuration
-│   ├── analyzer.rs       # Core file processing logic
-│   ├── source_detection.rs # Source file detection (extensions, shebangs)
-│   ├── output.rs         # Output formatting (tree, files, XML, PDF)
-│   ├── tokenizer.rs      # Token counting backends
-│   ├── git_processor.rs  # Git repository cloning
-│   ├── url_processor.rs  # Web page fetching and HTML→Markdown
-│   └── file_picker.rs    # Interactive TUI file selector
-├── build.rs              # Build script that generates languages.rs from languages.yml
-├── languages.yml         # Language definitions (extensions, filenames, interpreters)
-├── Cargo.toml            # Dependencies and package metadata
-├── .github/workflows/
-│   ├── test.yml          # CI: tests, clippy, formatting
-│   └── release.yml       # CD: multi-platform builds, publishing
-└── test_project/         # Test fixtures
+Use `jj` for version control. Always commit after completing a phase:
+
+```bash
+jj commit -m "feat: add glimpse-code crate scaffolding"
 ```
 
-## Development Environment
+Use conventional commit prefixes:
+- `feat` - new feature
+- `fix` - bug fix
+- `refactor` - restructure without behavior change
+- `chore` - maintenance, dependencies, config
+- `docs` - documentation only
+- `test` - adding or updating tests
 
-Always use the devshell from the flake for all commands:
+## Build Commands
 
 ```bash
-nix develop
+cargo build                    # debug build
+cargo build --release          # release build
+cargo run -- <args>            # run with arguments
+cargo run -- .                 # analyze current directory
+cargo run -- --help            # show help
 ```
 
-## Development Commands
+## Test Commands
 
 ```bash
-# Build and run
-cargo build
-cargo run -- [OPTIONS] [PATH]
+cargo test                              # run all tests
+cargo test test_name                    # run single test by name
+cargo test test_name -- --nocapture     # run test with stdout
+cargo test -- --test-threads=1          # run tests sequentially
+```
 
-# Run tests
-cargo test
+## Lint & Format
 
-# Check code quality (required to pass CI)
-cargo clippy -- -D warnings
-cargo fmt -- --check
+```bash
+cargo fmt                      # format all code
+cargo fmt -- --check           # check formatting (CI)
+cargo clippy                   # run linter
+cargo clippy -- -D warnings    # fail on warnings (CI)
+```
 
-# Format code
-cargo fmt
+## Project Structure
 
-# Build release
-cargo build --release
+```
+glimpse/
+├── src/
+│   ├── main.rs        # binary entry point
+│   ├── lib.rs         # library root
+│   ├── cli.rs         # CLI arg parsing
+│   ├── analyzer.rs    # directory processing
+│   ├── output.rs      # output formatting
+│   ├── core/          # config, tokenizer, types, source detection
+│   ├── fetch/         # git clone, url/html processing
+│   ├── tui/           # file picker
+│   └── code/          # code analysis (extract, graph, index, resolve)
+├── tests/             # integration tests
+├── languages.yml      # language definitions for source detection
+├── registry.toml      # tree-sitter grammar registry
+└── build.rs           # generates language data from languages.yml
 ```
 
-## Key Architecture Decisions
+## Code Style
 
-### Source File Detection
-Detection happens in `source_detection.rs` via `is_source_file()`:
-1. Check known filenames (Makefile, Dockerfile, etc.)
-2. Check file extensions against `SOURCE_EXTENSIONS`
-3. Fall back to shebang parsing for scripts
+### No Comments
 
-Extension/filename data is code-generated at build time from `languages.yml` via `build.rs`.
+Code should be self-documenting. The only acceptable documentation is:
+- Brief `///` docstrings on public API functions that aren't obvious
+- `//!` module-level docs when necessary
 
-### Include/Exclude Pattern Behavior
-- `--include` (or `-i`): **Additive** - patterns are added to default source detection
-- `--only-include`: **Replacement** - only specified patterns are used, ignoring source detection
-- `--exclude` (or `-e`): Applied after inclusion, works with both modes
+```rust
+// BAD: explaining what code does
+// Check if the file is a source file
+if is_source_file(path) { ... }
 
-### Token Counting
-Two backends available in `tokenizer.rs`:
-- `TokenizerType::Tiktoken` (default) - Uses `tiktoken-rs` for OpenAI-compatible counting
-- `TokenizerType::HuggingFace` - Uses `tokenizers` crate for HuggingFace models
+// BAD: inline comments
+let name = path.file_name(); // get the filename
 
-### Configuration Hierarchy
-1. Global config: `~/.config/glimpse/config.toml` (Linux/macOS) or `%APPDATA%\glimpse\config.toml` (Windows)
-2. Repo config: `.glimpse` file in project root
-3. CLI arguments (highest priority)
+// GOOD: self-documenting code, no comments needed
+if is_source_file(path) { ... }
 
-### Output Formats
-- Default: Copies to clipboard
-- `-p/--print`: Outputs to stdout
-- `-f/--file [PATH]`: Writes to file (default: `GLIMPSE.md`)
-- `-x/--xml`: Wraps output in XML tags for better LLM parsing
-- `--pdf PATH`: Generates PDF output
+// GOOD: docstring for non-obvious public function
+/// Extract interpreter from shebang line and exec pattern
+fn extract_interpreter(data: &str) -> Option<String> { ... }
+```
 
-## Testing Conventions
+### Import Order
 
-- Unit tests are co-located with source code in `#[cfg(test)]` modules
-- Integration tests use `tempfile` for isolated filesystem testing
-- Tests should handle network-dependent operations gracefully (see `git_processor.rs` tests)
-- Mock servers used for URL processing tests via `mockito`
+Group imports in this order, separated by blank lines:
+1. `std` library
+2. External crates (alphabetical)
+3. Internal modules - prefer `super::` over `crate::` when possible
 
-Example test pattern:
 ```rust
-#[cfg(test)]
-mod tests {
-    use super::*;
-    use tempfile::tempdir;
-
-    #[test]
-    fn test_feature() -> Result<()> {
-        let dir = tempdir()?;
-        // Test logic using temp directory
-        Ok(())
-    }
-}
-```
+use std::fs;
+use std::path::{Path, PathBuf};
 
-## CI/CD Pipeline
+use anyhow::Result;
+use serde::{Deserialize, Serialize};
 
-### Test Workflow (`.github/workflows/test.yml`)
-Runs on push/PR to `master`:
-- `cargo test --verbose`
-- `cargo clippy -- -D warnings`
-- `cargo fmt -- --check`
+use super::types::FileEntry;      // preferred for sibling modules
+use crate::config::Config;        // only when super:: won't reach
+```
 
-### Release Workflow (`.github/workflows/release.yml`)
-Triggered by version tags (`v*`):
-1. Creates GitHub release
-2. Builds binaries for: `x86_64-unknown-linux-gnu`, `aarch64-apple-darwin`, `x86_64-pc-windows-msvc`
-3. Uploads release assets
-4. Updates Homebrew tap
-5. Publishes to crates.io
+### Error Handling
 
-## Code Style Guidelines
+- Use `anyhow::Result` for fallible functions
+- Propagate errors with `?` operator
+- Use `.expect("message")` only when failure is a bug
+- Never use `.unwrap()` outside of tests
+- Use `anyhow::bail!` for early returns with errors
 
-- Write terse, self-commenting code
-- Comments only on docstrings for functions
-- Follow standard Rust formatting (`cargo fmt`)
-- Use `anyhow::Result` for error handling in application code
-- Prefer `?` operator over explicit `match` for error propagation
-- Use `#[derive]` macros for common traits
+### Naming Conventions
 
-## Version Control
+- `snake_case` for functions, methods, variables, modules
+- `PascalCase` for types, traits, enums
+- `SCREAMING_SNAKE_CASE` for constants
+- Prefer descriptive names over abbreviations
+- Boolean functions: `is_`, `has_`, `can_`, `should_`
 
-Use jujutsu (`jj`) instead of git for all version control operations.
+### Type Definitions
 
-```bash
-jj status
-jj diff
-jj new -m "message"
-jj describe -m "message"
-jj bookmark set <name>
-jj git push
-```
+- Derive common traits: `Debug`, `Clone`, `Serialize`, `Deserialize`
+- Put derives in consistent order
+- Use `pub` sparingly - only what's needed
 
-## Common Patterns
-
-### Adding a new CLI option
-1. Add field to `Cli` struct in `cli.rs` with appropriate `#[arg(...)]` attributes
-2. Handle the option in `main.rs` routing logic
-3. Update `RepoConfig` in `config.rs` if it should be saveable
-
-### Adding file type support
-Edit `languages.yml` to add extensions, filenames, or interpreters. The build script will regenerate detection code automatically.
+```rust
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct FileEntry {
+    pub path: PathBuf,
+    pub content: String,
+    pub size: u64,
+}
+```
 
-### Modifying output format
-Edit `output.rs`:
-- `generate_tree()` for tree structure
-- `generate_files()` for file contents
-- `generate_output()` orchestrates the full output
+### Function Style
 
-## Important Dependencies
+- Keep functions focused and small
+- Use early returns for guard clauses
+- Prefer iterators and combinators over loops when clearer
+- Use `impl Trait` for return types when appropriate
 
-| Crate | Purpose |
-|-------|---------|
-| `clap` | CLI argument parsing with derive macros |
-| `rayon` | Parallel file processing |
-| `ignore` | .gitignore-aware file walking |
-| `tiktoken-rs` | OpenAI tokenizer |
-| `tokenizers` | HuggingFace tokenizer |
-| `git2` | Git repository operations |
-| `scraper` | HTML parsing for web processing |
-| `ratatui` | Terminal UI for file picker |
-| `arboard` | Clipboard access |
-| `printpdf` | PDF generation |
+### Testing
 
-## Debugging Tips
+- Tests live in `#[cfg(test)] mod tests` at bottom of file
+- Use descriptive test names: `test_<what>_<condition>`
+- Use `tempfile` for filesystem tests
+- Group related assertions
 
-- Use `--print` to see output directly instead of clipboard
-- Use `--no-tokens` to skip tokenizer initialization during debugging
-- For file selection issues, check `.gitignore` patterns with `--no-ignore`
-- For hidden file issues, use `-H/--hidden`
+### Patterns to Follow
 
-## Version Bumping
+- Use `Option` combinators: `.map()`, `.and_then()`, `.unwrap_or()`
+- Use `Result` combinators: `.map_err()`, plus `.context()` from `anyhow::Context`
+- Prefer `&str` over `String` in function parameters
+- Use `impl AsRef<Path>` for path parameters when flexible
+- Use builders for complex configuration
 
-Version is defined in `Cargo.toml`. When releasing:
-1. Update version in `Cargo.toml`
-2. Commit: `jj new -m "bump: vX.Y.Z"`
-3. Tag: `jj git push && git tag vX.Y.Z && git push --tags`
+### Patterns to Avoid
 
-The release workflow handles the rest automatically.
+- Comments explaining what code does (code should be obvious)
+- Deeply nested code (use early returns)
+- Magic numbers (use named constants)
+- `clone()` when borrowing works
+- `Box<dyn Error>` (use `anyhow::Error`)
+- Panicking in library code
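
The Function Style, Patterns to Follow, and Patterns to Avoid sections added to AGENTS.md in this diff can be illustrated with a minimal sketch that uses only the standard library. It combines a guard-clause early return, a named constant in place of a magic number, an `impl AsRef<Path>` parameter, `Option` combinators, and an `is_`/`has_`-style boolean name. `has_short_extension` and `MAX_EXTENSION_LEN` are hypothetical and do not appear in Glimpse; the comments exist only to flag them as assumptions, which the guide's no-comments rule would otherwise discourage.

```rust
use std::path::Path;

// Hypothetical limit, named to avoid a magic number; not taken from Glimpse.
const MAX_EXTENSION_LEN: usize = 8;

/// Hypothetical helper: reports whether `path` has an extension of at most
/// `MAX_EXTENSION_LEN` bytes.
fn has_short_extension(path: impl AsRef<Path>) -> bool {
    let path = path.as_ref();
    if path.as_os_str().is_empty() {
        return false;
    }

    path.extension()
        .and_then(|extension| extension.to_str())
        .map(|extension| extension.len() <= MAX_EXTENSION_LEN)
        .unwrap_or(false)
}

fn main() {
    for candidate in ["src/main.rs", "Makefile", "notes.verylongextension"] {
        println!("{candidate}: {}", has_short_extension(candidate));
    }
}
```

Accepting `impl AsRef<Path>` lets callers pass a `&str`, `String`, or `PathBuf` without converting first, and the combinator chain keeps the missing-extension case out of nested `match` arms.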