Merged
36 commits
c22cf46
direnv
seatedro Dec 29, 2025
9496a36
feat(code): init new code feature
seatedro Dec 13, 2025
702a612
refactor: reorganize into workspace with core, fetch, tui, cli crates
seatedro Dec 25, 2025
89716e7
add import_query to registry.toml for all 11 languages
seatedro Dec 27, 2025
8ec39fe
add glimpse-code crate scaffolding with module stubs
seatedro Dec 27, 2025
80a38b4
feat: implement grammar loading for glimpse-code
seatedro Dec 27, 2025
725387f
refactor: consolidate code modules into 5 files
seatedro Dec 27, 2025
78f2b3e
feat: implement index storage with bincode serialization
seatedro Dec 27, 2025
b84e6d0
refactor: store index in local data dir instead of project
seatedro Dec 27, 2025
bb7bf79
feat: implement tree-sitter extraction for all languages
seatedro Dec 27, 2025
dba7540
feat: implement WorkspaceDiscovery trait and RustWorkspace with resol…
seatedro Dec 27, 2025
df1fe37
feat: add GoWorkspace and TsWorkspace with comprehensive tests
seatedro Dec 27, 2025
3df7f85
feat: add PythonWorkspace and import tracing to Resolver
seatedro Dec 27, 2025
9160fef
feat: replace rg subprocess with language-aware regex search
seatedro Dec 27, 2025
a74e036
feat: track discovered files for lazy index population
seatedro Dec 27, 2025
4fc28ec
feat: use grep crate for fast definition search, upgrade to rust nightly
seatedro Dec 28, 2025
22cbff2
refactor: stricter language-specific import resolution
seatedro Dec 29, 2025
20e19e1
feat: add zig, java, scala import resolvers with local package discovery
seatedro Dec 29, 2025
adfe3bf
refactor: simplify module resolution with universal glob-based search
seatedro Dec 29, 2025
7157bb3
fix: reorder resolution to check index first, track all discovered files
seatedro Dec 29, 2025
7ba06de
feat: implement call graph with traversal and transitive closure
seatedro Dec 29, 2025
8f3c45a
test: add integration tests for resolver and call graph
seatedro Dec 29, 2025
8b8e588
fix: resolve unindexed definitions via imports and add them to call g…
seatedro Dec 29, 2025
30247fa
refactor: flatten workspace into single crate
seatedro Dec 29, 2025
ca13923
feat: add code and index subcommands with progress indicators
seatedro Dec 29, 2025
0ab007c
fix: use ignore crate to respect gitignore when indexing
seatedro Dec 29, 2025
3253722
perf: remove slow resolve_by_search, use index-only lookups
seatedro Dec 29, 2025
b63f096
feat: add import-aware call resolution with --strict mode
seatedro Dec 29, 2025
7071834
feat: add LSP-based type resolution with --precise flag
seatedro Dec 29, 2025
28195ce
feat: add LSP support for zig, bash, java, and scala
seatedro Dec 29, 2025
cb55939
fix: add LSP warmup for reliable call resolution
seatedro Dec 29, 2025
718fcf6
chore: remove dead code and fix warnings
seatedro Dec 30, 2025
24c8698
feat: add LSP progress indicators, fix zig/clangd, add tracing
seatedro Dec 31, 2025
d708ecd
readme
seatedro Dec 31, 2025
e037c98
feat: add --hidden and --no-ignore flags to code/index subcommands
seatedro Dec 31, 2025
a469523
Merge branch 'master' into ro/rlrulpmmvqsm
seatedro Dec 31, 2025
1 change: 1 addition & 0 deletions .envrc
@@ -0,0 +1 @@
use flake
4 changes: 4 additions & 0 deletions .gitignore
@@ -1 +1,5 @@
/target
.direnv

test/
.todo.md
293 changes: 131 additions & 162 deletions AGENTS.md
@@ -1,209 +1,178 @@
# AGENTS.md - AI Agent Guide for Glimpse
# Glimpse Development Guide

## Project Overview
A blazingly fast tool for peeking at codebases. Perfect for loading your codebase into an LLM's context.

Glimpse is a fast Rust CLI tool for extracting codebase content into LLM-friendly formats. It's designed to help users prepare source code for loading into Large Language Models with built-in token counting, tree visualization, and multiple output formats.
## Task Tracking

**Key capabilities:**
- Fast parallel file processing using Rayon
- Directory tree visualization
- Source code content extraction
- Token counting (tiktoken/HuggingFace backends)
- Git repository cloning and processing
- Web page scraping with Markdown conversion
- Interactive TUI file picker
- XML and PDF output formats
- Per-repository configuration via `.glimpse` files
Check `.todo.md` for current tasks and next steps. Keep it updated:
- Mark items `[x]` when completed
- Add new tasks as they're discovered
- Reference it before asking "what's next?"

## Codebase Structure
## Commits

```
glimpse/
├── src/
│   ├── main.rs              # Entry point, CLI arg handling, routing
│   ├── cli.rs               # CLI argument definitions using clap
│   ├── config.rs            # Global and repo-level configuration
│   ├── analyzer.rs          # Core file processing logic
│   ├── source_detection.rs  # Source file detection (extensions, shebangs)
│   ├── output.rs            # Output formatting (tree, files, XML, PDF)
│   ├── tokenizer.rs         # Token counting backends
│   ├── git_processor.rs     # Git repository cloning
│   ├── url_processor.rs     # Web page fetching and HTML→Markdown
│   └── file_picker.rs       # Interactive TUI file selector
├── build.rs                 # Build script that generates languages.rs from languages.yml
├── languages.yml            # Language definitions (extensions, filenames, interpreters)
├── Cargo.toml               # Dependencies and package metadata
├── .github/workflows/
│   ├── test.yml             # CI: tests, clippy, formatting
│   └── release.yml          # CD: multi-platform builds, publishing
└── test_project/            # Test fixtures
Use `jj` for version control. Always commit after completing a phase:

```bash
jj commit -m "feat: add glimpse-code crate scaffolding"
```

## Development Environment
Use conventional commit prefixes:
- `feat` - new feature
- `fix` - bug fix
- `refactor` - restructure without behavior change
- `chore` - maintenance, dependencies, config
- `docs` - documentation only
- `test` - adding or updating tests

Always use the devshell from the flake for all commands:
## Build Commands

```bash
nix develop
cargo build # debug build
cargo build --release # release build
cargo run -- <args> # run with arguments
cargo run -- . # analyze current directory
cargo run -- --help # show help
```

## Development Commands
## Test Commands

```bash
# Build and run
cargo build
cargo run -- [OPTIONS] [PATH]
cargo test # run all tests
cargo test test_name # run single test by name
cargo test test_name -- --nocapture # run test with stdout
cargo test -- --test-threads=1 # run tests sequentially
```

# Run tests
cargo test
## Lint & Format

# Check code quality (required to pass CI)
cargo clippy -- -D warnings
cargo fmt -- --check
```bash
cargo fmt # format all code
cargo fmt -- --check # check formatting (CI)
cargo clippy # run linter
cargo clippy -- -D warnings # fail on warnings (CI)
```

# Format code
cargo fmt
## Project Structure

# Build release
cargo build --release
```
glimpse/
├── src/
│   ├── main.rs       # binary entry point
│   ├── lib.rs        # library root
│   ├── cli.rs        # CLI arg parsing
│   ├── analyzer.rs   # directory processing
│   ├── output.rs     # output formatting
│   ├── core/         # config, tokenizer, types, source detection
│   ├── fetch/        # git clone, url/html processing
│   ├── tui/          # file picker
│   └── code/         # code analysis (extract, graph, index, resolve)
├── tests/            # integration tests
├── languages.yml     # language definitions for source detection
├── registry.toml     # tree-sitter grammar registry
└── build.rs          # generates language data from languages.yml
```

## Key Architecture Decisions
## Code Style

### Source File Detection
Detection happens in `source_detection.rs` via `is_source_file()`:
1. Check known filenames (Makefile, Dockerfile, etc.)
2. Check file extensions against `SOURCE_EXTENSIONS`
3. Fall back to shebang parsing for scripts
### No Comments

Extension/filename data is code-generated at build time from `languages.yml` via `build.rs`.
Code should be self-documenting. The only acceptable documentation is:
- Brief `///` docstrings on public API functions that aren't obvious
- `//!` module-level docs when necessary

### Include/Exclude Pattern Behavior
- `--include` (or `-i`): **Additive** - patterns are added to default source detection
- `--only-include`: **Replacement** - only specified patterns are used, ignoring source detection
- `--exclude` (or `-e`): Applied after inclusion, works with both modes
```rust
// BAD: explaining what code does
// Check if the file is a source file
if is_source_file(path) { ... }

### Token Counting
Two backends available in `tokenizer.rs`:
- `TokenizerType::Tiktoken` (default) - Uses `tiktoken-rs` for OpenAI-compatible counting
- `TokenizerType::HuggingFace` - Uses `tokenizers` crate for HuggingFace models
// BAD: inline comments
let name = path.file_name(); // get the filename

### Configuration Hierarchy
1. Global config: `~/.config/glimpse/config.toml` (Linux/macOS) or `%APPDATA%\glimpse\config.toml` (Windows)
2. Repo config: `.glimpse` file in project root
3. CLI arguments (highest priority)
// GOOD: self-documenting code, no comments needed
if is_source_file(path) { ... }

### Output Formats
- Default: Copies to clipboard
- `-p/--print`: Outputs to stdout
- `-f/--file [PATH]`: Writes to file (default: `GLIMPSE.md`)
- `-x/--xml`: Wraps output in XML tags for better LLM parsing
- `--pdf PATH`: Generates PDF output
// GOOD: docstring for non-obvious public function
/// Extract interpreter from shebang line and exec pattern
fn extract_interpreter(data: &str) -> Option<String> { ... }
```

## Testing Conventions
### Import Order

- Unit tests are co-located with source code in `#[cfg(test)]` modules
- Integration tests use `tempfile` for isolated filesystem testing
- Tests should handle network-dependent operations gracefully (see `git_processor.rs` tests)
- Mock servers used for URL processing tests via `mockito`
Group imports in this order, separated by blank lines:
1. `std` library
2. External crates (alphabetical)
3. Internal crates - prefer `super::` over `crate::` when possible

Example test pattern:
```rust
#[cfg(test)]
mod tests {
    use super::*;
    use tempfile::tempdir;

    #[test]
    fn test_feature() -> Result<()> {
        let dir = tempdir()?;
        // Test logic using temp directory
        Ok(())
    }
}
```
use std::fs;
use std::path::{Path, PathBuf};

## CI/CD Pipeline
use anyhow::Result;
use serde::{Deserialize, Serialize};

### Test Workflow (`.github/workflows/test.yml`)
Runs on push/PR to `master`:
- `cargo test --verbose`
- `cargo clippy -- -D warnings`
- `cargo fmt -- --check`
use super::types::FileEntry; // preferred for sibling modules
use crate::config::Config; // only when super:: won't reach
```

### Release Workflow (`.github/workflows/release.yml`)
Triggered by version tags (`v*`):
1. Creates GitHub release
2. Builds binaries for: `x86_64-unknown-linux-gnu`, `aarch64-apple-darwin`, `x86_64-pc-windows-msvc`
3. Uploads release assets
4. Updates Homebrew tap
5. Publishes to crates.io
### Error Handling

## Code Style Guidelines
- Use `anyhow::Result` for fallible functions
- Propagate errors with `?` operator
- Use `.expect("message")` only when failure is a bug
- Never use `.unwrap()` outside of tests
- Use `anyhow::bail!` for early returns with errors
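
A minimal sketch of the error-handling rules above, kept dependency-free. It swaps `std::io::Result` in for `anyhow::Result` and a plain `return Err(...)` in for `anyhow::bail!`, so it shows only the shape of the pattern, not the project's actual error type. `count_lines` is a hypothetical helper, not part of Glimpse.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: counts the lines in a file and rejects empty files.
fn count_lines(path: &Path) -> io::Result<usize> {
    // `?` hands the I/O error to the caller instead of calling `.unwrap()`.
    let content = fs::read_to_string(path)?;

    // Early return with an error; with anyhow this would be `anyhow::bail!`.
    if content.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "file is empty"));
    }

    Ok(content.lines().count())
}
```

The caller decides how to report the failure, which keeps the helper free of panics.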

- Write terse, self-commenting code
- Comments only on docstrings for functions
- Follow standard Rust formatting (`cargo fmt`)
- Use `anyhow::Result` for error handling in application code
- Prefer `?` operator over explicit `match` for error propagation
- Use `#[derive]` macros for common traits
### Naming Conventions

## Version Control
- `snake_case` for functions, methods, variables, modules
- `PascalCase` for types, traits, enums
- `SCREAMING_SNAKE_CASE` for constants
- Prefer descriptive names over abbreviations
- Boolean functions: `is_`, `has_`, `can_`, `should_`

Use jujutsu (`jj`) instead of git for all version control operations.
### Type Definitions

```bash
jj status
jj diff
jj new -m "message"
jj describe -m "message"
jj bookmark set <name>
jj git push
```
- Derive common traits: `Debug`, `Clone`, `Serialize`, `Deserialize`
- Put derives in consistent order
- Use `pub` sparingly - only what's needed

## Common Patterns

### Adding a new CLI option
1. Add field to `Cli` struct in `cli.rs` with appropriate `#[arg(...)]` attributes
2. Handle the option in `main.rs` routing logic
3. Update `RepoConfig` in `config.rs` if it should be saveable

### Adding file type support
Edit `languages.yml` to add extensions, filenames, or interpreters. The build script will regenerate detection code automatically.
```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FileEntry {
    pub path: PathBuf,
    pub content: String,
    pub size: u64,
}
```

### Modifying output format
Edit `output.rs`:
- `generate_tree()` for tree structure
- `generate_files()` for file contents
- `generate_output()` orchestrates the full output
### Function Style

## Important Dependencies
- Keep functions focused and small
- Use early returns for guard clauses
- Prefer iterators and combinators over loops when clearer
- Use `impl Trait` for return types when appropriate
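
The function-style bullets above could look roughly like this in practice. Both helpers are hypothetical and exist only for illustration: one returns `impl Iterator` built from combinators, the other uses a guard clause with an early return.

```rust
/// Hypothetical helper: yields the trimmed, non-empty, non-comment lines of `text`.
fn significant_lines(text: &str) -> impl Iterator<Item = &str> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
}

/// Hypothetical helper: returns the key of the first `key = value` line, if any.
fn first_key(text: &str) -> Option<&str> {
    // Guard clause: return early instead of nesting the main logic.
    if text.is_empty() {
        return None;
    }

    let line = significant_lines(text).next()?;
    line.split('=').next().map(str::trim)
}
```

Returning `impl Iterator` lets callers chain further adapters without an intermediate `Vec`.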

| Crate | Purpose |
|-------|---------|
| `clap` | CLI argument parsing with derive macros |
| `rayon` | Parallel file processing |
| `ignore` | .gitignore-aware file walking |
| `tiktoken-rs` | OpenAI tokenizer |
| `tokenizers` | HuggingFace tokenizer |
| `git2` | Git repository operations |
| `scraper` | HTML parsing for web processing |
| `ratatui` | Terminal UI for file picker |
| `arboard` | Clipboard access |
| `printpdf` | PDF generation |
### Testing

## Debugging Tips
- Tests live in `#[cfg(test)] mod tests` at bottom of file
- Use descriptive test names: `test_<what>_<condition>`
- Use `tempfile` for filesystem tests
- Group related assertions

- Use `--print` to see output directly instead of clipboard
- Use `--no-tokens` to skip tokenizer initialization during debugging
- For file selection issues, check `.gitignore` patterns with `--no-ignore`
- For hidden file issues, use `-H/--hidden`
### Patterns to Follow

## Version Bumping
- Use `Option` combinators: `.map()`, `.and_then()`, `.unwrap_or()`
- Use `Result` combinators: `.map_err()`, `.context()`
- Prefer `&str` over `String` in function parameters
- Use `impl AsRef<Path>` for path parameters when flexible
- Use builders for complex configuration
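
As a rough illustration of several bullets above (`Option` combinators, `&str` parameters, `impl AsRef<Path>`), here is a sketch using only the standard library. `extension_or` is a hypothetical name, not an existing Glimpse function.

```rust
use std::path::Path;

/// Hypothetical helper: lowercase extension of `path`, or `fallback` if there is none.
fn extension_or(path: impl AsRef<Path>, fallback: &str) -> String {
    path.as_ref()
        .extension()
        // `to_str` returns `None` for non-UTF-8 extensions, so chain with `and_then`.
        .and_then(|ext| ext.to_str())
        .map(str::to_lowercase)
        .unwrap_or_else(|| fallback.to_string())
}
```

Because the parameter is `impl AsRef<Path>`, callers can pass a `&str`, a `String`, a `&Path`, or a `PathBuf` without converting first.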

Version is defined in `Cargo.toml`. When releasing:
1. Update version in `Cargo.toml`
2. Commit: `jj new -m "bump: vX.Y.Z"`
3. Tag: `jj git push && git tag vX.Y.Z && git push --tags`
### Patterns to Avoid

The release workflow handles the rest automatically.
- Comments explaining what code does (code should be obvious)
- Deeply nested code (use early returns)
- Magic numbers (use named constants)
- `clone()` when borrowing works
- `Box<dyn Error>` (use `anyhow::Error`)
- Panicking in library code
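
Two of the items above, magic numbers and needless `clone()`, can be sketched as follows. The constant, its value, and both function names are assumptions made for this example only.

```rust
/// Hypothetical limit, named so call sites do not carry a bare number.
const MAX_FILE_SIZE_BYTES: u64 = 1024 * 1024;

/// Boolean helper using the `is_` prefix from the naming conventions.
fn is_within_size_limit(size: u64) -> bool {
    size <= MAX_FILE_SIZE_BYTES
}

/// Borrows from `names` rather than cloning each `String`.
fn longest_name(names: &[String]) -> Option<&str> {
    names
        .iter()
        .map(String::as_str)
        .max_by_key(|name| name.len())
}
```

Returning `Option<&str>` avoids both an allocation and a panic on an empty slice.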