Personal tool combining deep code intelligence with specialized AI subagents. CLI-first approach with optional integrations later.
Core capabilities:
- Multi-language code analysis (TypeScript, Go, Python, Rust)
- Semantic + structural search
- Specialized agents (Planner, Explorer, PR Manager)
- GitHub integration
┌─────────────────────────────────────────────┐
│                CLI Interface                │
│        (Beautiful output, JSON mode)        │
└──────────────────────┬──────────────────────┘
                       │
           ┌───────────┴───────────┐
           │                       │
   ┌───────▼──────┐       ┌────────▼────────┐
   │ Intelligence │       │    Subagents    │
   │    Layer     │◄──────┤      Layer      │
   │              │       │                 │
   │ • Scanner    │       │ • Coordinator   │
   │ • Embedder   │       │ • Planner       │
   │ • Vectors    │       │ • Explorer      │
   │ • Indexer    │       │ • PR Manager    │
   └──────────────┘       └─────────────────┘
Key insight: Subagents get their code awareness from the intelligence layer, querying its index instead of parsing code themselves.
Problem: Need multi-language support with varying depth.
Solution: Hybrid approach
- tree-sitter: Universal parser (Go, Python, Rust) - syntax-level
- ts-morph: Enhanced TypeScript scanner - types, references
- remark: Markdown documentation
Trade-off: More complexity, but gets us real multi-language support.
Problem: Need embeddings, but want local-first.
Options considered:
- TensorFlow.js: Older, limited models
- OpenAI API: Best quality but requires API key
- @xenova/transformers: Modern, all-MiniLM-L6-v2
Choice: @xenova/transformers
- Local (no API keys)
- Good quality for a small model (384-dimensional embeddings)
- Active development
Problem: Need vector storage without running a server.
Options considered:
- ChromaDB: Requires server process
- FAISS: Python-focused
- In-memory: Doesn't persist
- LanceDB: Embedded, TypeScript-native
Choice: LanceDB (embedded, no server)
Problem: MCP is only 1 month old, uncertain adoption.
Solution: Build CLI core, add integrations later
- CLI works immediately
- Can add MCP/VS Code/API when mature
- JSON output enables scripting
interface Scanner {
  readonly language: string;
  readonly capabilities: ScannerCapabilities;
  scan(files: string[]): Promise<Document[]>;
}
interface ScannerCapabilities {
  syntax: boolean;          // All scanners
  types?: boolean;          // TypeScript only (for now)
  references?: boolean;     // Cross-file refs
  documentation?: boolean;  // Doc comments
}
interface Document {
  id: string;    // file:name:line
  text: string;  // Text to embed
  type: 'function' | 'class' | 'interface' | 'doc';
  language: string;
  metadata: {
    file: string;
    startLine: number;
    endLine: number;
    name?: string;
    signature?: string;
    exported: boolean;
  };
}

Implementations:
- TreeSitterScanner - Base for Go, Python, Rust
- TypeScriptScanner - Enhanced with ts-morph
- MarkdownScanner - Documentation via remark
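
One way the scanner implementations could be wired together is a small registry keyed by file extension. The sketch below is illustrative only: `ScannerRegistry`, `makeDocumentId`, `stubScanner`, and the trimmed `ScannedDocument` type are hypothetical names, not part of the design above, and the stub stands in for real tree-sitter or ts-morph scanners.

```typescript
interface ScannerCapabilities {
  syntax: boolean;
  types?: boolean;
  references?: boolean;
  documentation?: boolean;
}

// Trimmed stand-in for the Document interface (renamed to avoid the DOM global).
interface ScannedDocument {
  id: string;
  text: string;
  language: string;
}

interface Scanner {
  readonly language: string;
  readonly capabilities: ScannerCapabilities;
  scan(files: string[]): Promise<ScannedDocument[]>;
}

// Hypothetical registry: maps file extensions to the scanner that handles them.
class ScannerRegistry {
  private readonly byExtension = new Map<string, Scanner>();

  register(scanner: Scanner, extensions: string[]): void {
    for (const ext of extensions) {
      this.byExtension.set(ext.toLowerCase(), scanner);
    }
  }

  // Returns the scanner for a path, or undefined for unsupported files.
  scannerFor(file: string): Scanner | undefined {
    const dot = file.lastIndexOf(".");
    if (dot < 0) return undefined;
    return this.byExtension.get(file.slice(dot).toLowerCase());
  }

  // Groups files so each scanner receives a single batch.
  groupByScanner(files: string[]): Map<Scanner, string[]> {
    const groups = new Map<Scanner, string[]>();
    for (const file of files) {
      const scanner = this.scannerFor(file);
      if (!scanner) continue; // unsupported file types are skipped
      const batch = groups.get(scanner) ?? [];
      batch.push(file);
      groups.set(scanner, batch);
    }
    return groups;
  }
}

// Builds the `file:name:line` id format noted on Document.id.
function makeDocumentId(file: string, name: string, line: number): string {
  return `${file}:${name}:${line}`;
}

// Stub scanner: emits one placeholder document per file.
function stubScanner(language: string, capabilities: ScannerCapabilities): Scanner {
  return {
    language,
    capabilities,
    async scan(files) {
      return files.map((file) => ({
        id: makeDocumentId(file, "module", 1),
        text: "",
        language,
      }));
    },
  };
}

const registry = new ScannerRegistry();
registry.register(stubScanner("go", { syntax: true }), [".go"]);
registry.register(
  stubScanner("typescript", { syntax: true, types: true, references: true }),
  [".ts", ".tsx"],
);
```

Keeping the extension mapping in the registry, not on `Scanner`, leaves the interface above unchanged; whether the real design does this is an open choice.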
interface EmbeddingProvider {
  getDimension(): number;
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
}
interface VectorStore {
  initialize(): Promise<void>;
  upsert(items: VectorItem[]): Promise<void>;
  search(query: number[], options: SearchOptions): Promise<SearchResult[]>;
}

Why pluggable? Technology changes fast. Easy to swap:
- Embedders: transformers → OpenAI → Ollama
- Stores: LanceDB → ChromaDB → in-memory (testing)
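
To illustrate the swap, here is a minimal sketch of an in-memory store for tests that ranks by cosine similarity. The shapes of `VectorItem`, `SearchOptions`, and `SearchResult` are assumptions (the design names them but does not define them), and `rank` is a hypothetical helper added so the scoring logic can be exercised synchronously.

```typescript
// Assumed shapes: the design references these types without defining them.
interface VectorItem {
  id: string;
  vector: number[];
}
interface SearchOptions {
  limit: number;
}
interface SearchResult {
  id: string;
  score: number;
}

interface VectorStore {
  initialize(): Promise<void>;
  upsert(items: VectorItem[]): Promise<void>;
  search(query: number[], options: SearchOptions): Promise<SearchResult[]>;
}

// Cosine similarity in [-1, 1]; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

class InMemoryVectorStore implements VectorStore {
  private readonly items = new Map<string, number[]>();

  async initialize(): Promise<void> {
    // Nothing to set up for an in-memory store.
  }

  // Inserts new ids and overwrites existing ones.
  async upsert(items: VectorItem[]): Promise<void> {
    for (const item of items) {
      this.items.set(item.id, item.vector);
    }
  }

  async search(query: number[], options: SearchOptions): Promise<SearchResult[]> {
    return this.rank(query, options.limit);
  }

  // Brute-force scan: scores every stored vector, best match first.
  rank(query: number[], limit: number): SearchResult[] {
    return Array.from(this.items.entries())
      .map(([id, vector]) => ({ id, score: cosineSimilarity(query, vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, limit);
  }
}
```

A brute-force scan is fine for tests and small fixtures; it is not meant to stand in for LanceDB on a real index.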
interface RepositoryIndexer {
  index(path: string): Promise<IndexStats>;
  update(path: string): Promise<IndexStats>;  // Incremental
  search(query: string): Promise<SearchResult[]>;
}

Flow:
- Walk file tree
- Detect language → select scanner
- Extract Documents
- Batch embed → store vectors
- Track metadata for incremental updates
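
The last two steps of the flow can be sketched with content hashes: compare each file's hash with the one recorded at the previous run, then batch whatever changed for embedding. This is only one possible approach. `HashManifest`, `FileSnapshot`, `planIncrementalUpdate`, and `toBatches` are hypothetical names, and SHA-256 is an assumed choice of hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest: file path -> content hash recorded at the last index run.
type HashManifest = Map<string, string>;

interface FileSnapshot {
  path: string;
  content: string;
}

interface IndexPlan {
  changed: string[]; // new or modified files: rescan and re-embed
  removed: string[]; // files no longer present: delete their vectors
  manifest: HashManifest; // manifest to persist after this run
}

function hashContent(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Compares current files against the previous manifest.
function planIncrementalUpdate(previous: HashManifest, files: FileSnapshot[]): IndexPlan {
  const manifest: HashManifest = new Map();
  const changed: string[] = [];
  for (const file of files) {
    const hash = hashContent(file.content);
    manifest.set(file.path, hash);
    if (previous.get(file.path) !== hash) changed.push(file.path);
  }
  const removed = Array.from(previous.keys()).filter((path) => !manifest.has(path));
  return { changed, removed, manifest };
}

// Splits items into fixed-size batches, e.g. for embedBatch calls.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("batch size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

On a first run the previous manifest is empty, so every file lands in `changed`; that makes `index` a special case of `update`.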
interface Subagent {
  initialize(options: SubagentOptions): Promise<void>;
  handleMessage(message: SubagentMessage): Promise<SubagentMessage | null>;
}
// Coordinator manages agent lifecycle
interface SubagentCoordinator {
  registerAgent(agent: Subagent): void;
  allocateTask(task: Task): Promise<string>;
  routeMessage(message: SubagentMessage): Promise<void>;
}

Learns from claude-flow:
- Message passing patterns
- Error handling
- Task allocation
Our specialization:
- Code-specific task types
- Enriches messages with code context
- GitHub-aware coordination
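
A minimal sketch of the message-passing and error-handling parts, under stated assumptions: the `SubagentMessage` fields, the `name` property on `Subagent`, the dead-letter list, and the hop limit are all hypothetical additions, not taken from the design or from claude-flow.

```typescript
// Assumed message shape: the design does not define SubagentMessage.
interface SubagentMessage {
  id: string;
  sender: string;
  recipient: string;
  payload: unknown;
}

interface Subagent {
  readonly name: string; // assumption: the original interface has no name field
  handleMessage(message: SubagentMessage): Promise<SubagentMessage | null>;
}

const MAX_HOPS = 8; // hypothetical guard against agents replying to each other forever

class SimpleCoordinator {
  private readonly agents = new Map<string, Subagent>();
  // Messages that could not be delivered or whose handler threw.
  readonly deadLetters: SubagentMessage[] = [];

  registerAgent(agent: Subagent): void {
    if (this.agents.has(agent.name)) {
      throw new Error(`agent already registered: ${agent.name}`);
    }
    this.agents.set(agent.name, agent);
  }

  // Delivers a message, then routes any reply the recipient returns.
  async routeMessage(message: SubagentMessage, hops = 0): Promise<void> {
    const agent = this.agents.get(message.recipient);
    if (!agent || hops >= MAX_HOPS) {
      this.deadLetters.push(message);
      return;
    }
    try {
      const reply = await agent.handleMessage(message);
      if (reply) await this.routeMessage(reply, hops + 1);
    } catch {
      // A failing agent should not take the coordinator down with it.
      this.deadLetters.push(message);
    }
  }
}
```

Task allocation and code-context enrichment are left out; they would sit on top of the same routing path.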
dev-agent index # Index current repo
dev-agent search "query" # Semantic search
dev-agent scan # Show structure
dev-agent plan --issue 42 # Planner subagent
dev-agent explore "patterns" # Explorer subagent
dev-agent pr create # PR subagent
dev-agent search "query" --json # JSON output
Inspired by: gh, ripgrep, eza, claude CLI
- Fast feedback - Show progress for long operations
- Clear output - Colors, icons, formatting
- Helpful errors - Suggest fixes, not just error codes
- Discoverable - Good --help and examples

- commander.js - Command parsing
- chalk - Terminal colors
- ora - Elegant spinners
- cli-table3 - Pretty tables
- boxen - Styled boxes
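
The split between human-readable output and `--json` mode could be kept in one formatting function, so commands never print directly. The sketch below assumes a hypothetical `SearchHit` shape and omits chalk, ora, and cli-table3 to stay dependency-free; a real command would presumably layer those onto the human-readable branch.

```typescript
// Hypothetical result shape for display purposes.
interface SearchHit {
  file: string;
  line: number;
  name: string;
  score: number;
}

interface OutputOptions {
  json: boolean;
}

// Returns the text to print; JSON mode emits stable, script-friendly output.
function formatSearchResults(hits: SearchHit[], options: OutputOptions): string {
  if (options.json) {
    return JSON.stringify({ count: hits.length, results: hits }, null, 2);
  }
  if (hits.length === 0) {
    return "No results found.";
  }
  return hits
    .map(
      (hit, i) =>
        `${i + 1}. ${hit.name}  ${hit.file}:${hit.line}  (score ${hit.score.toFixed(2)})`,
    )
    .join("\n");
}
```

Returning a string keeps the formatter easy to test without a terminal, and guarantees JSON mode never picks up colors or spinner frames.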
- TypeScript (strict mode)
- Node.js >= 22 LTS
- pnpm 8.15.4
- Turborepo
- tree-sitter (multi-language parsing)
- ts-morph (TypeScript analysis)
- remark (Markdown)
- @xenova/transformers (embeddings)
- LanceDB (vector storage)
- GitHub CLI (GitHub operations)
- Commander.js + chalk/ora/cli-table3
- Biome (linting/formatting)
- Vitest (testing)
- GitHub Actions (CI/CD)
Goal: Can index and search codebases
- Issue #3: Scanner (2 weeks)
  - Tree-sitter base scanner
  - TypeScript scanner (ts-morph)
  - Markdown scanner (remark)
  - Scanner registry
- Issue #4: Vector Storage (2 weeks)
  - TransformersEmbedder (all-MiniLM-L6-v2)
  - LanceDBVectorStore
  - InMemoryVectorStore (testing)
- Issue #12: Indexer (2 weeks)
  - Wire scanner + embedder + storage
  - Incremental indexing (file hashes)
  - Batch processing
- Issue #6: CLI (2 weeks)
  - Command structure
  - Beautiful output (colors, spinners, tables)
  - JSON mode
  - Help text
Deliverable: dev-agent index and dev-agent search work beautifully
Goal: Add specialized agents
- Issue #7: Coordinator (1 week)
  - Agent registry
  - Message passing
  - Task allocation
- Issue #8: Planner (2 weeks)
  - GitHub issue analysis
  - Task breakdown using code context
  - Plan output
- Issue #9: Explorer (2 weeks)
  - Pattern discovery
  - Relationship mapping
  - Similar code identification
- Issue #10: PR Manager (1 week)
  - Branch management
  - PR creation with AI descriptions
  - GitHub CLI integration
Deliverable: dev-agent plan, dev-agent explore, dev-agent pr work
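
For the GitHub CLI integration, one low-risk pattern is to build the `gh pr create` argument list as an array and pass it to a process spawner without a shell. The sketch below covers only the argument building. `PullRequestDraft` and `buildPrCreateArgs` are hypothetical names; the flags used (`--title`, `--body`, `--base`, `--head`, `--draft`) are existing `gh pr create` options.

```typescript
// Hypothetical input produced by the PR Manager subagent.
interface PullRequestDraft {
  title: string;
  body: string;
  base: string;
  head?: string;
  draft?: boolean;
}

// Builds the argument vector for `gh pr create`.
function buildPrCreateArgs(pr: PullRequestDraft): string[] {
  if (pr.title.trim() === "") throw new Error("PR title must not be empty");
  const args = ["pr", "create", "--title", pr.title, "--body", pr.body, "--base", pr.base];
  if (pr.head) args.push("--head", pr.head);
  if (pr.draft) args.push("--draft");
  return args;
}
```

Passing the array to something like Node's `execFile("gh", args)` avoids shell quoting problems when an AI-generated description contains quotes or backticks.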
Goal: If CLI proves useful, add integrations
- MCP Server (when protocol matures)
- VS Code Extension (for better UX)
- REST API (for custom tooling)
Don't build until Phase 1 & 2 prove valuable!
{
  "embedder": "transformers",
  "vectorStore": {
    "type": "lancedb",
    "path": ".dev-agent/vectors"
  },
  "exclude": [
    "node_modules",
    "dist",
    "build",
    ".git"
  ],
  "languages": ["typescript", "javascript", "go", "python", "rust", "markdown"]
}

.dev-agent/
├── config.json   # User configuration
├── vectors/      # LanceDB storage
├── cache/        # Model cache
└── logs/         # Debug logs
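
A sketch of how a partial `config.json` might be merged with defaults, so users only specify what they change. The default values are copied from the example above; `resolveConfig`, `isExcluded`, and the rule that an `exclude` entry matches a whole path segment are assumptions, since the design does not say how exclusions are matched.

```typescript
interface DevAgentConfig {
  embedder: string;
  vectorStore: { type: string; path: string };
  exclude: string[];
  languages: string[];
}

// Defaults taken from the example configuration.
const DEFAULT_CONFIG: DevAgentConfig = {
  embedder: "transformers",
  vectorStore: { type: "lancedb", path: ".dev-agent/vectors" },
  exclude: ["node_modules", "dist", "build", ".git"],
  languages: ["typescript", "javascript", "go", "python", "rust", "markdown"],
};

type ConfigOverrides = Partial<Omit<DevAgentConfig, "vectorStore">> & {
  vectorStore?: Partial<DevAgentConfig["vectorStore"]>;
};

// Merges user overrides onto the defaults without mutating either.
function resolveConfig(overrides: ConfigOverrides = {}): DevAgentConfig {
  return {
    embedder: overrides.embedder ?? DEFAULT_CONFIG.embedder,
    vectorStore: { ...DEFAULT_CONFIG.vectorStore, ...overrides.vectorStore },
    exclude: overrides.exclude ?? [...DEFAULT_CONFIG.exclude],
    languages: overrides.languages ?? [...DEFAULT_CONFIG.languages],
  };
}

// Assumed rule: a path is excluded when any of its segments equals an exclude entry.
function isExcluded(relativePath: string, exclude: string[]): boolean {
  return relativePath.split("/").some((segment) => exclude.includes(segment));
}
```

For example, `resolveConfig({ vectorStore: { path: "/tmp/vectors" } })` keeps `type: "lancedb"` while overriding only the path.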
- Local-first - Works offline, no API keys required
- Pluggable - Swap embedders, scanners, stores easily
- Multi-language - Go, Python, Rust, not just TypeScript
- CLI-first - Beautiful terminal UX
- Fast - Sub-100ms search on small repos, efficient indexing
- Build for daily use - If you don't use it, it's not worth building
- Scanner implementations (tree-sitter, ts-morph)
- Embedder implementations
- Vector store implementations
- End-to-end indexing flow
- Search quality tests
- Subagent coordination
- Use on dev-agent itself (dogfooding)
- Test on real multi-language repos
- CLI UX testing (is it actually nice to use?)
- Core logic: >80%
- CLI commands: >60%
- Integration tests: Critical paths only
- Indexing: 10k files in <5 minutes
- Search: <100ms for repos <1k files, <500ms for larger
- Memory: <500MB for typical repos
- Startup: CLI responds in <200ms
If the tool proves useful:
- Go: Add type analysis (go/types)
- Python: Type hints via mypy
- Rust: Trait/ownership analysis
- Reviewer: Code review suggestions
- Migrator: Help with refactors
- Documenter: Generate docs
- Tester: Suggest test cases
- MCP Server: For Claude Code/Cursor
- VS Code Extension: Native IDE experience
- REST API: For custom tooling
Build these only if needed!