Merged
13 changes: 13 additions & 0 deletions .flow/epics/fn-2.json
@@ -0,0 +1,13 @@
{
"branch_name": "fn-2",
"created_at": "2026-01-24T16:19:15.152423Z",
"depends_on_epics": [],
"id": "fn-2",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-2.md",
"status": "open",
"title": "GitHunter Toolset Completion",
"updated_at": "2026-01-24T16:19:49.637145Z"
}
13 changes: 13 additions & 0 deletions .flow/epics/fn-3.json
@@ -0,0 +1,13 @@
{
"branch_name": "fn-3",
"created_at": "2026-01-24T16:21:18.066966Z",
"depends_on_epics": [],
"id": "fn-3",
"next_task": 1,
"plan_review_status": "unknown",
"plan_reviewed_at": null,
"spec_path": ".flow/specs/fn-3.md",
"status": "open",
"title": "Forensic Features: Trace Persistence and Replay",
"updated_at": "2026-01-24T16:22:03.691848Z"
}
87 changes: 87 additions & 0 deletions .flow/specs/fn-2.md
@@ -0,0 +1,87 @@
# GitHunter Toolset Completion

## Overview

Complete the GitHunter toolset by adding `tools.py` that wraps `GitHunterAdapter` methods as PydanticAI `Tool` objects. This follows the existing patterns established in `memory/tools.py` and `schema/tools.py`.

## Scope

### In Scope
- Create request/response Pydantic models for tool inputs
- Wrap 3 adapter methods as PydanticAI tools: `blame_line`, `find_pr_discussion`, `get_expert_for_file`
- Export toolset as `githunter_toolset: list[Tool[GitHunterProtocol]]`
- Add comprehensive tests following existing test patterns
- Update `__init__.py` exports
- Add documentation to API reference

### Out of Scope
- GitLab/Bitbucket support (GitHub only)
- `enrich_author` as a separate tool (internal use only)
- Caching layer for repeated calls
- New adapter functionality

## Approach

Follow the established toolset pattern:

1. **Request Models** (`_models.py`): Create Pydantic models for tool inputs
- `BlameLineRequest(repo_path: str, file_path: str, line_no: int)`
- `FindPRDiscussionRequest(repo_path: str, commit_hash: str)`
- `GetExpertsRequest(repo_path: str, file_path: str, window_days: int = 90, limit: int = 3)`

2. **Tool Functions** (`tools.py`): Async functions with `RunContext[GitHunterProtocol]`
- Convert string repo_path to Path internally
- Catch adapter exceptions and convert to Error model returns
- Include "Agent Usage" docstrings

3. **Toolset Export**: `githunter_toolset: list[Tool[GitHunterProtocol]]`
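
The steps above can be sketched without the PydanticAI wiring. This is a hypothetical shape, not the real implementation: `FakeAdapter`, the `Error` and `BlameResult` dataclasses, and the bare `blame_line` function all stand in for the actual `GitHunterAdapter`, `_models.py` types, and `RunContext[GitHunterProtocol]`-based tool.

```python
# Hedged sketch of step 2: convert string paths internally, catch adapter
# exceptions, and return an Error model instead of raising.
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Error:
    """Returned instead of raised, so the agent can see and recover from failures."""
    message: str


@dataclass
class BlameResult:
    author: str
    commit_hash: str


class FakeAdapter:
    """Stand-in for GitHunterAdapter; the real one shells out to git/GitHub."""

    async def blame_line(self, repo_path: Path, file_path: str, line_no: int) -> BlameResult:
        if line_no < 1:
            raise ValueError("line_no must be >= 1")
        return BlameResult(author="alice", commit_hash="cdc4a2f")


async def blame_line(
    adapter: FakeAdapter, repo_path: str, file_path: str, line_no: int
) -> BlameResult | Error:
    """Agent Usage: find who last touched a line. Returns Error on failure."""
    try:
        # Convert the string repo_path to a Path internally, per the spec.
        return await adapter.blame_line(Path(repo_path), file_path, line_no)
    except Exception as exc:  # convert to an Error return, never raise to the agent
        return Error(message=str(exc))


result = asyncio.run(blame_line(FakeAdapter(), ".", "src/app.py", 10))
bad = asyncio.run(blame_line(FakeAdapter(), ".", "src/app.py", 0))
print(type(result).__name__, type(bad).__name__)
```

The union return type (`BlameResult | Error`) is the key design choice: the agent branches on the result rather than handling exceptions it cannot see.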

## Key Files

| File | Purpose |
|------|---------|
| `src/bond/tools/githunter/_models.py` | NEW - Request models |
| `src/bond/tools/githunter/tools.py` | NEW - Tool functions + toolset export |
| `src/bond/tools/githunter/__init__.py` | Update exports |
| `tests/unit/tools/githunter/test_tools.py` | NEW - Tool tests |
| `docs/api/tools.md` | Update API docs |

## Reuse Points

- **Pattern**: `src/bond/tools/memory/tools.py` (lines 45-144) - Tool function structure
- **Pattern**: `src/bond/tools/schema/tools.py` - Simpler toolset example
- **Error model**: `src/bond/tools/memory/_models.py:177-187` - Error return type
- **Test pattern**: `tests/unit/tools/schema/test_tools.py` - Mock protocol pattern
- **Protocol**: `src/bond/tools/githunter/_protocols.py` - Already complete
- **Adapter**: `src/bond/tools/githunter/_adapter.py` - Already complete

## Quick Commands

```bash
# Run GitHunter tests
uv run pytest tests/unit/tools/githunter/ -v

# Type check
uv run mypy src/bond/tools/githunter/

# Lint
uv run ruff check src/bond/tools/githunter/
```

## Acceptance

- [ ] `_models.py` contains 3 request models with validation
- [ ] `tools.py` exports `githunter_toolset` with 3 tools
- [ ] All tools handle adapter exceptions gracefully (return Error, don't raise)
- [ ] Tests pass with MockGitHunter protocol implementation
- [ ] `mypy` and `ruff` pass without errors
- [ ] API docs updated with GitHunter toolset section
- [ ] Exports available: `from bond.tools.githunter import githunter_toolset`

## References

- Memory toolset pattern: `src/bond/tools/memory/tools.py`
- Schema toolset pattern: `src/bond/tools/schema/tools.py`
- GitHunter protocol: `src/bond/tools/githunter/_protocols.py:14-91`
- GitHunter adapter: `src/bond/tools/githunter/_adapter.py`
- PydanticAI Tool docs: https://ai.pydantic.dev/tools/
174 changes: 174 additions & 0 deletions .flow/specs/fn-3.md
@@ -0,0 +1,174 @@
# Forensic Features: Trace Persistence and Replay

## Overview

Extend Bond's "Forensic Runtime" capabilities beyond real-time streaming to include trace persistence and replay. This enables:
- **Audit**: Review what an agent did hours/days ago
- **Debug**: Replay failed runs step-by-step
- **Compare**: Analyze different executions side-by-side

## Scope

### In Scope
- **Trace Capture**: Record all 8 StreamHandlers callback events with metadata
- **Storage Backend**: Pluggable backend interface with JSON file implementation
- **Replay API**: SDK method to iterate through stored events
- **Handler Factory**: `create_capture_handlers()` for easy capture setup

### Out of Scope (Future)
- Protobuf serialization (start with JSON for debugging)
- Remote storage backends (S3, database)
- Cross-trace querying and analytics
- Real-time trace streaming to external systems
- Automatic cleanup/retention policies
- UI replay interface (API only in this phase)

## Approach

### Phase 1: Event Model

Define a unified event structure that normalizes all 8 callback types:

```python
@dataclass(frozen=True)
class TraceEvent:
trace_id: str # UUID for this trace
sequence: int # Ordering within trace
timestamp: float # time.monotonic() for ordering
wall_time: datetime # Human-readable timestamp
event_type: str # "block_start", "text_delta", etc.
payload: dict[str, Any] # Event-specific data
```

Event types map to StreamHandlers:
| Callback | event_type | payload keys |
|----------|------------|--------------|
| on_block_start | "block_start" | kind, index |
| on_block_end | "block_end" | kind, index |
| on_text_delta | "text_delta" | text |
| on_thinking_delta | "thinking_delta" | text |
| on_tool_call_delta | "tool_call_delta" | name, args |
| on_tool_execute | "tool_execute" | id, name, args |
| on_tool_result | "tool_result" | id, name, result |
| on_complete | "complete" | data |

### Phase 2: Storage Backend Protocol

```python
@runtime_checkable
class TraceStorageProtocol(Protocol):
async def save_event(self, event: TraceEvent) -> None:
"""Append event to trace."""
...

async def finalize_trace(self, trace_id: str) -> None:
"""Mark trace as complete."""
...

async def load_trace(self, trace_id: str) -> AsyncIterator[TraceEvent]:
"""Load events for replay."""
...

async def list_traces(self, limit: int = 100) -> list[TraceMeta]:
"""List available traces."""
...
```

Initial implementation: `JSONFileTraceStore` writing to `.bond/traces/{trace_id}.json`
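
One possible shape for `JSONFileTraceStore`, assuming an append-only one-JSON-object-per-line layout inside `{trace_id}.json`; the actual file format is an open design choice, and `wall_time` is stored as an ISO string here purely so events round-trip through JSON:

```python
# Hedged sketch of a JSON-file trace store: save_event appends one JSON line
# per event; load_trace streams them back in order.
import asyncio
import json
import tempfile
import time
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, AsyncIterator


@dataclass(frozen=True)
class TraceEvent:
    trace_id: str
    sequence: int
    timestamp: float
    wall_time: str  # ISO 8601 string so the event survives JSON round-trips
    event_type: str
    payload: dict[str, Any]


class JSONFileTraceStore:
    def __init__(self, root: Path = Path(".bond/traces")) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def _path(self, trace_id: str) -> Path:
        return self._root / f"{trace_id}.json"

    async def save_event(self, event: TraceEvent) -> None:
        # Append-only: one JSON object per line, flushed on close per event.
        with self._path(event.trace_id).open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(asdict(event)) + "\n")

    async def load_trace(self, trace_id: str) -> AsyncIterator[TraceEvent]:
        with self._path(trace_id).open(encoding="utf-8") as fh:
            for line in fh:
                yield TraceEvent(**json.loads(line))


async def demo() -> list[str]:
    store = JSONFileTraceStore(Path(tempfile.mkdtemp()))
    for seq, kind in enumerate(["block_start", "text_delta", "complete"]):
        await store.save_event(TraceEvent(
            trace_id="t1", sequence=seq, timestamp=time.monotonic(),
            wall_time=datetime.now(timezone.utc).isoformat(),
            event_type=kind, payload={},
        ))
    return [ev.event_type async for ev in store.load_trace("t1")]


kinds = asyncio.run(demo())
print(kinds)
```

Appending line-by-line means a crashed run still leaves every event written so far on disk, which bears on open question 3 below.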

### Phase 3: Capture Handler Factory

```python
def create_capture_handlers(
storage: TraceStorageProtocol,
trace_id: str | None = None, # Auto-generate if None
) -> tuple[StreamHandlers, str]:
"""Create handlers that capture events to storage.

Returns:
(handlers, trace_id) - handlers for agent.ask(), and trace ID for later replay
"""
```

### Phase 4: Replay API

```python
class TraceReplayer:
def __init__(self, storage: TraceStorageProtocol, trace_id: str):
...

async def __aiter__(self) -> AsyncIterator[TraceEvent]:
"""Iterate through all events."""
...

async def step(self) -> TraceEvent | None:
"""Get next event (for manual stepping)."""
...

@property
def current_position(self) -> int:
"""Current event index."""
...
```

## Key Files

| File | Purpose |
|------|---------|
| `src/bond/trace/__init__.py` | NEW - Module exports |
| `src/bond/trace/_models.py` | NEW - TraceEvent, TraceMeta models |
| `src/bond/trace/_protocols.py` | NEW - TraceStorageProtocol |
| `src/bond/trace/backends/json_file.py` | NEW - JSON file storage |
| `src/bond/trace/capture.py` | NEW - create_capture_handlers |
| `src/bond/trace/replay.py` | NEW - TraceReplayer class |
| `src/bond/utils.py` | UPDATE - Add capture handler factory |
| `tests/unit/trace/` | NEW - Test directory |

## Reuse Points

- **Event structure**: Inspired by `create_websocket_handlers()` JSON format (`src/bond/utils.py:34-86`)
- **Protocol pattern**: Follow `src/bond/tools/memory/_protocols.py` style
- **Storage pattern**: Similar to `AgentMemoryProtocol` but for events

## Quick Commands

```bash
# Run trace tests
uv run pytest tests/unit/trace/ -v

# Type check
uv run mypy src/bond/trace/

# Example usage (after implementation)
python -c "
from bond.trace import JSONFileTraceStore, create_capture_handlers, TraceReplayer
store = JSONFileTraceStore()
handlers, trace_id = create_capture_handlers(store)
print(f'Trace ID: {trace_id}')
"
```

## Acceptance

- [ ] `TraceEvent` model captures all 8 callback types
- [ ] `TraceStorageProtocol` defines storage interface
- [ ] `JSONFileTraceStore` implements protocol with file-based storage
- [ ] `create_capture_handlers()` returns working StreamHandlers
- [ ] `TraceReplayer` can iterate through stored traces
- [ ] All tests pass with >80% coverage on trace module
- [ ] `mypy` and `ruff` pass
- [ ] Documentation added to architecture page

## Open Questions

1. **Trace directory**: Use `.bond/traces/` or configurable path?
2. **Large tool results**: Truncate at what size? 1MB? 10MB?
3. **Crash handling**: How to mark incomplete traces? Separate "status" field?
4. **Event ordering**: Use monotonic clock + sequence number for guaranteed order?

## References

- WebSocket handler pattern: `src/bond/utils.py:20-118`
- StreamHandlers dataclass: `src/bond/agent.py:28-73`
- Event sourcing StoredEvent: https://eventsourcing.readthedocs.io/
- OTel trace format: https://opentelemetry.io/docs/specs/semconv/gen-ai/
21 changes: 21 additions & 0 deletions .flow/tasks/fn-2.1.json
@@ -0,0 +1,21 @@
{
"assignee": "bordumbb@gmail.com",
"claim_note": "",
"claimed_at": "2026-01-24T16:43:07.860701Z",
"created_at": "2026-01-24T16:19:55.586842Z",
"depends_on": [],
"epic": "fn-2",
"evidence": {
"commits": [
"cdc4a2f"
],
"prs": [],
"tests": []
},
"id": "fn-2.1",
"priority": null,
"spec_path": ".flow/tasks/fn-2.1.md",
"status": "done",
"title": "Create GitHunter request models",
"updated_at": "2026-01-24T16:45:23.392733Z"
}
55 changes: 55 additions & 0 deletions .flow/tasks/fn-2.1.md
@@ -0,0 +1,55 @@
# fn-2.1 Create GitHunter request models

## Description
Create Pydantic request models for GitHunter tools in `src/bond/tools/githunter/_models.py`.

### Models to Create

```python
class BlameLineRequest(BaseModel):
"""Request for blame_line tool."""
repo_path: str # String, converted to Path in tool
file_path: str
line_no: int = Field(ge=1, description="Line number (1-indexed)")

class FindPRDiscussionRequest(BaseModel):
"""Request for find_pr_discussion tool."""
repo_path: str
commit_hash: str = Field(min_length=7, description="Full or abbreviated SHA")

class GetExpertsRequest(BaseModel):
"""Request for get_expert_for_file tool."""
repo_path: str
file_path: str
window_days: int = Field(default=90, ge=0, description="Days of history (0=all time)")
limit: int = Field(default=3, ge=1, le=10, description="Max experts to return")
```

### Also Add

- `Error` model following `memory/_models.py:177-187` pattern
- Union types for return values: `BlameResult | Error`, etc.

### Reference Files

- Pattern: `src/bond/tools/memory/_models.py`
- Types: `src/bond/tools/githunter/_types.py` (BlameResult, PRDiscussion, FileExpert)

## Acceptance
- [ ] `_models.py` exists with BlameLineRequest, FindPRDiscussionRequest, GetExpertsRequest
- [ ] All models have Field validators (ge, min_length, etc.)
- [ ] Error model exists for union return types
- [ ] `mypy src/bond/tools/githunter/_models.py` passes
- [ ] `ruff check src/bond/tools/githunter/_models.py` passes

## Done summary
Created _models.py with GitHunter request models:
- BlameLineRequest (repo_path, file_path, line_no with ge=1 validator)
- FindPRDiscussionRequest (repo_path, commit_hash with min_length=7 validator)
- GetExpertsRequest (repo_path, file_path, window_days=90 default, limit=3 default)
- Error model for union return types in tool responses

All models follow the Annotated[..., Field(...)] pattern from memory/_models.py.
Passed mypy and ruff checks.

## Evidence
- Commits: cdc4a2f
- Tests:
- PRs:
23 changes: 23 additions & 0 deletions .flow/tasks/fn-2.2.json
@@ -0,0 +1,23 @@
{
"assignee": "bordumbb@gmail.com",
"claim_note": "",
"claimed_at": "2026-01-24T16:45:49.269695Z",
"created_at": "2026-01-24T16:20:03.882583Z",
"depends_on": [
"fn-2.1"
],
"epic": "fn-2",
"evidence": {
"commits": [
"2d87f6a"
],
"prs": [],
"tests": []
},
"id": "fn-2.2",
"priority": null,
"spec_path": ".flow/tasks/fn-2.2.md",
"status": "done",
"title": "Implement GitHunter tool functions",
"updated_at": "2026-01-24T16:46:36.650352Z"
}