Merged
10 changes: 6 additions & 4 deletions .claude/skills/address-pr-review/SKILL.md
@@ -79,18 +79,20 @@ python .claude/skills/address-pr-review/scripts/fetch_comments.py <PR> --all

| Phase | Actions |
|-------|---------|
| **Fetch** | Run `--summary` first to see counts<br>Then `--id <ID>` for each comment to analyze<br>Exit if no unresolved comments |
| **Fetch** | Run `--summary` first to see counts<br>**Only process unresolved comments** — resolved ones are already closed, skip them<br>Then `--id <ID>` for each unresolved comment to analyze<br>Exit if no unresolved comments |
| **Per Comment** | Show: file:line, author, comment, ±10 lines context<br>Analyze: Valid/Nitpick/Disagree/Question<br>Recommend: Fix/Reply/Skip with reasoning |
| **Fix** | Minimal changes per llm/rules-*.md<br>Offer reply draft: `Fixed: [what]. [why]`<br>Show: `gh api --method POST repos/{owner}/{repo}/pulls/comments/$ID/replies -f body="..."` |
| **Reply** | Draft based on type: Question/Suggestion/Disagreement<br>Let user edit<br>Show gh command (never auto-post) |
| **Fix** | Minimal changes per llm/rules-*.md<br>Do NOT reply — just fix the code |
| **Reply** | Draft based on type: Question/Suggestion/Disagreement<br>Wait 2 minutes between each reply<br>Post with: `gh api --method POST repos/{owner}/{repo}/pulls/{PR}/comments -f body="..." -F in_reply_to=<ID>`<br>(never auto-post without user confirmation) |
| **Summary** | Processed X/N: Fixed Y, Replied Z, Skipped W<br>List: files modified, reply drafts, next steps |
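
The Reply-phase rules (the `gh api` call shape and the 2-minute wait) can be sketched as a small script. This is an illustrative sketch only: `build_reply_cmd` and `post_replies` are hypothetical helper names, it assumes the `gh` CLI is installed and authenticated, and per the never-auto-post rule it must only run after explicit user confirmation.

```python
import subprocess
import time

def build_reply_cmd(repo: str, pr: int, comment_id: int, body: str) -> list[str]:
    # Mirrors the gh api call from the Reply phase
    return [
        "gh", "api", "--method", "POST",
        f"repos/{repo}/pulls/{pr}/comments",
        "-f", f"body={body}",
        "-F", f"in_reply_to={comment_id}",
    ]

def post_replies(repo: str, pr: int, replies: list[tuple[int, str]]) -> None:
    # replies: list of (comment_id, body); wait 2 minutes between posts
    for i, (comment_id, body) in enumerate(replies):
        if i > 0:
            time.sleep(120)
        subprocess.run(build_reply_cmd(repo, pr, comment_id, body), check=True)
```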

## Critical Principles

| Principle | Violation Pattern |
|-----------|-------------------|
| **Unresolved only** | Processing already-resolved comments — the script filters to unresolved by default; never re-open resolved threads |
| **Analyze first** | Accepting all feedback as valid without critical analysis |
| **Never auto-post** | Posting replies automatically instead of showing gh command |
| **Never auto-post** | Posting replies automatically without user confirmation or skipping 2-minute wait between replies |
| **No reply on fix** | Replying to comments that were addressed with a code fix — fixes speak for themselves |
| **One at a time** | Batch processing all comments without individual analysis |
| **Show context** | Making changes without displaying ±10 lines around code |
| **Minimal changes** | Large refactors in response to small comments |
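
The "Show context" principle can be sketched as a tiny helper (hypothetical function, for illustration only):

```python
def show_context(lines: list[str], line_no: int, radius: int = 10) -> str:
    # Return the +/- radius lines around a 1-indexed line number,
    # with line numbers prefixed so the reviewer can orient themselves
    start = max(0, line_no - 1 - radius)
    end = min(len(lines), line_no + radius)
    return "\n".join(f"{i + 1:>5} | {lines[i]}" for i in range(start, end))
```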
54 changes: 54 additions & 0 deletions .claude/skills/creating-pipeline-templates/SKILL.md
@@ -71,7 +71,58 @@ StructureSampler → SemanticInfiller → DuplicateRemover

# generation + metrics
StructuredGenerator → FieldMapper → RagasMetrics

# generation + review-friendly output
StructuredGenerator → FieldMapper (flatten for review)
```

## Adding a FieldMapper for Review

The Review page displays records from the **last block's accumulated_state**. Only **first-level keys** are shown as primary/secondary fields. Nested objects (e.g. `generated.confirmed_dependencies`) appear as raw JSON strings and can't be configured as separate review fields.

**Always add a `FieldMapper` as the last block** to surface the fields reviewers need at the top level.

### Why it matters

Without a FieldMapper, the accumulated_state after a `StructuredGenerator` looks like:
```json
{
"input_field": "...",
"generated": {
"question": "...",
"answer": "...",
"contexts": ["..."]
}
}
```
The review UI sees `input_field` and `generated` (a blob). Reviewers can't configure `question` or `answer` as primary fields.
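
Conceptually, the flattening a FieldMapper performs looks like this in plain Python (a hypothetical sketch, not pipeline code; key names match the example state above):

```python
def flatten_for_review(state: dict) -> dict:
    # Pull nested generator outputs up to first-level keys so the
    # review UI can configure them as primary/secondary fields
    generated = state.get("generated", {})
    return {
        **{k: v for k, v in state.items() if k != "generated"},
        "question": generated.get("question", ""),
        "answer": generated.get("answer", ""),
        "context_count": len(generated.get("contexts", [])),
    }
```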

### How to add it

Add a `FieldMapper` as the **last block** (or last before metrics/observability blocks):

```yaml
- type: FieldMapper
config:
mappings:
# Flatten nested fields to top level
question: "{{ generated.question }}"
answer: "{{ generated.answer }}"
# tojson is safe only for structured data (IDs, numbers, short labels)
# avoid tojson on arrays/objects with free-text — newlines/quotes break JSON parsing
context_count: "{{ generated.contexts | length }}"
# Carry forward useful seed metadata
source: "{{ source_document }}"
```

### Rules

1. **Map every field the reviewer needs** — if it's not a first-level key after the last block, it won't be configurable in the review field settings
2. **Use `| tojson`** for arrays/objects — FieldMapper auto-parses JSON strings back to objects, so the review UI can display them properly. **Exception:** `tojson` on arrays/objects whose values contain unescaped quotes or newlines (e.g. free-text descriptions) will break FieldMapper JSON parsing. In that case, map only scalar summaries (counts, IDs) and let the array flow through as an existing first-level key.
3. **Use `| length`** for counts — gives reviewers a quick numeric summary without expanding lists
4. **Use `| default('')`** for optional fields — prevents Jinja2 errors when a field is missing
5. **Don't map internal/noisy fields** — skip `folder_path`, `_usage`, `_seed_samples` etc. Only map what's useful for human review
6. **Order matters** — FieldMapper outputs merge into accumulated_state, so its keys become the available fields in the Review "Configure Fields" modal
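
Rule 4 can be checked with plain Jinja2. This sketch assumes FieldMapper renders templates with strict-undefined semantics (an assumption about the implementation); with lenient undefined handling, missing fields would render as empty strings either way.

```python
from jinja2 import Environment, StrictUndefined, exceptions

env = Environment(undefined=StrictUndefined)
state = {"generated": {"question": "Q1"}}  # no "source_document" key

# Without | default, a missing field raises at render time
try:
    env.from_string("{{ source_document }}").render(state)
    raised = False
except exceptions.UndefinedError:
    raised = True

# With | default(''), the same mapping renders safely
safe = env.from_string("{{ source_document | default('') }}").render(state)
```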

## Step-by-Step Workflow

@@ -126,6 +177,8 @@ StructuredGenerator → FieldMapper → RagasMetrics
| Missing seed variable referenced in prompt | Add the variable to seed metadata |
| MarkdownMultiplierBlock not first | Multiplier blocks must always be first |
| Seed file not named `seed_<template_id>.*` | Template ID must match: `foo.yaml` → `seed_foo.json` |
| Nested fields not visible in Review UI | Add a `FieldMapper` as last block to flatten nested outputs to top-level keys |
| Review shows `generated` as a JSON blob | Map individual sub-fields: `question: "{{ generated.question }}"` |

## Checklist

@@ -135,6 +188,7 @@
- [ ] Single execution produces expected output fields
- [ ] Trace shows all blocks executed successfully
- [ ] Seed file has 2-3 diverse examples
- [ ] FieldMapper as last block flattens outputs for Review UI (all reviewer-relevant fields are top-level keys)

## Related Skills

82 changes: 82 additions & 0 deletions .claude/skills/implementing-datagenflow-blocks/SKILL.md
@@ -467,6 +467,88 @@ async def execute(self, context: BlockExecutionContext) -> dict[str, Any]:
cached_embeddings = self._embeddings_cache[trace_id]
```

## Agentic Tool-Calling Block Pattern

For blocks that need multi-turn LLM reasoning with tool use (e.g. exploring an external data source before generating output):

```python
async def execute(self, context: BlockExecutionContext) -> dict[str, Any]:
    # json, litellm, and pipeline are assumed imported at module level;
    # render_template, SYSTEM_PROMPT, TOOLS, and _execute_tool are block-module helpers
    from app import llm_config_manager

llm_config = await llm_config_manager.get_llm_model(self.model_name)
total_usage = pipeline.Usage(input_tokens=0, output_tokens=0, cached_tokens=0)

messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": render_template(self.user_prompt, context.accumulated_state)},
]

for turn in range(self.max_turns):
if turn == self.max_turns - 1:
messages.append({"role": "user", "content": "Wrap up and return final JSON now."})

llm_params = llm_config_manager.prepare_llm_call(
llm_config,
messages=messages,
temperature=self.temperature,
max_tokens=self.max_tokens,
tools=TOOLS,
tool_choice="auto",
)
llm_params["metadata"] = {"trace_id": context.trace_id, "tags": ["datagenflow"]}

response = await litellm.acompletion(**llm_params)
msg = response.choices[0].message
total_usage.input_tokens += response.usage.prompt_tokens or 0
total_usage.output_tokens += response.usage.completion_tokens or 0
total_usage.cached_tokens += getattr(response.usage, "cache_read_input_tokens", 0) or 0

if not msg.tool_calls:
# final answer — parse JSON
try:
result = json.loads(msg.content or "{}")
except json.JSONDecodeError:
result = {}
return {"my_result": result.get("my_result", []), "_usage": total_usage.model_dump()}

# append assistant message and process tool calls
messages.append({"role": "assistant", "content": None, "tool_calls": [
{"id": tc.id, "type": "function", "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
for tc in msg.tool_calls
]})
for tc in msg.tool_calls:
try:
args = json.loads(tc.function.arguments)
except json.JSONDecodeError:
args = {}
# always use .get() — LLM may send malformed args
tool_result = _execute_tool(tc.function.name, args)
messages.append({"role": "tool", "tool_call_id": tc.id, "content": tool_result})

# max turns exhausted — force final answer without tools
messages.append({"role": "user", "content": "No more tool calls. Return final JSON NOW."})
llm_params = llm_config_manager.prepare_llm_call(
llm_config, messages=messages,
temperature=self.temperature, max_tokens=self.max_tokens,
)
llm_params["metadata"] = {"trace_id": context.trace_id, "tags": ["datagenflow"]}
response = await litellm.acompletion(**llm_params)
try:
result = json.loads(response.choices[0].message.content or "{}")
except json.JSONDecodeError:
result = {}
return {"my_result": result.get("my_result", []), "_usage": total_usage.model_dump()}
```

**Key rules:**
- Always nudge on last turn (`turn == max_turns - 1`) before the forced final call
- Always force a final call without tools when max_turns exhausted — otherwise you get no output
- Use `args.get("key", "")` not `args["key"]` — LLM may send malformed arguments
- If tool responses contain `"$ref"` keys, rename before sending: `output.replace('"$ref"', '"schema_ref"')` — Gemini rejects `$ref` in tool responses
- Cap tool result sizes (e.g. 50 items max) to avoid context overflow
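
The last two rules can be combined into one helper before appending a tool message. The function name is hypothetical; the 50-item cap and the `$ref` rename follow the rules above.

```python
import json

def sanitize_tool_result(items: list[dict], max_items: int = 50) -> str:
    # Cap result size to avoid context overflow
    payload = json.dumps(items[:max_items])
    # Gemini rejects "$ref" keys in tool responses; rename before sending
    return payload.replace('"$ref"', '"schema_ref"')
```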

---

## Multiplier Blocks

Blocks that generate multiple items from one input:
7 changes: 6 additions & 1 deletion .claude/skills/testing-pipeline-templates/SKILL.md
@@ -32,7 +32,12 @@ curl -s -X POST http://localhost:8000/api/pipelines/<ID>/execute \
- `trace` — each entry has `block_type`, `execution_time`, `output`
- `accumulated_state` — data flowing correctly between blocks?

**Red flags:** missing fields, metadata pollution (extra fields like `samples`, `target_count`), execution_time >30s, empty/null generator outputs.
**Check review readiness:**
- Look at the **last trace entry's `accumulated_state`** — these are the fields visible in the Review UI
- All reviewer-relevant fields should be **first-level keys** (not nested under `generated` or other objects)
- If useful fields are nested, add a `FieldMapper` as the last block to flatten them (see `creating-pipeline-templates` skill)

**Red flags:** missing fields, metadata pollution (extra fields like `samples`, `target_count`), execution_time >30s, empty/null generator outputs, reviewer-relevant data buried in nested objects.
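
The review-readiness check can be automated with a short helper (hypothetical name; it assumes the response shape described above, where the last trace entry carries the final `accumulated_state`):

```python
def nested_review_fields(accumulated_state: dict) -> list[str]:
    # First-level keys whose values are dicts/lists are not configurable
    # as review fields; flag them so a FieldMapper can flatten them
    return [
        key for key, value in accumulated_state.items()
        if isinstance(value, (dict, list)) and not key.startswith("_")
    ]
```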

## Phase 2: Small Batch
