109 changes: 109 additions & 0 deletions skills/openclaw-native/tool-description-optimizer/SKILL.md
@@ -0,0 +1,109 @@
---
name: tool-description-optimizer
version: "1.0"
category: openclaw-native
description: Analyzes skill descriptions for trigger quality — scores clarity, keyword density, and specificity, then suggests rewrites that improve discovery accuracy.
stateful: true
---

# Tool Description Optimizer

## What it does

A skill's description is its only discovery mechanism. If the description is vague, keyword-poor, or overlaps with another skill's description, the agent won't trigger the skill, or worse, will trigger the wrong one. Tool Description Optimizer scores every installed skill's description for trigger quality and suggests concrete rewrites.

Inspired by OpenLobster's tool-description scoring layer, which penalizes vague descriptions and rewards keyword-rich, action-specific ones.

## When to invoke

- After installing new skills — check if descriptions are trigger-ready
- When a skill isn't firing when expected — diagnose whether the description is the problem
- Periodically to audit all descriptions for quality drift
- Before publishing a skill — polish the description for discoverability

## How it works

### Scoring dimensions (5 metrics, 0–10 each)

| Metric | What it measures | Weight |
|---|---|---|
| Clarity | Single clear purpose, no ambiguity | 2x |
| Specificity | Action verbs, concrete nouns vs. vague terms | 2x |
| Keyword density | Trigger-relevant keywords per sentence | 1.5x |
| Uniqueness | Low overlap with other installed skill descriptions | 1.5x |
| Length | Optimal range (15–40 words) — too short = vague, too long = diluted | 1x |
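
The table lists weights but not the formula that combines them. A minimal sketch, assuming the overall score is a weighted mean of the five metrics; `WEIGHTS` and `composite_score` are illustrative names, not taken from `optimize.py`, and the real script may combine or round the metrics differently:

```python
# Weights copied from the scoring table above.
WEIGHTS = {
    "clarity": 2.0,
    "specificity": 2.0,
    "keyword_density": 1.5,
    "uniqueness": 1.5,
    "length": 1.0,
}


def composite_score(metrics: dict[str, float]) -> float:
    """Weighted mean of the five 0-10 metrics (assumed formula)."""
    total_weight = sum(WEIGHTS.values())  # 8.0 with the weights above
    weighted_sum = sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)
    return weighted_sum / total_weight


# Hypothetical metric values, chosen only to show the arithmetic:
# (2*9 + 2*8 + 1.5*8 + 1.5*8 + 1*8) / 8 = 66 / 8
example = {
    "clarity": 9.0,
    "specificity": 8.0,
    "keyword_density": 8.0,
    "uniqueness": 8.0,
    "length": 8.0,
}
print(composite_score(example))  # → 8.25
```

Because clarity and specificity carry half of the total weight, a description that is unique and well sized can still score poorly if its purpose is unclear.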

### Quality grades

| Grade | Score range | Meaning |
|---|---|---|
| A | 8.0–10.0 | Excellent — high trigger accuracy expected |
| B | 6.0–7.9 | Good — minor improvements possible |
| C | 4.0–5.9 | Fair — likely to miss triggers or overlap |
| D | 2.0–3.9 | Poor — needs rewrite |
| F | 0.0–1.9 | Failing — will not trigger reliably |
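
The grade table can be sketched as a lookup on lower bounds. `grade_for` is a hypothetical helper, not a function confirmed to exist in `optimize.py`. Comparing against lower bounds is an assumption that also covers unrounded scores such as 7.95, which fall between the printed ranges:

```python
def grade_for(score: float) -> str:
    """Map a 0-10 composite score to a letter grade using the table's lower bounds."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    for floor, letter in ((8.0, "A"), (6.0, "B"), (4.0, "C"), (2.0, "D")):
        if score >= floor:
            return letter
    return "F"


print(grade_for(8.5))  # → A
print(grade_for(5.6))  # → C
print(grade_for(3.8))  # → D
```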

## How to use

```bash
python3 optimize.py --scan # Score all installed skills
python3 optimize.py --scan --grade C # Only show skills graded C or below
python3 optimize.py --skill <name> # Deep analysis of a single skill
python3 optimize.py --suggest <name> # Generate rewrite suggestions
python3 optimize.py --compare "desc A" "desc B" # Compare two descriptions
python3 optimize.py --status # Last scan summary
python3 optimize.py --format json # Machine-readable output
```

## Procedure

**Step 1 — Run a full scan**

```bash
python3 optimize.py --scan
```

Review the scorecard. Focus on skills graded C or below — these are the ones most likely to cause trigger failures.

**Step 2 — Get rewrite suggestions for low-scoring skills**

```bash
python3 optimize.py --suggest <skill-name>
```

The optimizer generates 2–3 alternative descriptions with predicted score improvements.

**Step 3 — Compare alternatives**

```bash
python3 optimize.py --compare "original description" "suggested rewrite"
```

Side-by-side scoring shows exactly which metrics improved.

**Step 4 — Apply the best rewrite**

Edit the skill's `SKILL.md` frontmatter `description:` field with the chosen rewrite.
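
Since the optimizer itself never writes to skill files, this step is manual. If you prefer to script it, the following is a minimal sketch, assuming the frontmatter is the leading block delimited by `---` lines and that `description:` is a single-line value. `replace_description` is a hypothetical helper and not part of this skill:

```python
import re


def replace_description(skill_md: str, new_description: str) -> str:
    """Return skill_md with the frontmatter `description:` value replaced.

    Assumes single-line `description:`; quote the new value yourself if it
    contains characters that YAML treats specially (for example ": ").
    """
    match = re.match(r"\A---\n(.*?\n)---\n", skill_md, flags=re.DOTALL)
    if match is None:
        raise ValueError("no YAML frontmatter found")
    frontmatter = match.group(1)
    # A callable replacement avoids backslash interpretation in the new text.
    updated, count = re.subn(
        r"^description:.*$",
        lambda _m: f"description: {new_description}",
        frontmatter,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("frontmatter has no description field")
    return skill_md[: match.start(1)] + updated + skill_md[match.end(1):]
```

Reading the file, calling the helper, and writing the result back can then be done with `pathlib.Path.read_text` and `write_text`. Review the diff before committing.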

## Vague word penalties

These words and phrases score 0 on specificity because they say nothing actionable:

`helps`, `manages`, `handles`, `deals with`, `works with`, `does stuff`, `various`, `things`, `general`, `misc`, `utility`, `tool for`, `assistant for`

## Strong trigger keywords (examples)

`scans`, `detects`, `validates`, `generates`, `audits`, `monitors`, `checks`, `reports`, `fixes`, `migrates`, `syncs`, `schedules`, `blocks`, `scores`, `diagnoses`
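
The two lists above can be checked against a description with a whole-word match. This is a sketch of one plausible approach, not the optimizer's confirmed logic; `find_terms` is a hypothetical helper, and the lists are copied from this document:

```python
import re

VAGUE_TERMS = [
    "helps", "manages", "handles", "deals with", "works with", "does stuff",
    "various", "things", "general", "misc", "utility", "tool for", "assistant for",
]
STRONG_KEYWORDS = [
    "scans", "detects", "validates", "generates", "audits", "monitors", "checks",
    "reports", "fixes", "migrates", "syncs", "schedules", "blocks", "scores", "diagnoses",
]


def find_terms(description: str, terms: list[str]) -> list[str]:
    """Return the terms that occur in description as whole words or phrases."""
    text = description.lower()
    return [t for t in terms if re.search(rf"\b{re.escape(t)}\b", text)]


print(find_terms("A helpful utility tool that manages various things", VAGUE_TERMS))
# → ['manages', 'various', 'things', 'utility']
print(find_terms("Scans config files for plaintext secrets", STRONG_KEYWORDS))
# → ['scans']
```

Word boundaries keep `helpful` from matching `helps`. They also mean inflected forms such as `migration` do not count as `migrates` unless stemming is added.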

## State

Scan results and per-skill scores are stored in `~/.openclaw/skill-state/tool-description-optimizer/state.yaml`.

Fields: `last_scan_at`, `skill_scores` list, `scan_history`.
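
A sketch of how one scan could be folded into these three fields, using plain dicts in place of the parsed YAML. The 20-entry cap comes from the state schema's description of `scan_history`. The newest-first ordering mirrors the example state. `record_scan` and `MAX_HISTORY` are hypothetical names:

```python
from datetime import datetime

MAX_HISTORY = 20  # the state schema describes scan_history as the last 20 scans


def record_scan(state: dict, scores: list[dict], scanned_at: datetime) -> dict:
    """Update state in place with the results of one scan and return it."""
    distribution = {grade: 0 for grade in "ABCDF"}
    for score in scores:
        distribution[score["grade"]] += 1
    average = sum(s["overall"] for s in scores) / len(scores) if scores else 0.0

    state["last_scan_at"] = scanned_at.isoformat()
    state["skill_scores"] = scores  # only the most recent scan is kept here
    history = state.setdefault("scan_history", [])
    history.insert(0, {
        "scanned_at": scanned_at.isoformat(),
        "skills_scanned": len(scores),
        "avg_score": round(average, 1),
        "grade_distribution": distribution,
    })
    del history[MAX_HISTORY:]  # drop the oldest entries beyond the cap
    return state
```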

## Notes

- Does not modify any skill files — analysis and suggestions only
- Uniqueness scoring uses Jaccard similarity against all other installed descriptions
- Length scoring uses a bell curve centered at 25 words (optimal)
- Rewrite suggestions are heuristic-based, not LLM-generated — deterministic and fast
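
The Jaccard and bell-curve notes above can be sketched as follows. The tokenizer, the mapping from similarity to a 0-10 score, and the curve width `sigma` are all assumptions made for illustration; only the use of Jaccard similarity and the 25-word center come from the notes:

```python
import math
import re


def tokens(description: str) -> set[str]:
    """Lowercased alphanumeric word set (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", description.lower()))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the two descriptions' word sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def uniqueness_score(description: str, others: list[str]) -> float:
    """Assumed mapping: 10 * (1 - highest similarity to any other description)."""
    worst = max((jaccard(description, other) for other in others), default=0.0)
    return 10.0 * (1.0 - worst)


def length_score(word_count: int, center: float = 25.0, sigma: float = 12.0) -> float:
    """Gaussian centered at 25 words; sigma is a guess, not a documented value."""
    return 10.0 * math.exp(-((word_count - center) ** 2) / (2 * sigma ** 2))


print(jaccard("scans config files", "scans log files"))  # → 0.5
print(length_score(25))  # → 10.0
```

With this shape, a description loses length points symmetrically as it moves away from 25 words in either direction.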
@@ -0,0 +1,27 @@
version: "1.0"
description: Tool description quality scores, rewrite suggestions, and scan history.
fields:
  last_scan_at:
    type: datetime
  skill_scores:
    type: list
    description: Per-skill quality scores from the most recent scan
    items:
      skill_name: { type: string }
      description: { type: string }
      word_count: { type: integer }
      clarity: { type: float, description: "0-10 clarity score" }
      specificity: { type: float, description: "0-10 specificity score" }
      keyword_density: { type: float, description: "0-10 keyword density score" }
      uniqueness: { type: float, description: "0-10 uniqueness vs other skills" }
      length_score: { type: float, description: "0-10 length optimality score" }
      overall: { type: float, description: "Weighted composite score" }
      grade: { type: string, description: "A/B/C/D/F" }
  scan_history:
    type: list
    description: Rolling log of past scans (last 20)
    items:
      scanned_at: { type: datetime }
      skills_scanned: { type: integer }
      avg_score: { type: float }
      grade_distribution: { type: object, description: "Count per grade: A, B, C, D, F" }
@@ -0,0 +1,94 @@
# Example runtime state for tool-description-optimizer
last_scan_at: "2026-03-16T14:00:05.221000"
skill_scores:
  - skill_name: using-superpowers
    description: "Bootstrap — teaches the agent how to find and invoke skills"
    word_count: 11
    clarity: 7.2
    specificity: 3.8
    keyword_density: 3.3
    uniqueness: 8.1
    length_score: 4.8
    overall: 5.6
    grade: C
  - skill_name: config-encryption-auditor
    description: "Scans OpenClaw config directories for plaintext API keys, tokens, and secrets in unencrypted files."
    word_count: 15
    clarity: 9.2
    specificity: 8.5
    keyword_density: 8.0
    uniqueness: 9.0
    length_score: 7.5
    overall: 8.5
    grade: A
  - skill_name: memory-graph-builder
    description: "Parses MEMORY.md into a knowledge graph with typed relationships, detects duplicates and contradictions, and generates a compressed memory digest."
    word_count: 22
    clarity: 8.8
    specificity: 7.6
    keyword_density: 7.2
    uniqueness: 9.4
    length_score: 9.5
    overall: 8.5
    grade: A
scan_history:
  - scanned_at: "2026-03-16T14:00:05.221000"
    skills_scanned: 40
    avg_score: 7.2
    grade_distribution:
      A: 18
      B: 14
      C: 6
      D: 2
      F: 0
  - scanned_at: "2026-03-13T14:00:00.000000"
    skills_scanned: 36
    avg_score: 6.8
    grade_distribution:
      A: 14
      B: 12
      C: 7
      D: 3
      F: 0
# ── Walkthrough ──────────────────────────────────────────────────────────────
# python3 optimize.py --scan
#
# Tool Description Quality Scan — 2026-03-16
# ────────────────────────────────────────────────────────────
# 40 skills scanned | avg score: 7.2
# Grades: 18xA 14xB 6xC 2xD 0xF
#
# ! [D] 3.8 — some-vague-skill
# clarity=2.0 spec=1.5 kw=1.2 uniq=8.0 len=6.5
# "A helpful utility tool that manages various things..."
#
# ~ [C] 5.6 — using-superpowers
# clarity=7.2 spec=3.8 kw=3.3 uniq=8.1 len=4.8
# "Bootstrap — teaches the agent how to find and invoke skills"
#
# python3 optimize.py --suggest using-superpowers
#
# Rewrite Suggestions: using-superpowers
# ──────────────────────────────────────────────────
# Current: "Bootstrap — teaches the agent how to find and invoke skills"
# Score: 5.6 (C)
#
# 1. Front-load action verb
# "Teaches the agent how to discover, invoke, and chain installed skills"
# Predicted: 7.4 (B) [+1.8]
#
# python3 optimize.py --compare "A tool that helps manage stuff" "Scans config files for plaintext secrets and suggests env var migration"
#
# Description Comparison
# ──────────────────────────────────────────────────
# A: "A tool that helps manage stuff"
# B: "Scans config files for plaintext secrets and suggests env var migration"
#
#   Clarity       A=2.0  B=9.5  B
#   Specificity   A=0.0  B=8.5  B
#   Keywords      A=0.0  B=7.8  B
#   Uniqueness    A=7.0  B=7.0  =
#   Length        A=5.2  B=8.8  B
#   OVERALL       A=2.8  B=8.4  B
#
#   Grade: A=D  B=A