Skip to content

Capability namespaces & hierarchical discovery #45

@dgenio

Description

@dgenio

Milestone: v0.3.0 | Tier: Strategic | Effort: Medium–Large

Problem

CapabilityRegistry.search() uses flat keyword matching (token overlap scoring). With the MCP ecosystem's 100s–1000s of tools, this doesn't scale:

  • No namespace support — all capabilities live in a flat dict
  • No way to list/filter by domain (e.g., "show me all billing tools")
  • Search quality degrades with large registries (keyword overlap is noisy)
  • No deferred/lazy registration — all capabilities must be fully registered upfront

OpenAI's tool_search pattern recommends ≤20 tools loaded initially with deferred loading. MCP uses namespaced tool names. Agent-kernel needs equivalent capability.

Proposed Change

1. Namespace support

Support dot-notation namespaces in capability_id:

# Registration
registry.register(Capability(capability_id="billing.invoices.list", ...))
registry.register(Capability(capability_id="billing.invoices.create", ...))
registry.register(Capability(capability_id="crm.contacts.search", ...))

# Namespace operations
registry.list_namespaces()               # ["billing", "crm"]
registry.list_namespace("billing")       # [billing.invoices.list, billing.invoices.create]
registry.list_namespace("billing.invoices")  # Same as above

2. Improved search scoring

Replace token-overlap scoring with TF-IDF or BM25-inspired scoring:

  • Weight matches in name and tags higher than description
  • Penalize very common terms (stop words)
  • Support exact-match boosting for capability_id prefixes

3. Deferred registration

# Register namespace metadata without full implementations
registry.register_namespace(
    prefix="billing",
    description="Billing and invoicing tools",
    loader=lambda: discover_billing_capabilities(),  # Called on first access
)

# First search/access to "billing.*" triggers the loader
results = registry.search("list invoices")  # Triggers billing loader

4. Search pagination

results = registry.search("invoices", max_results=10, offset=0)

Acceptance Criteria

  • Dot-notation namespaces work for capability IDs
  • list_namespaces() returns top-level namespace prefixes
  • list_namespace(prefix) returns all capabilities under a namespace
  • Search with 500 registered capabilities returns relevant results in <10ms
  • Deferred namespace loader is called exactly once on first access
  • Search quality improvement measurable on a benchmark set (precision@5 > keyword baseline)
  • Backward compatible: existing flat capability IDs continue to work

Affected Files

  • src/agent_kernel/registry.py (namespace support, improved scoring, deferred loading)
  • src/agent_kernel/models.py (NamespaceMetadata dataclass if needed)
  • tests/test_registry.py (namespace, deferred, and search quality tests)
  • docs/capabilities.md (namespace documentation)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions