Description
Is this related to an existing feature request or issue?
Based on the existing AWS Observability Kiro Power, adapted into the agent-plugins marketplace format.
Summary
This RFC proposes a new aws-observability plugin that provides a comprehensive AWS observability platform combining CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, and automated codebase observability gap analysis. The plugin integrates four MCP servers from AWS Labs and provides eight reference files covering incident response, log analysis, alerting setup, performance monitoring, security auditing, observability gap analysis, Application Signals enablement, and CloudTrail data source selection.
Use case
AI coding agents today lack integrated access to AWS observability tooling. When developers need to troubleshoot production incidents, analyze logs, monitor performance, audit security events, or assess codebase observability gaps, they must manually switch between multiple AWS consoles and tools.
Key use cases:
- Incident response: Quickly triage production incidents by correlating alarms, logs, traces, metrics, and recent changes across CloudWatch, Application Signals, and CloudTrail
- Log analysis: Query CloudWatch Logs using Logs Insights syntax with pattern detection, anomaly analysis, and multi-log-group support
- Performance monitoring: Monitor microservices health via Application Signals APM with SLOs, distributed tracing, and service dependency maps
- Security auditing: Investigate security incidents and perform compliance audits using CloudTrail with a prioritized data source strategy (Lake > CloudWatch Logs > Lookup Events API)
- Alerting setup: Configure intelligent CloudWatch alarms using AWS best-practice recommendations with composite alarms and anomaly detection
- Observability gap analysis: Audit codebases across Python, Java, JavaScript/TypeScript, Go, Ruby, and C#/.NET for missing logging, metrics, tracing, error handling, and health checks
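The prioritized CloudTrail data source strategy (Lake > CloudWatch Logs > Lookup Events API) can be read as a simple fallback chain. The sketch below is only an illustration: the RFC does not say how availability is detected, so the two boolean inputs and the names `CloudTrailSource` and `choose_cloudtrail_source` are hypothetical.

```python
from enum import Enum


class CloudTrailSource(Enum):
    """Data sources in the priority order given in the RFC."""
    LAKE = "CloudTrail Lake"
    CLOUDWATCH_LOGS = "CloudWatch Logs"
    LOOKUP_EVENTS = "Lookup Events API"


def choose_cloudtrail_source(lake_available: bool, logs_available: bool) -> CloudTrailSource:
    """Pick the highest-priority source that is available.

    Assumption: the caller has already determined availability; the
    Lookup Events API is treated as the fallback of last resort.
    """
    if lake_available:
        return CloudTrailSource.LAKE
    if logs_available:
        return CloudTrailSource.CLOUDWATCH_LOGS
    return CloudTrailSource.LOOKUP_EVENTS
```

For example, an account without a CloudTrail Lake event data store but with trail delivery to CloudWatch Logs would resolve to `CloudTrailSource.CLOUDWATCH_LOGS`.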
Proposal
Plugin structure
plugins/aws-observability/
├── .claude-plugin/
│   └── plugin.json          # Plugin manifest
├── .mcp.json                # 4 MCP server definitions
└── skills/
    └── aws-observability/
        ├── SKILL.md         # Main skill (~155 lines, auto-triggers)
        └── references/
            ├── alerting-setup.md
            ├── application-signals-setup.md
            ├── cloudtrail-data-source-selection.md
            ├── incident-response.md
            ├── log-analysis.md
            ├── observability-gap-analysis.md
            ├── performance-monitoring.md
            └── security-auditing.md
MCP servers
| Server | Type | Purpose |
|---|---|---|
| `awslabs.cloudwatch-mcp-server` | stdio | CloudWatch Logs, Metrics, Alarms, log group analysis |
| `awslabs.cloudwatch-applicationsignals-mcp-server` | stdio | Application Signals APM, SLOs, distributed tracing |
| `awslabs.cloudtrail-mcp-server` | stdio | CloudTrail security auditing, API activity tracking |
| `awslabs.aws-documentation-mcp-server` | stdio | Official AWS documentation search and access |
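As a rough illustration of what the four server definitions in `.mcp.json` might contain, the sketch below generates a configuration from the server names above. The `mcpServers`/`command`/`args`/`env` layout follows the convention commonly used by MCP clients and is an assumption about this plugin's file, as are the `AWS_PROFILE` and `AWS_REGION` variable names; the `uvx` launcher, the `@latest` specifier, and the `default`/`us-east-1` defaults come from this RFC.

```python
import json

# Server package names as listed in the RFC.
SERVERS = [
    "awslabs.cloudwatch-mcp-server",
    "awslabs.cloudwatch-applicationsignals-mcp-server",
    "awslabs.cloudtrail-mcp-server",
    "awslabs.aws-documentation-mcp-server",
]


def build_mcp_config(profile: str = "default", region: str = "us-east-1") -> dict:
    """Return a dict shaped like a typical .mcp.json (layout is assumed)."""
    return {
        "mcpServers": {
            name: {
                "command": "uvx",
                "args": [f"{name}@latest"],
                # Assumed env var names for profile and region selection.
                "env": {"AWS_PROFILE": profile, "AWS_REGION": region},
            }
            for name in SERVERS
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mcp_config(), indent=2))
```

Calling `build_mcp_config(profile="prod", region="eu-west-1")` would produce the same structure with a different profile and region applied to every server.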
Skill design
The SKILL.md follows progressive disclosure:
- Initial load (~155 lines): Prerequisites, configuration, capability overview, reference file index with load conditions, quick start examples, essential log query patterns, and best practices
- On-demand references (8 files): Loaded only when the agent needs deep domain knowledge for a specific workflow (e.g., incident response, security auditing)
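In practice the agent decides which references to load by reading the index and load conditions in SKILL.md, not by running code. Purely as a mental model of progressive disclosure, the sketch below maps a user request to reference files with keyword matching; the keyword lists and the `references_to_load` helper are hypothetical and do not reproduce the actual load conditions.

```python
# Hypothetical load conditions: reference file -> trigger keywords.
REFERENCE_TRIGGERS = {
    "incident-response.md": ("incident", "outage", "error rate"),
    "log-analysis.md": ("log group", "logs insights", "log pattern"),
    "alerting-setup.md": ("alarm", "alert"),
    "performance-monitoring.md": ("latency", "performance"),
    "security-auditing.md": ("audit", "security"),
    "observability-gap-analysis.md": ("observability gap", "codebase"),
    "application-signals-setup.md": ("application signals",),
    "cloudtrail-data-source-selection.md": ("cloudtrail",),
}


def references_to_load(request: str) -> list[str]:
    """Return the reference files whose keywords appear in the request."""
    text = request.lower()
    return [
        name
        for name, keywords in REFERENCE_TRIGGERS.items()
        if any(keyword in text for keyword in keywords)
    ]
```

The point of the design is that a request matching nothing loads no reference files at all, so only the ~155-line SKILL.md occupies context until a specific workflow is needed.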
User experience
Before: Users must manually navigate the AWS Console, run CLI commands, and context-switch between CloudWatch, X-Ray, CloudTrail, and documentation.
After: Users describe their intent naturally (e.g., "investigate the high error rate on my API", "audit my CloudTrail for IAM changes", "check my codebase for observability gaps") and the agent auto-triggers the aws-observability skill, loads relevant references, and uses the MCP servers to execute the workflow.
Prerequisites
- AWS CLI configured with credentials
- Python 3.10+ and `uv` installed
- Required IAM permissions: `cloudwatch:*`, `logs:*`, `xray:*`, `cloudtrail:*`, `application-signals:*`, `synthetics:Get*`, `s3:GetObject`, `s3:ListBucket`, `iam:Get*`
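A local preflight check for the tooling prerequisites could look like the following sketch. It only tests that the interpreter is new enough and that the `aws` and `uv` executables are on `PATH`; it does not verify that credentials are configured or that the IAM permissions are granted. The function name and its injectable parameters are hypothetical.

```python
import shutil
import sys


def missing_prerequisites(which=shutil.which, version=sys.version_info) -> list[str]:
    """List prerequisites that appear to be missing on this machine.

    `which` and `version` are injectable so the check can be exercised
    without depending on the local environment.
    """
    missing = []
    if tuple(version[:2]) < (3, 10):
        missing.append("Python 3.10+")
    for tool in ("aws", "uv"):
        if which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All tooling found" if not problems else f"Missing: {', '.join(problems)}")
```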
Out of scope
- AWS resource provisioning or modification: This plugin is read-only for observability data; it does not create, modify, or delete AWS resources
- Custom dashboard creation: The plugin queries data but does not create CloudWatch Dashboards or other persistent UI artifacts
- Automated remediation: The plugin identifies issues and provides recommendations but does not automatically fix them
- Non-AWS observability platforms: Integration with Datadog, Splunk, Grafana, or other third-party monitoring tools
- Cost Explorer integration: While referenced in some workflows, Cost Explorer MCP server integration is not included in this initial version
Potential challenges
- IAM permissions breadth: The plugin requires broad permissions across CloudWatch, X-Ray, CloudTrail, Application Signals, and S3. Users with restricted IAM policies may encounter partial functionality. Mitigation: Clear prerequisites documentation and graceful error handling guidance in reference files.
- Reference file size: Some reference files (security-auditing.md, performance-monitoring.md, incident-response.md) exceed the 100-line guideline from DESIGN_GUIDELINES.md due to the breadth of query patterns and workflows. Mitigation: Content is organized with clear headings for selective loading; the SKILL.md itself stays well under 300 lines.
- MCP server availability: All four MCP servers are published on PyPI as `uvx`-installable packages. If any server has breaking changes, the plugin may need updates. Mitigation: Using `@latest` version specifiers so that updates are picked up automatically.
- Region and profile configuration: The default configuration uses the `default` AWS profile and the `us-east-1` region. Users must manually update the `.mcp.json` env vars for different profiles/regions. Mitigation: Configuration section in SKILL.md provides clear instructions.
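To make the "partial functionality" concern concrete, the sketch below reports which capabilities would remain usable for a given set of granted IAM service prefixes. The capability-to-prefix mapping is an assumption made for illustration (the RFC lists the permissions but does not map them to capabilities), and `capability_report` is a hypothetical helper, not part of the plugin.

```python
# Assumed mapping from capability to the IAM service prefixes it needs.
REQUIRED_PREFIXES = {
    "Log analysis": {"logs"},
    "Metrics and alarms": {"cloudwatch"},
    "Application Signals APM": {"application-signals", "xray"},
    "CloudTrail auditing": {"cloudtrail"},
}


def capability_report(granted: set[str]) -> dict[str, bool]:
    """Map each capability to True if all of its assumed prefixes are granted."""
    return {
        capability: needed <= granted  # subset test
        for capability, needed in REQUIRED_PREFIXES.items()
    }
```

A user whose policy grants only `logs` and `cloudwatch` actions would, under this mapping, keep log analysis and alarm workflows while the APM and CloudTrail workflows degrade.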
Dependencies and Integrations
Dependencies (all from AWS Labs):
- awslabs.cloudwatch-mcp-server - CloudWatch Logs, Metrics, Alarms
- awslabs.cloudwatch-applicationsignals-mcp-server - Application Signals APM
- awslabs.cloudtrail-mcp-server - CloudTrail auditing
- awslabs.aws-documentation-mcp-server - AWS documentation access
Integration with existing plugins:
- Complements `deploy-on-aws` by providing post-deployment monitoring and troubleshooting capabilities
- The CloudTrail security auditing capability pairs well with infrastructure changes made via the deploy plugin
Alternative solutions
- Individual MCP server setup without a plugin: Users could manually configure each MCP server and write their own prompts. The plugin adds value through curated skill descriptions, progressive-disclosure reference files, and pre-built workflow patterns that guide the agent through complex multi-tool observability tasks.
- Separate plugins per capability: Could split into aws-cloudwatch, aws-application-signals, aws-cloudtrail plugins. However, observability workflows frequently span multiple tools (e.g., incident response correlates alarms + logs + traces + CloudTrail changes), making a unified plugin more effective.
- RFC PR: feat(aws-observability): Add AWS Observability plugin #68
- Approved by: ''
- Reviewed by: ''