RFC: Add AWS Observability plugin

### Is this related to an existing feature request or issue?

Based on the existing [AWS Observability Kiro Power](https://github.com/kirodotdev/powers/tree/main/aws-observability), adapted into the agent-plugins marketplace format.

### Summary

This RFC proposes a new **aws-observability** plugin that gives AI coding agents integrated access to AWS observability tooling: CloudWatch Logs, Metrics, and Alarms; Application Signals (APM); CloudTrail security auditing; and automated codebase observability gap analysis. The plugin integrates four MCP servers from AWS Labs and ships eight reference files covering incident response, log analysis, alerting setup, performance monitoring, security auditing, observability gap analysis, Application Signals enablement, and CloudTrail data source selection.

### Use case

AI coding agents today lack integrated access to AWS observability tooling. When developers need to troubleshoot production incidents, analyze logs, monitor performance, audit security events, or assess codebase observability gaps, they must manually switch between multiple AWS consoles and tools.

**Key use cases:**
- **Incident response**: Quickly triage production incidents by correlating alarms, logs, traces, metrics, and recent changes across CloudWatch, Application Signals, and CloudTrail
- **Log analysis**: Query CloudWatch Logs using Logs Insights syntax with pattern detection, anomaly analysis, and multi-log-group support
- **Performance monitoring**: Monitor microservices health via Application Signals APM with SLOs, distributed tracing, and service dependency maps
- **Security auditing**: Investigate security incidents and perform compliance audits using CloudTrail with a prioritized data source strategy (CloudTrail Lake > CloudWatch Logs > Lookup Events API)
- **Alerting setup**: Configure CloudWatch alarms based on AWS best-practice recommendations, including composite alarms and anomaly detection
- **Observability gap analysis**: Audit codebases across Python, Java, JavaScript/TypeScript, Go, Ruby, and C#/.NET for missing logging, metrics, tracing, error handling, and health checks
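The CloudTrail data source priority in the security auditing use case can be sketched as a simple fallback chain. This is a minimal illustration, not the plugin's actual logic: the `CloudTrailSources` flags and the returned labels are hypothetical, and in practice availability would be discovered by probing the account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudTrailSources:
    """Hypothetical availability flags for CloudTrail data sources in an account."""
    lake_event_data_store: bool
    cloudwatch_log_group: bool


def select_cloudtrail_source(sources: CloudTrailSources) -> str:
    """Pick a source in priority order: CloudTrail Lake > CloudWatch Logs > Lookup Events API."""
    if sources.lake_event_data_store:
        return "cloudtrail-lake"
    if sources.cloudwatch_log_group:
        return "cloudwatch-logs"
    # Fallback: the Lookup Events API needs no extra setup but only covers
    # the last 90 days of management events.
    return "lookup-events"


if __name__ == "__main__":
    print(select_cloudtrail_source(CloudTrailSources(False, True)))
```

The ordering favors the source with the richest query capability first and falls back to the option that is always available.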

### Proposal

#### Plugin structure

```
plugins/aws-observability/
├── .claude-plugin/
│   └── plugin.json            # Plugin manifest
├── .mcp.json                  # 4 MCP server definitions
└── skills/
    └── aws-observability/
        ├── SKILL.md           # Main skill (~155 lines, auto-triggers)
        └── references/
            ├── alerting-setup.md
            ├── application-signals-setup.md
            ├── cloudtrail-data-source-selection.md
            ├── incident-response.md
            ├── log-analysis.md
            ├── observability-gap-analysis.md
            ├── performance-monitoring.md
            └── security-auditing.md
```

#### MCP servers

| Server | Type | Purpose |
|--------|------|---------|
| `awslabs.cloudwatch-mcp-server` | stdio | CloudWatch Logs, Metrics, Alarms, log group analysis |
| `awslabs.cloudwatch-applicationsignals-mcp-server` | stdio | Application Signals APM, SLOs, distributed tracing |
| `awslabs.cloudtrail-mcp-server` | stdio | CloudTrail security auditing, API activity tracking |
| `awslabs.aws-documentation-mcp-server` | stdio | Official AWS documentation search and access |
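As a rough illustration of how the four stdio servers could be declared, the sketch below generates an `.mcp.json`-style dictionary. It assumes the `mcpServers` layout, `uvx` as the launcher, `@latest` tags, and the `AWS_PROFILE`/`AWS_REGION` environment variables with the `default`/`us-east-1` defaults mentioned under Potential challenges. The server keys and exact env vars in the real plugin file may differ.

```python
import json

# Package names from the MCP servers table.
SERVER_PACKAGES = [
    "awslabs.cloudwatch-mcp-server",
    "awslabs.cloudwatch-applicationsignals-mcp-server",
    "awslabs.cloudtrail-mcp-server",
    "awslabs.aws-documentation-mcp-server",
]


def build_mcp_config(profile: str = "default", region: str = "us-east-1") -> dict:
    """Build an .mcp.json-style dict that launches each server over stdio via uvx."""
    servers = {}
    for package in SERVER_PACKAGES:
        servers[package] = {
            "command": "uvx",
            "args": [f"{package}@latest"],
            # Assumption: the servers pick up credentials and region from these standard AWS env vars.
            "env": {"AWS_PROFILE": profile, "AWS_REGION": region},
        }
    return {"mcpServers": servers}


if __name__ == "__main__":
    print(json.dumps(build_mcp_config(), indent=2))
```

Parameterizing profile and region in one place is one way to ease the manual per-environment edits described later in this RFC.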

#### Skill design

The SKILL.md follows progressive disclosure:
- **Initial load** (~155 lines): Prerequisites, configuration, capability overview, reference file index with load conditions, quick start examples, essential log query patterns, and best practices
- **On-demand references** (8 files): Loaded only when the agent needs deep domain knowledge for a specific workflow (e.g., incident response, security auditing)
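To give a sense of the "essential log query patterns" the initial load could carry, here is a small sketch that builds two common CloudWatch Logs Insights queries. The helper names and defaults are hypothetical; the patterns actually shipped in SKILL.md may differ. The query syntax (`fields`, `filter ... like /regex/`, `stats ... by bin()`, `sort`, `limit`) is standard Logs Insights.

```python
def errors_over_time(pattern: str = "ERROR", bin_size: str = "5m") -> str:
    """Count log events matching a regex, bucketed into fixed time bins."""
    return (
        "fields @timestamp, @message\n"
        f"| filter @message like /{pattern}/\n"
        f"| stats count(*) as errors by bin({bin_size})"
    )


def top_messages(pattern: str = "ERROR", limit: int = 10) -> str:
    """List the most frequent matching messages, most common first."""
    return (
        "fields @message\n"
        f"| filter @message like /{pattern}/\n"
        "| stats count(*) as occurrences by @message\n"
        "| sort occurrences desc\n"
        f"| limit {limit}"
    )


if __name__ == "__main__":
    print(errors_over_time())
    print(top_messages(pattern="Timeout", limit=5))
```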

#### User experience

**Before**: Users must manually navigate the AWS Console, run CLI commands, and context-switch between CloudWatch, X-Ray, CloudTrail, and documentation.

**After**: Users describe their intent naturally (e.g., "investigate the high error rate on my API", "audit my CloudTrail for IAM changes", "check my codebase for observability gaps") and the agent auto-triggers the aws-observability skill, loads relevant references, and uses the MCP servers to execute the workflow.
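The "loads relevant references" step can be approximated as a mapping from load conditions to reference files. In the real plugin the agent decides which references to load from the index in SKILL.md; the keyword table below is a hypothetical simplification used only to make the routing idea concrete.

```python
# Hypothetical load conditions: keywords that cause a reference file to be loaded.
REFERENCE_TRIGGERS = {
    "incident-response.md": ("incident", "outage", "error rate", "triage"),
    "log-analysis.md": ("logs", "log group", "insights"),
    "security-auditing.md": ("audit", "iam", "cloudtrail", "security"),
    "cloudtrail-data-source-selection.md": ("cloudtrail",),
    "alerting-setup.md": ("alarm", "alert"),
    "performance-monitoring.md": ("latency", "slo", "trace", "performance"),
    "application-signals-setup.md": ("enable application signals",),
    "observability-gap-analysis.md": ("gap", "instrument", "codebase"),
}


def references_for(request: str) -> list[str]:
    """Return the reference files whose keywords appear in the user's request."""
    text = request.lower()
    return sorted(
        ref
        for ref, keywords in REFERENCE_TRIGGERS.items()
        if any(keyword in text for keyword in keywords)
    )


if __name__ == "__main__":
    print(references_for("audit my CloudTrail for IAM changes"))
```

A single request can pull in several references (here, both the security auditing guide and the CloudTrail data source guide), which mirrors the progressive-disclosure design: nothing beyond SKILL.md is loaded unless a condition matches.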

#### Prerequisites

- AWS CLI configured with credentials
- Python 3.10+ and `uv` installed
- Required IAM permissions: `cloudwatch:*`, `logs:*`, `xray:*`, `cloudtrail:*`, `application-signals:*`, `synthetics:Get*`, `s3:GetObject`, `s3:ListBucket`, `iam:Get*`
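For reference, the permission list above can be expressed as a standard IAM identity policy document. The sketch below only assembles the JSON; the `Sid` value is hypothetical. Note that service-wide wildcards such as `cloudwatch:*` and `logs:*` also grant write actions, so narrowing them to the read actions the servers actually call is a possible hardening step.

```python
import json

# Actions copied from the prerequisites list.
REQUIRED_ACTIONS = [
    "cloudwatch:*", "logs:*", "xray:*", "cloudtrail:*", "application-signals:*",
    "synthetics:Get*", "s3:GetObject", "s3:ListBucket", "iam:Get*",
]


def build_policy(actions: list[str] = REQUIRED_ACTIONS) -> dict:
    """Wrap the actions in a single Allow statement using the IAM policy grammar."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AwsObservabilityPlugin",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(), indent=2))
```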

### Out of scope

- **AWS resource provisioning or modification**: This plugin is read-only for observability data; it does not create, modify, or delete AWS resources
- **Custom dashboard creation**: The plugin queries data but does not create CloudWatch Dashboards or other persistent UI artifacts
- **Automated remediation**: The plugin identifies issues and provides recommendations but does not automatically fix them
- **Non-AWS observability platforms**: Integration with Datadog, Splunk, Grafana, or other third-party monitoring tools
- **Cost Explorer integration**: While referenced in some workflows, Cost Explorer MCP server integration is not included in this initial version

### Potential challenges

- **IAM permissions breadth**: The plugin requires broad permissions across CloudWatch, X-Ray, CloudTrail, Application Signals, and S3. Users with restricted IAM policies may encounter partial functionality. Mitigation: Clear prerequisites documentation and graceful error handling guidance in reference files.
- **Reference file size**: Some reference files (security-auditing.md, performance-monitoring.md, incident-response.md) exceed the 100-line guideline from DESIGN_GUIDELINES.md due to the breadth of query patterns and workflows. Mitigation: Content is organized with clear headings for selective loading; the SKILL.md itself stays well under 300 lines.
- **MCP server availability**: All four MCP servers are published on PyPI as `uvx`-installable packages. If any server has breaking changes, the plugin may need updates. Mitigation: Servers are referenced with the `@latest` tag so fixes are picked up automatically; because this also pulls in breaking changes, pinning exact versions remains an option if stability is preferred.
- **Region and profile configuration**: The default configuration uses the `default` AWS profile and the `us-east-1` region. Users must manually update the env vars in `.mcp.json` to use a different profile or region. Mitigation: The configuration section in SKILL.md provides clear instructions.
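The reference-size trade-off is easy to monitor with a small check. This sketch reads the limits from this RFC (100 lines per reference file, 300 for SKILL.md); the exact thresholds in DESIGN_GUIDELINES.md and the function names here are assumptions.

```python
from pathlib import Path

# Assumed limits, as described in this RFC.
REFERENCE_LIMIT = 100
SKILL_LIMIT = 300


def count_lines(path: Path) -> int:
    """Count newline-terminated lines in a UTF-8 text file."""
    with path.open(encoding="utf-8") as f:
        return sum(1 for _ in f)


def oversized_files(skill_dir: Path) -> dict[str, int]:
    """Map each file that exceeds its guideline to its line count."""
    results: dict[str, int] = {}
    skill_md = skill_dir / "SKILL.md"
    if skill_md.exists() and (n := count_lines(skill_md)) > SKILL_LIMIT:
        results["SKILL.md"] = n
    for ref in sorted((skill_dir / "references").glob("*.md")):
        n = count_lines(ref)
        if n > REFERENCE_LIMIT:
            results[f"references/{ref.name}"] = n
    return results


if __name__ == "__main__":
    print(oversized_files(Path("plugins/aws-observability/skills/aws-observability")))
```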

### Dependencies and Integrations

**Dependencies (all from AWS Labs):**
- [awslabs.cloudwatch-mcp-server](https://github.com/awslabs/mcp) - CloudWatch Logs, Metrics, Alarms
- [awslabs.cloudwatch-applicationsignals-mcp-server](https://github.com/awslabs/mcp) - Application Signals APM
- [awslabs.cloudtrail-mcp-server](https://github.com/awslabs/mcp) - CloudTrail auditing
- [awslabs.aws-documentation-mcp-server](https://github.com/awslabs/mcp) - AWS documentation access

**Integration with existing plugins:**
- Complements `deploy-on-aws` by providing post-deployment monitoring and troubleshooting capabilities
- The CloudTrail security auditing capability pairs well with infrastructure changes made via the deploy plugin

### Alternative solutions

1. **Individual MCP server setup without a plugin**: Users could manually configure each MCP server and write their own prompts. The plugin adds value through curated skill descriptions, progressive-disclosure reference files, and pre-built workflow patterns that guide the agent through complex multi-tool observability tasks.

2. **Separate plugins per capability**: Could split into aws-cloudwatch, aws-application-signals, aws-cloudtrail plugins. However, observability workflows frequently span multiple tools (e.g., incident response correlates alarms + logs + traces + CloudTrail changes), making a unified plugin more effective.
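The cross-tool correlation argument can be made concrete: incident response essentially merges time-ordered events from several sources into one timeline. The sketch below assumes each source has already been fetched and sorted by time; the `Event` type and the source labels are hypothetical.

```python
import heapq
from datetime import datetime, timezone
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    source: str   # e.g. "alarm", "logs", "trace", "cloudtrail"
    summary: str


def build_timeline(*streams: list[Event]) -> list[Event]:
    """Merge per-source event lists (each already sorted by time) into one ordered timeline."""
    return list(heapq.merge(*streams, key=lambda e: e.timestamp))


if __name__ == "__main__":
    def at(minute: int) -> datetime:
        return datetime(2024, 1, 1, 12, minute, tzinfo=timezone.utc)

    # Hypothetical sample events.
    changes = [Event(at(3), "cloudtrail", "configuration change")]
    logs = [Event(at(5), "logs", "first 5xx responses"), Event(at(9), "logs", "5xx rate peaks")]
    alarms = [Event(at(7), "alarm", "error-rate alarm enters ALARM")]
    for event in build_timeline(alarms, logs, changes):
        print(event.timestamp.isoformat(), event.source, event.summary)
```

Putting a configuration change a few minutes before the first errors on the same timeline is the kind of correlation that is awkward to do across separate plugins.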

---

* RFC PR: #68
* Approved by: ''
* Reviewed by: ''
* RFC issue: #67