
@ainsleyclark

Overview

Integrates Uptime Kuma monitoring for WebKit-managed applications and resources. This PR implements automatic uptime monitoring for apps (HTTP), Postgres databases, and backup jobs with heartbeat tracking.

Motivation

As discussed in #267, WebKit-enabled repos now need automatic uptime alerts integrated with the existing Uptime Kuma instance at https://uptime.ainsley.dev/. This provides:

  • App uptime monitoring: Automatic HTTP health checks for all domains (primary + aliases)
  • Resource monitoring: Postgres database connection health checks
  • Backup monitoring: Heartbeat tracking for CI backup jobs (infrastructure ready)

Implementation

Phase 1: Schema & Appdef Changes

  • ✅ Added MonitoringConfig type to schema.json with simple enabled: true field
  • ✅ Created internal/appdef/monitor.go with an internal Monitor struct carrying smart defaults for HTTP, Postgres, and Push monitors
  • ✅ Updated App and Resource structs with monitoring field (defaults to enabled)
  • ✅ Smart defaults: HTTP (60s), Postgres (5m), Push (26.4h for daily backups)

Phase 2: Terraform Provider Integration

  • ✅ Added ehealth-co-id/uptimekuma Terraform provider
  • ✅ Provider configuration with authentication via environment variables
  • ✅ Variables: uptime_kuma_url, uptime_kuma_username, uptime_kuma_password

Phase 3: Monitoring Module

  • ✅ Created platform/terraform/modules/monitoring/ module
  • ✅ HTTP monitors for app domains (all domains, not just primary)
  • ✅ Postgres monitors for database resources
  • ✅ Push monitors for backup heartbeats (infrastructure ready)

Phase 4: Variable Generation

  • ✅ Updated internal/infra/tf_vars.go to generate monitors from appdef
  • ✅ Transforms appdef.Monitor to tfMonitor for Terraform consumption
  • ✅ Integrated monitoring module into base Terraform configuration

Phase 6: Documentation

  • ✅ Comprehensive documentation in docs/monitoring.md
  • ✅ Configuration guide, troubleshooting, and architecture details

Configuration

User Configuration (Minimal)

Monitoring is enabled by default (opt-out). To disable:

{
  "apps": [
    {
      "name": "web",
      "monitoring": {
        "enabled": false
      }
    }
  ],
  "resources": [
    {
      "name": "db",
      "monitoring": {
        "enabled": false
      }
    }
  ]
}
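Because monitoring is opt-out, an explicit "enabled": false has to survive defaulting, while a missing block must still mean "on". One way to get that in Go is to pre-populate defaults before calling json.Unmarshal, because fields absent from the JSON keep their existing values. This is a minimal sketch with hypothetical, trimmed-down App and Monitoring types, not the PR's actual structs or its applyDefaults() logic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Monitoring mirrors the user-facing "monitoring" block (hypothetical shape).
type Monitoring struct {
	Enabled bool `json:"enabled"`
}

// App is a hypothetical, trimmed-down app definition.
type App struct {
	Name       string     `json:"name"`
	Monitoring Monitoring `json:"monitoring"`
}

// decodeApp sets defaults first, then unmarshals on top of them: a missing
// "monitoring" block (or a missing "enabled" key) leaves Enabled=true,
// while an explicit false overrides it.
func decodeApp(data []byte) (App, error) {
	app := App{Monitoring: Monitoring{Enabled: true}}
	if err := json.Unmarshal(data, &app); err != nil {
		return App{}, err
	}
	return app, nil
}

func main() {
	for _, raw := range []string{
		`{"name": "web"}`,
		`{"name": "web", "monitoring": {"enabled": false}}`,
	} {
		app, err := decodeApp([]byte(raw))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s monitoring enabled: %v\n", app.Name, app.Monitoring.Enabled)
	}
}
```

Decoding onto defaults avoids needing a *bool to distinguish "unset" from "false".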

Uptime Kuma Credentials

Add to .env.production.enc:

TF_VAR_uptime_kuma_username=admin
TF_VAR_uptime_kuma_password=<your-password>

Monitor Types

HTTP Monitors

  • All domains (primary + aliases) monitored
  • Health check path: configurable via health_check_path (default: /)
  • Interval: 60s
  • Expected status: 200
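The HTTP rules above (every domain, a default path of /, a 60s interval, an expected 200) can be sketched as a single loop. The Domain and Monitor shapes, the Unmanaged flag, and the https:// scheme are assumptions for illustration, not the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// Domain is a hypothetical, trimmed-down domain definition.
type Domain struct {
	Name      string
	Type      string // "primary" or "alias"
	Unmanaged bool   // assumed flag: unmanaged domains are skipped
}

// Monitor is a hypothetical, trimmed-down HTTP monitor.
type Monitor struct {
	Name            string
	Type            string
	URL             string
	IntervalSeconds int
	ExpectedStatus  int
}

// normalisePath falls back to "/" and guarantees a leading slash.
func normalisePath(p string) string {
	if p == "" {
		return "/"
	}
	if !strings.HasPrefix(p, "/") {
		return "/" + p
	}
	return p
}

// httpMonitors builds one HTTP monitor per managed domain (primary and aliases).
func httpMonitors(project, app, healthCheckPath string, domains []Domain) []Monitor {
	path := normalisePath(healthCheckPath)
	monitors := make([]Monitor, 0, len(domains))
	for _, d := range domains {
		if d.Unmanaged {
			continue
		}
		monitors = append(monitors, Monitor{
			Name:            fmt.Sprintf("%s-%s-%s", project, app, strings.ReplaceAll(d.Name, ".", "-")),
			Type:            "http",
			URL:             "https://" + d.Name + path,
			IntervalSeconds: 60,
			ExpectedStatus:  200,
		})
	}
	return monitors
}

func main() {
	ms := httpMonitors("project", "web", "", []Domain{
		{Name: "example.com", Type: "primary"},
		{Name: "www.example.com", Type: "alias"},
	})
	for _, m := range ms {
		fmt.Println(m.Name, m.URL)
	}
}
```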

Postgres Monitors

  • Database connection health checks
  • Interval: 5 minutes
  • Connection URL from Terraform outputs
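Resource filtering can be sketched the same way: only enabled Postgres resources produce a monitor, everything else yields nil. The Resource shape is hypothetical, and connectionURL stands in for the value the PR resolves from Terraform outputs.

```go
package main

import "fmt"

// Resource is a hypothetical, trimmed-down resource definition.
type Resource struct {
	Name              string
	Type              string // e.g. "postgres", "s3"
	MonitoringEnabled bool
}

// Monitor is a hypothetical, trimmed-down database monitor.
type Monitor struct {
	Name            string
	Type            string
	DatabaseURL     string
	IntervalSeconds int
}

// resourceMonitors returns nil for anything that is not an enabled Postgres resource.
func resourceMonitors(project, env string, r Resource, connectionURL string) []Monitor {
	if !r.MonitoringEnabled || r.Type != "postgres" {
		return nil
	}
	return []Monitor{{
		Name:            fmt.Sprintf("%s-%s-%s", project, r.Name, env),
		Type:            "postgres",
		DatabaseURL:     connectionURL,
		IntervalSeconds: 5 * 60, // 5 minutes
	}}
}

func main() {
	db := Resource{Name: "db", Type: "postgres", MonitoringEnabled: true}
	for _, m := range resourceMonitors("project", "production", db, "postgres://...") {
		fmt.Println(m.Name, m.Type, m.IntervalSeconds)
	}
}
```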

Push Monitors (Infrastructure Ready)

  • Heartbeat monitors for backup jobs
  • Auto-calculated interval: daily backups = 26.4 hours (with 10% buffer)
  • TODO: GitHub Actions integration (Phase 5)

What's Monitored

For this example app.json:

{
  "apps": [
    {
      "name": "web",
      "domains": [
        { "name": "example.com", "type": "primary" },
        { "name": "www.example.com", "type": "alias" }
      ]
    }
  ],
  "resources": [
    {
      "name": "db",
      "type": "postgres",
      "provider": "digitalocean"
    }
  ]
}

Monitors created:

  • project-web-example-com (HTTP)
  • project-web-www-example-com (HTTP)
  • project-db-production (Postgres)
  • project-backup-db (Push - heartbeat)
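The names above follow a project-component-suffix pattern with dots turned into dashes. A sketch of a sanitiser that would produce them is below; the PR does define a sanitiseMonitorName() helper, but this body is an assumption, not its actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitiseMonitorName joins the parts with "-", lowercases them, collapses every
// run of non-alphanumeric characters into a single "-", and trims edge dashes.
func sanitiseMonitorName(parts ...string) string {
	joined := strings.ToLower(strings.Join(parts, "-"))
	var b strings.Builder
	lastDash := false
	for _, r := range joined {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
			lastDash = false
			continue
		}
		if !lastDash {
			b.WriteByte('-')
			lastDash = true
		}
	}
	return strings.Trim(b.String(), "-")
}

func main() {
	fmt.Println(sanitiseMonitorName("project", "web", "www.example.com"))
	fmt.Println(sanitiseMonitorName("project", "backup", "db"))
}
```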

Testing

# 1. Update app.json with monitoring config
vim app.json

# 2. Run webkit update
webkit update

# 3. Preview Terraform plan
webkit infra plan production

# 4. Apply changes
webkit infra apply production

# 5. Verify monitors in Uptime Kuma
# Visit https://uptime.ainsley.dev/

TODO (Future Work)

Phase 5: Backup Heartbeat Integration

  • Export push monitor URLs from Terraform outputs
  • Store URLs as GitHub secrets
  • Update backup workflow template to ping heartbeat on success

Future Enhancements

  • Expose more configuration options (intervals, status codes, etc.)
  • Support more resource types (S3, SQLite/Turso, Redis)
  • Automated Slack notification management via Terraform
  • Status page generation

Breaking Changes

None. This is purely additive:

  • Monitoring defaults to enabled but doesn't break existing projects
  • All changes are in new files or optional fields
  • No existing app.json fields are changed or removed

Files Changed

Created:

  • internal/appdef/monitor.go - Monitor generation logic
  • platform/terraform/modules/monitoring/ - Monitoring Terraform module
  • docs/monitoring.md - Documentation

Modified:

  • schema.json - Added MonitoringConfig type
  • internal/appdef/apps.go - Added Monitoring field
  • internal/appdef/resources.go - Added Monitoring field
  • internal/infra/tf_vars.go - Monitor generation
  • platform/terraform/base/main.tf - Provider & module integration
  • platform/terraform/base/variables.tf - Monitor variables

Related Issues

Addresses requirements from #267 (monitoring/alerts discussion)

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

ainsleyclark and others added 2 commits November 18, 2025 19:18
Integrates Uptime Kuma monitoring for apps, resources, and backup jobs.
Monitoring is enabled by default (opt-out) with simple user configuration
and sophisticated internal defaults.

Changes:
- Add MonitoringConfig to schema.json for apps and resources
- Create appdef.Monitor struct with smart defaults for HTTP, Postgres, and Push monitors
- Update App and Resource structs with monitoring field (defaults to enabled)
- Add Uptime Kuma Terraform provider configuration
- Create monitoring module at platform/terraform/modules/monitoring
- Generate monitors from apps (all domains) and resources (Postgres only)
- Add heartbeat monitor support for backup jobs

Monitoring features:
- HTTP monitors: All app domains (primary + aliases)
- Postgres monitors: Database connection health checks
- Push monitors: Heartbeat tracking for CI backup jobs
- Auto-calculated intervals with sensible defaults

Still TODO:
- Backup workflow heartbeat integration
- Tests and documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Comprehensive documentation covering:
- Configuration and setup
- Monitor types (HTTP, Postgres, Push)
- Implementation architecture
- Troubleshooting guide
- Future enhancements

Phase 5 (backup heartbeat integration) documented as TODO.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@claude

claude bot commented Nov 18, 2025

PR Review: Uptime Kuma Monitoring Integration

Summary

This PR adds comprehensive Uptime Kuma monitoring integration for WebKit-managed applications and resources. The implementation is well-structured and follows good design patterns. However, there are several areas that need attention before merging.

Code Quality

Strengths

  1. Clean Architecture: The separation between user-facing config (MonitoringConfig) and internal implementation (Monitor struct) is excellent
  2. Smart Defaults: The opt-out approach with sensible defaults (60s for HTTP, 5m for Postgres) is user-friendly
  3. Documentation: Comprehensive inline documentation and the monitoring.md guide are well-written
  4. Type Safety: Good use of custom types (MonitorType) with proper String() implementation

Issues

Critical: Missing Test Coverage

Location: internal/appdef/monitor.go

The new monitor.go file (216 lines) has ZERO test coverage. According to AGENTS.md guidelines, all exported functions should have comprehensive tests.

Missing tests for (exported methods and the unexported helpers they rely on):

  • App.GenerateMonitors() - Line 66
  • Resource.GenerateMonitors() - Line 121
  • Resource.GenerateHeartbeatMonitor() - Line 149
  • healthCheckPath() - Line 100
  • sanitiseMonitorName() - Line 214
  • calculateHeartbeatInterval() - Line 190

Required Action: Add monitor_test.go with test tables covering:

  • Empty app/resource (should return empty slice)
  • App with multiple domains (primary + aliases)
  • App with unmanaged domains (should skip)
  • App with custom health_check_path
  • Resource with monitoring disabled
  • Non-Postgres resources (should return nil)
  • Monitor name sanitisation edge cases
  • Domain name sanitisation with special characters

High: Hardcoded Cron Schedule

Location: internal/infra/tf_vars.go:281

cronSchedule := "0 2 * * *" // Daily at 2am (from backup template).

This hardcoded value is passed to calculateHeartbeatInterval() but is ignored anyway (see next issue). The cron schedule should come from the backup configuration, not be hardcoded here.

Recommendation: Either:

  1. Extract cron schedule from Resource.Backup.Schedule if it exists
  2. Or remove the cronSchedule parameter entirely until proper cron parsing is implemented

Medium: Incomplete Implementation

Location: internal/appdef/monitor.go:190-206

The calculateHeartbeatInterval() function:

  • Accepts a cronSchedule parameter but ignores it
  • Has a TODO for proper cron parsing
  • Always returns 95040 seconds (26.4 hours)

This means ALL backup monitors will have the same interval regardless of their actual schedule, which could lead to false alerts for non-daily backups.

Recommendation:

  1. Either implement proper cron parsing now (use github.com/robfig/cron as suggested)
  2. Or add validation to prevent non-daily schedules until this is implemented
  3. Update documentation to clearly state only daily backups are supported

Medium: Empty Default for Notification IDs

Location: internal/infra/tf_vars.go:316-320

The getNotificationIDs() function always returns an empty array, meaning monitors won't send notifications until manually configured in Uptime Kuma UI. This reduces the value of automated monitor creation.

Recommendation:

  1. Add environment variable support (e.g., TF_VAR_uptime_kuma_notification_ids)
  2. Document this limitation more prominently in the PR description
  3. Add it to the TODO (Future Work) section of the PR description
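A sketch of recommendation 1, assuming a comma-separated list of numeric IDs. The variable name follows the review's suggestion and is hypothetical, not an existing WebKit setting. Returning an empty, non-nil slice matters if the IDs are later JSON-encoded for Terraform, since a nil slice encodes as null rather than [].

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNotificationIDs turns "1, 4" into []int{1, 4}. Blank input yields an
// empty, non-nil slice so that JSON encoding produces [] rather than null.
func parseNotificationIDs(raw string) ([]int, error) {
	ids := []int{}
	for _, part := range strings.Split(raw, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		id, err := strconv.Atoi(part)
		if err != nil {
			return nil, fmt.Errorf("invalid notification id %q: %w", part, err)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

// getNotificationIDs reads the (hypothetical) environment variable.
func getNotificationIDs() ([]int, error) {
	return parseNotificationIDs(os.Getenv("TF_VAR_uptime_kuma_notification_ids"))
}

func main() {
	ids, err := getNotificationIDs()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("notification IDs:", ids)
}
```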

Low: Inconsistent Error Handling

Location: internal/appdef/monitor.go:149-162

GenerateHeartbeatMonitor() returns an empty Monitor{} when backup is disabled, but callers check heartbeat.Enabled. This works but is inconsistent with GenerateMonitors() which returns nil.

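Recommendation: Because GenerateHeartbeatMonitor() returns a Monitor value, it cannot return nil as written. For consistency, either return a *Monitor (nil when backups are disabled) or a (Monitor, bool) pair, or add a comment explaining why an empty Monitor is preferred.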
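A sketch of the (Monitor, bool) option using Go's comma-ok idiom. The types and function name are hypothetical stand-ins for the PR's Resource.GenerateHeartbeatMonitor(); the point is that callers branch on ok instead of inspecting a zero-value Monitor.

```go
package main

import "fmt"

// Monitor is a hypothetical, trimmed-down push monitor.
type Monitor struct {
	Name            string
	Type            string
	IntervalSeconds int
}

// Backup is a hypothetical backup configuration.
type Backup struct {
	Enabled bool
}

// heartbeatMonitor reports ok=false when backups are disabled.
func heartbeatMonitor(project, resource string, b Backup) (Monitor, bool) {
	if !b.Enabled {
		return Monitor{}, false
	}
	return Monitor{
		Name:            fmt.Sprintf("%s-backup-%s", project, resource),
		Type:            "push",
		IntervalSeconds: 95040, // 26.4h: daily plus a 10% buffer
	}, true
}

func main() {
	if m, ok := heartbeatMonitor("project", "db", Backup{Enabled: true}); ok {
		fmt.Println("created", m.Name)
	}
	if _, ok := heartbeatMonitor("project", "db", Backup{}); !ok {
		fmt.Println("backups disabled, no heartbeat monitor")
	}
}
```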

Best Practices

Follow Go Guidelines (AGENTS.md)

  1. Comments should end with periods: All comments follow this correctly ✓
  2. Use British English: Correctly uses "sanitise" instead of "sanitize" ✓
  3. Type grouping: Good use of type() blocks ✓
  4. Error wrapping: N/A for this code (no errors returned)

Issues

  1. Missing test file: Violates "one test function per exported function" guideline
  2. No integration tests: Should test the full monitor generation flow with real app.json

Performance Considerations

  1. Monitor generation is efficient: O(n) iteration over domains/resources
  2. Terraform state: Each monitor is a separate resource - consider impact on large deployments with many domains
  3. API calls: Each monitor creation hits Uptime Kuma API - should document rate limits

Security Concerns

Critical: Database URL Exposure

Location: internal/appdef/monitor.go:136

DatabaseURL: r.terraformOutputReference(enviro, "connection_url"),

This creates a Terraform interpolation that resolves to the full database connection URL (including credentials), which will then appear in:

  1. Terraform state files
  2. Uptime Kuma monitor configuration
  3. Potentially in logs during Terraform apply

Mitigation: The outputs.tf correctly marks postgres_monitors as sensitive=true (line 20), but ensure:

  1. Terraform state is properly secured (documented?)
  2. Uptime Kuma API credentials are stored securely
  3. Monitor credentials in Uptime Kuma are properly protected
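For the logging concern in particular, any Go code that prints or wraps the connection URL can redact it first. This sketch uses the standard library's (*url.URL).Redacted (Go 1.15+), which replaces the password with "xxxxx". It deliberately returns a generic error on parse failure, because url.Parse's own error includes the raw input and would leak the credentials. The function name is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// errInvalidConnURL avoids echoing the raw URL (and its password) in errors.
var errInvalidConnURL = errors.New("invalid connection URL (details withheld to avoid leaking credentials)")

// redactConnURL returns a log-safe form of a database connection URL.
func redactConnURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", errInvalidConnURL
	}
	return u.Redacted(), nil
}

func main() {
	safe, err := redactConnURL("postgres://app:s3cret@db.example.com:25060/app?sslmode=require")
	if err != nil {
		panic(err)
	}
	fmt.Println(safe) // postgres://app:xxxxx@db.example.com:25060/app?sslmode=require
}
```

This does not protect Terraform state itself; that still depends on how the state backend is secured.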

Medium: TLS Validation Disabled Option

Location: internal/appdef/monitor.go:91, 142

IgnoreTLS is always set to false, which is correct. However, the field exists and could be accidentally enabled. Consider:

  1. Removing the field entirely if not needed
  2. Or documenting when it's acceptable to use (dev environments only)

Schema Validation

Location: schema.json

The MonitoringConfig type is correctly defined but very minimal. Consider adding:

  1. JSON schema validation example in docs/monitoring.md
  2. Version the schema if monitoring config will expand in future

Documentation

Strengths

  • Excellent docs/monitoring.md with examples
  • Good architecture explanation
  • Clear troubleshooting section

Suggestions

  1. Add diagram showing monitor generation flow
  2. Include example Terraform plan output showing created monitors
  3. Document what happens on update (are monitors updated or recreated?)
  4. Add section on costs (Uptime Kuma resource usage per monitor)

Breaking Changes

The PR claims "None" but there's a subtle issue:

Location: internal/appdef/apps.go:273 and resources.go:166

Monitoring is enabled by default in applyDefaults(). For existing projects:

  1. Running webkit update will set monitoring.enabled = true
  2. Running webkit infra apply will create monitors
  3. This requires Uptime Kuma credentials to be set

Recommendation:

  1. Add migration notes for existing projects
  2. Or make first-time monitoring opt-in with a warning
  3. Handle missing credentials gracefully (warn but don't fail)
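A sketch of recommendation 3, checking the two credential variables from the PR description before generating monitors. The function name is hypothetical, and the lookup is injected (os.LookupEnv in practice) so the check is easy to test. Actually skipping monitors would also require the Terraform monitoring module and provider to be conditional, which this sketch does not show.

```go
package main

import (
	"fmt"
	"os"
)

// monitoringCredentialsPresent reports whether both Uptime Kuma credentials are set and non-empty.
func monitoringCredentialsPresent(lookup func(string) (string, bool)) bool {
	for _, key := range []string{"TF_VAR_uptime_kuma_username", "TF_VAR_uptime_kuma_password"} {
		if v, ok := lookup(key); !ok || v == "" {
			return false
		}
	}
	return true
}

func main() {
	if !monitoringCredentialsPresent(os.LookupEnv) {
		fmt.Fprintln(os.Stderr, "warning: Uptime Kuma credentials not set; skipping monitor generation")
		return
	}
	fmt.Println("credentials found; generating monitors")
}
```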

Terraform Provider Concerns

Location: platform/terraform/base/main.tf:75-79

The uptimekuma provider configuration uses basic auth. Verify:

  1. Provider version pinning (~> 1.0) is appropriate
  2. Provider is actively maintained (check GitHub)
  3. Provider supports all monitor types used (http, postgres, push)
  4. Error messages are clear when credentials are wrong

Test Coverage Requirements

According to AGENTS.md, you MUST add:

  1. Unit tests (internal/appdef/monitor_test.go):

    • Test tables for all exported functions
    • Edge cases and error conditions
    • Following the test table pattern from the guidelines
  2. Integration tests (internal/infra/tf_vars_integration_test.go):

    • Test monitor generation in tfVarsFromDefinition
    • Verify monitor structs are correctly transformed to tfMonitor
    • Test with various app.json configurations

Example test structure needed:

func TestApp_GenerateMonitors(t *testing.T) {
    t.Parallel()

    tt := map[string]struct {
        app  *App
        want []Monitor
    }{
        "Monitoring disabled": {
            app:  &App{Monitoring: MonitoringConfig{Enabled: false}},
            want: nil,
        },
        "No domains": {
            app:  &App{Monitoring: MonitoringConfig{Enabled: true}, Domains: []Domain{}},
            want: []Monitor{},
        },
        // ... more cases
    }

    for name, test := range tt {
        test := test // Capture the range variable for the parallel subtest (required before Go 1.22).
        t.Run(name, func(t *testing.T) {
            t.Parallel()
            got := test.app.GenerateMonitors()
            assert.Equal(t, test.want, got)
        })
    }
}

Recommendations

Before Merge (Required)

  1. Add comprehensive unit tests for monitor.go
  2. Add integration tests for monitor generation in tf_vars.go
  3. Fix hardcoded cron schedule or document limitation
  4. Add environment variable support for notification IDs
  5. Document breaking change implications for existing projects
  6. Add validation for database credential security in Terraform state

Nice to Have (Post-Merge)

  1. Implement proper cron parsing for backup intervals
  2. Add support for custom monitor intervals in app.json
  3. Support more resource types (S3, Redis)
  4. Add monitor update strategy documentation
  5. Create example app.json with monitoring configuration
  6. Add CI check to verify monitor generation doesn't break

Verdict

This is solid foundational work with good architecture, but should not be merged without test coverage. The missing tests are a critical gap that violates the project's testing guidelines.

Once tests are added and the hardcoded cron schedule is addressed, this will be ready to merge.


Great work on the comprehensive implementation. The architecture is sound and well-documented. Focus on adding those tests and this will be an excellent addition to WebKit.

ainsleyclark and others added 4 commits November 18, 2025 19:26
Improvements based on code review:
- Rename MonitoringConfig to Monitoring (simpler, more consistent)
- Add reference to Uptime Kuma API docs in Monitor struct
- Remove unnecessary IsMonitoringEnabled helper methods (direct field access)
- Move terraformOutputReference from appdef to infra layer (proper separation)
- Add fmt import to tf_vars.go

Changes adhere to proper layer separation:
- appdef: Pure domain definitions
- infra: Terraform-specific logic and references

Reference: https://github.com/louislam/uptime-kuma/wiki/API-Documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Tests added:
- internal/appdef/monitor_test.go (14 tests, 47 assertions)
  - MonitorType String conversion
  - App.GenerateMonitors() with various scenarios
  - App.healthCheckPath() edge cases
  - Resource.GenerateMonitors() for different resource types
  - Resource.GenerateHeartbeatMonitor() for backups
  - Helper functions (calculateHeartbeatInterval, sanitiseMonitorName)

- internal/infra/tf_vars_test.go (4 new test functions)
  - terraformOutputReference() Terraform interpolation
  - generateMonitors() with apps, resources, and mixed scenarios
  - tfMonitorFromAppdef() struct transformation
  - getNotificationIDs() placeholder behaviour

Test coverage:
- ✅ Monitoring enabled/disabled scenarios
- ✅ Multiple domains (primary + aliases)
- ✅ Unmanaged domains skipped
- ✅ Health check path resolution (custom vs default)
- ✅ Postgres vs S3 resource filtering
- ✅ Backup heartbeat monitor generation
- ✅ Mixed app/resource monitor generation
- ✅ Terraform output reference formatting

All tests follow WebKit conventions:
- Parallel execution where safe
- Test tables for simple cases
- t.Run subtests for complex scenarios
- Clear naming and assertions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

ainsleyclark marked this pull request as ready for review November 18, 2025 20:06

Remove three unused fields from the Monitor struct that provided no
functional value:

- HealthCheckPath: Redundant as the path is already embedded in the URL
- ConnectionType: Redundant as the type is already in the DatabaseURL scheme
- PushURL: Never used anywhere in the codebase

The separation between URL (for HTTP monitors) and DatabaseURL (for
database monitors) is intentionally preserved as it accurately mirrors
the Uptime Kuma API's polymorphic design where different monitor types
use different connection fields (url vs databaseConnectionString).

This reduces maintenance burden and testing surface area while keeping
the struct aligned with the actual API requirements.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>