This guide covers monitoring, health checks, logging, and audit trails for AuthGate.
AuthGate provides a health check endpoint for monitoring service availability and database connectivity.
# Check service health
curl http://localhost:8080/health
# Response (healthy)
{
  "status": "healthy",
  "database": "connected",
  "timestamp": "2026-02-08T10:00:00Z"
}
# Response (unhealthy - database issue)
{
  "status": "unhealthy",
  "database": "disconnected",
  "error": "database connection failed",
  "timestamp": "2026-02-08T10:00:00Z"
}
- Endpoint: GET /health
- Authentication: Not required
- HTTP Status:
  - 200 OK: Service and database are healthy
  - 503 Service Unavailable: Database connection failed
- Database Test: Performs a PING operation to verify connectivity
- Response Time: < 100ms typically
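The response fields and status codes above can be checked from a script. The following is a minimal sketch, assuming only the JSON fields shown in the sample responses; the function names `interpret_health` and `check_health` are illustrative and not part of AuthGate.

```python
import json
import urllib.error
import urllib.request


def interpret_health(status_code, body):
    """Classify a /health response using the documented status code and fields."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return {
        # Healthy only when both the HTTP status and the body agree.
        "healthy": status_code == 200 and payload.get("status") == "healthy",
        "database": payload.get("database", "unknown"),
        "error": payload.get("error"),
    }


def check_health(url="http://localhost:8080/health", timeout=3.0):
    """Fetch the endpoint once; a 503 still carries a JSON body worth reading."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret_health(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # urllib raises for non-2xx codes, so the 503 case lands here.
        return interpret_health(exc.code, exc.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, or timeout.
        return {"healthy": False, "database": "unknown", "error": str(exc)}
```

Calling `check_health()` against a running instance returns a small dictionary that a cron job or deployment script can act on. The 3-second timeout mirrors the probe timeouts used in the container examples in this guide.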
Docker Compose:
healthcheck:
  test:
    [
      "CMD",
      "wget",
      "--no-verbose",
      "--tries=1",
      "--spider",
      "http://localhost:8080/health",
    ]
  interval: 30s
  timeout: 3s
  retries: 3
  start_period: 5s
Kubernetes:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
  timeoutSeconds: 3
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
UptimeRobot / Pingdom:
- Monitor URL: https://auth.yourdomain.com/health
- Check interval: 5 minutes
- Expected status: 200
- Alert on: Status != 200 or timeout
- ✅ Health check endpoint availability (target: 99.9% uptime)
- ✅ HTTP response times (target: p95 < 200ms, p99 < 500ms)
- ✅ Error rate (target: < 0.1% of requests)
- 📊 Database file size growth (SQLite)
- 📊 Connection pool utilization (PostgreSQL)
- 📊 Query execution time (target: < 50ms average)
- 📊 Database lock contention (SQLite)
- 🔐 Active device codes count (track pending authorizations)
- 🔐 Issued tokens per hour (baseline: establish normal patterns)
- 🔐 Active sessions count (per user and total)
- 🔐 Failed login attempts (baseline: < 5% of total logins)
- 🔐 Token refresh rate (track refresh token usage)
- 🚨 Rate limit exceeded events (potential attacks)
- 🚨 Failed authentication attempts per IP (brute force detection)
- 🚨 Suspicious activity events (from audit logs)
- 🚨 Critical/Error severity audit events
- 📈 Audit events per hour (establish baseline)
- 📈 Critical severity events (alert immediately)
- 📈 Failed authentication rate (security monitoring)
- 📈 Token revocation frequency (user security awareness)
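The latency and error-rate targets listed above (p95 < 200ms, p99 < 500ms, error rate < 0.1%) can be evaluated against a sample of request data. This is a sketch under stated assumptions: the input is a plain list of latencies in milliseconds collected by whatever tooling is in use, percentiles use the nearest-rank method, and the names `percentile` and `evaluate_targets` are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_targets(latencies_ms, error_count, request_count):
    """Compare a sample against the targets stated in this guide."""
    p95 = percentile(latencies_ms, 95)
    p99 = percentile(latencies_ms, 99)
    error_rate = error_count / request_count if request_count else 0.0
    return {
        "p95_ms": p95,
        "p99_ms": p99,
        "error_rate": error_rate,
        "p95_ok": p95 < 200,
        "p99_ok": p99 < 500,
        "error_rate_ok": error_rate < 0.001,  # 0.1% of requests
    }
```

Other percentile definitions (for example linear interpolation) give slightly different values on small samples, so the same method should be used consistently when comparing against a baseline.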
Option 1: Prometheus + Grafana
# Add Prometheus metrics endpoint (future enhancement)
# For now, parse logs and health checks
Option 2: Cloud-Native (Fly.io, AWS CloudWatch)
- Use platform-provided metrics
- Monitor health check endpoint
- Set up log aggregation
Option 3: Simple Monitoring (Small Deployments)
- UptimeRobot for health checks
- Papertrail/Logtail for log aggregation
- Weekly manual audit log review
AuthGate includes a comprehensive audit logging system that tracks all critical operations and security events.
- Comprehensive Event Coverage: Authentication, device authorization, token operations, admin actions, security events
- Asynchronous Processing: Non-blocking batch writes (every 1 second or 100 records) for minimal performance impact
- Automatic Data Masking: Sensitive fields (passwords, tokens, secrets) are automatically redacted
- Flexible Filtering: Search and filter by event type, severity, actor, resource, time range, success/failure
- Web Interface: View, search, filter, and export audit logs through admin panel
- CSV Export: Export filtered logs for external analysis or compliance reporting
- Statistics Dashboard: View event counts by type, severity, and success rate
- Automatic Cleanup: Configurable retention period with automatic deletion of old logs
- Graceful Shutdown: Ensures all buffered logs are written before server stops
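To illustrate what automatic data masking means in practice, here is a minimal sketch of redacting sensitive fields before an event is stored. It is not AuthGate's implementation: the key-name matching rule, the `SENSITIVE_MARKERS` tuple, the placeholder string, and the function name `mask_sensitive` are all assumptions made for the example.

```python
SENSITIVE_MARKERS = ("password", "token", "secret")  # assumed matching rule
REDACTED = "***REDACTED***"  # assumed placeholder


def mask_sensitive(details):
    """Return a copy of an event-details structure with sensitive values redacted."""
    if isinstance(details, dict):
        return {
            key: REDACTED
            if any(marker in str(key).lower() for marker in SENSITIVE_MARKERS)
            else mask_sensitive(value)
            for key, value in details.items()
        }
    if isinstance(details, list):
        return [mask_sensitive(item) for item in details]
    # Scalars under non-sensitive keys pass through unchanged.
    return details
```

Matching on key names keeps the rule simple, but it cannot catch a secret that appears inside a free-text value, which is worth keeping in mind when reviewing exported logs.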
Configure audit logging via environment variables in .env:
# Audit Logging
ENABLE_AUDIT_LOGGING=true # Enable audit logging (default: true)
AUDIT_LOG_RETENTION=2160h # Retention period: 90 days (default)
AUDIT_LOG_BUFFER_SIZE=1000 # Async buffer size (default: 1000)
AUDIT_LOG_CLEANUP_INTERVAL=24h # Cleanup frequency (default: 24h)
- ENABLE_AUDIT_LOGGING: Master switch (default: true)
- AUDIT_LOG_RETENTION: How long to keep logs (default: 90 days = 2160h)
- AUDIT_LOG_BUFFER_SIZE: Async buffer size (default: 1000)
- AUDIT_LOG_CLEANUP_INTERVAL: Cleanup job frequency (default: 24h)
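The retention and cleanup values are written as hour-based duration strings, so 2160h corresponds to 90 days. The sketch below converts such strings and derives the cutoff before which logs would be eligible for deletion. It assumes only the `h`, `m`, and `s` units; the function names are hypothetical, and the exact set of formats AuthGate accepts is not stated in this guide.

```python
import re
from datetime import datetime, timedelta, timezone

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([hms])")


def parse_duration(text):
    """Parse strings such as '2160h' or '1h30m' into a timedelta (h, m, s units only)."""
    tokens = _TOKEN.findall(text)
    # Reject anything the tokens do not fully account for, e.g. '90d' or '100ms'.
    if not tokens or "".join(num + unit for num, unit in tokens) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(int(num) * _UNIT_SECONDS[unit] for num, unit in tokens))


def retention_cutoff(now, retention="2160h"):
    """Events older than the returned timestamp fall outside the retention period."""
    return now - parse_duration(retention)
```

For example, `retention_cutoff(datetime.now(timezone.utc))` gives the timestamp 90 days in the past under the default setting. A one-year policy would be written as `8760h`.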
- Audit events written asynchronously (non-blocking)
- Batch writes every 1 second or 100 records
- If the buffer overflows, new events are dropped and a warning is logged (rare)
- Typical overhead: < 1% CPU, < 10 MB memory for 100k events
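The batching behavior described above (flush every 1 second or 100 records, drop on overflow, drain on shutdown) can be sketched as follows. This is an illustration of the technique, not AuthGate's code: the class name `AuditBatcher`, the `sink` callback, and the explicit `tick()` method that a background worker would call are all assumptions, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from collections import deque


class AuditBatcher:
    def __init__(self, sink, capacity=1000, batch_size=100,
                 flush_interval=1.0, clock=time.monotonic):
        self._sink = sink              # callable that persists one batch (a list)
        self._capacity = capacity      # mirrors AUDIT_LOG_BUFFER_SIZE
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._clock = clock
        self._buffer = deque()
        self._last_flush = clock()
        self.dropped = 0

    def record(self, event):
        """Queue an event without blocking; drop it if the buffer is full."""
        if len(self._buffer) >= self._capacity:
            self.dropped += 1          # a real system would also log a warning
            return False
        self._buffer.append(event)
        return True

    def _write_batch(self):
        size = min(self._batch_size, len(self._buffer))
        self._sink([self._buffer.popleft() for _ in range(size)])
        return size

    def tick(self):
        """Flush when a full batch is waiting or the interval has elapsed."""
        now = self._clock()
        elapsed = now - self._last_flush >= self._flush_interval
        if len(self._buffer) < self._batch_size and not elapsed:
            return 0
        written = 0
        while self._buffer and (elapsed or len(self._buffer) >= self._batch_size):
            written += self._write_batch()
        self._last_flush = now
        return written

    def close(self):
        """Graceful shutdown: write everything still buffered."""
        while self._buffer:
            self._write_batch()
```

Dropping on overflow keeps request handling non-blocking at the cost of losing events under sustained overload, which is why the "buffer full" warning is worth monitoring.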
Access audit logs through the admin panel:
Endpoints:
- GET /admin/audit: View audit logs (HTML, requires admin login)
- GET /admin/audit/export: Export filtered logs as CSV
- GET /admin/audit/api: JSON API for programmatic access
- GET /admin/audit/api/stats: Statistics and event counts
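For programmatic access, the endpoints above can be called from a script. The sketch below builds query URLs and fetches JSON with an admin session cookie. The filter names `event_type`, `since`, and `severity` and the `session` cookie name are taken from the curl examples elsewhere in this guide; the helper names are hypothetical, and the shape of the JSON response is not assumed.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_audit_url(base_url, path="/admin/audit/api", **filters):
    """Build a query URL, skipping filters whose value is None."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = base_url.rstrip("/") + path
    return f"{url}?{query}" if query else url


def fetch_audit_json(url, session_cookie, timeout=10.0):
    """GET the URL with the admin session cookie and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Cookie": f"session={session_cookie}"})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

A call such as `build_audit_url("http://localhost:8080", event_type="AUTHENTICATION_FAILURE", since="24h")` produces the same URL used in the failed-login curl example.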
Web UI Features:
- Search: Full-text search across action, resource name, actor username
- Filters: Event type, severity, success/failure, actor IP, resource type, time range
- Pagination: Configurable page size (default: 20 records per page)
- CSV Export: Download filtered results for Excel/spreadsheet analysis
- Real-time Updates: New events appear after page refresh
Authentication Events:
- AUTHENTICATION_SUCCESS: User successfully logged in
- AUTHENTICATION_FAILURE: Failed login attempt
- LOGOUT: User logged out
- OAUTH_AUTHENTICATION: OAuth provider authentication
Device Authorization Events:
- DEVICE_CODE_GENERATED: Device code created for CLI/device
- DEVICE_CODE_AUTHORIZED: User authorized device in browser
Token Events:
- ACCESS_TOKEN_ISSUED: Access token generated
- REFRESH_TOKEN_ISSUED: Refresh token generated
- TOKEN_REFRESHED: Access token refreshed
- TOKEN_REVOKED: Token permanently revoked
- TOKEN_DISABLED: Token temporarily disabled
- TOKEN_ENABLED: Disabled token re-enabled
Admin Operations:
- CLIENT_CREATED: OAuth client created
- CLIENT_UPDATED: OAuth client modified
- CLIENT_DELETED: OAuth client removed
- CLIENT_SECRET_REGENERATED: Client secret rotated
Security Events:
- RATE_LIMIT_EXCEEDED: Request blocked by rate limiter
- SUSPICIOUS_ACTIVITY: Anomalous behavior detected
Severity Levels:
- INFO: Normal operations (login, token issuance)
- WARNING: Potentially concerning (failed auth, rate limit)
- ERROR: Operation failures (token refresh failure)
- CRITICAL: Security incidents (suspicious activity)
Security & Compliance:
- Monitor Critical Events: Set up alerts for CRITICAL and ERROR severity
- Regular Review: Weekly review of AUTHENTICATION_FAILURE and RATE_LIMIT_EXCEEDED
- Compliance Exports: Use CSV export for audits (SOC 2, ISO 27001, GDPR)
- Retention Policy: Adjust based on compliance (90 days typical, some require 1+ year)
Performance Optimization:
- Database Indexes: Audit logs include indexes on time, type, actor, severity
- Regular Cleanup: Enable automatic cleanup to prevent database bloat
- Monitor Buffer: Watch for "buffer full" warnings in logs
Operational:
- Backup Strategy: Include audit logs in database backups
- Cold Storage: Consider archiving old logs for long-term retention
- Access Control: Audit viewing requires admin role
View failed logins in last 24 hours:
curl -s "http://localhost:8080/admin/audit/api?event_type=AUTHENTICATION_FAILURE&since=24h" \
  -H "Cookie: session=..." | jq .
Export all critical events as CSV:
curl "http://localhost:8080/admin/audit/export?severity=CRITICAL" \
  -H "Cookie: session=..." -o critical-events.csv
Get statistics:
curl -s "http://localhost:8080/admin/audit/api/stats" \
  -H "Cookie: session=..." | jq .
AuthGate uses Gin's built-in logger for HTTP request logging:
[GIN] 2026/02/08 - 10:00:00 | 200 | 1.234ms | 192.168.1.1 | GET "/health"
[GIN] 2026/02/08 - 10:00:01 | 201 | 12.345ms | 192.168.1.2 | POST "/oauth/device/code"
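Since a metrics endpoint is not yet available, request logs in this format can be parsed to derive status codes and latencies. The following is a sketch that assumes the field layout of the two sample lines above; the names `GIN_LINE`, `parse_gin_line`, and `server_error_rate` are illustrative. Gin may pad fields with extra spaces or add color codes when writing to a terminal, so the pattern tolerates padding but not ANSI escapes.

```python
import re

GIN_LINE = re.compile(
    r'^\[GIN\]\s+(?P<date>\d{4}/\d{2}/\d{2}) - (?P<time>\d{2}:\d{2}:\d{2})\s*\|'
    r'\s*(?P<status>\d{3})\s*\|'
    r'\s*(?P<latency>\S+)\s*\|'
    r'\s*(?P<client_ip>\S+)\s*\|'
    r'\s*(?P<method>[A-Z]+)\s+"(?P<path>[^"]*)"'
)


def parse_gin_line(line):
    """Return the fields of one request log line, or None if it does not match."""
    match = GIN_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


def server_error_rate(lines):
    """Fraction of parsed requests that ended with a 5xx status."""
    records = [r for r in map(parse_gin_line, lines) if r is not None]
    if not records:
        return 0.0
    return sum(r["status"] >= 500 for r in records) / len(records)
```

Lines that are not request logs (startup messages, application errors) return `None` and are ignored by `server_error_rate`.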
# View all logs
sudo journalctl -u authgate -f
# View logs from last hour
sudo journalctl -u authgate --since "1 hour ago"
# View only errors
sudo journalctl -u authgate -p err
# Export to file
sudo journalctl -u authgate --since "2026-02-01" > authgate.log
# Follow logs
docker logs -f authgate
# Last 100 lines
docker logs --tail 100 authgate
# Since timestamp
docker logs --since "2026-02-08T10:00:00" authgate
Loki (Grafana) Example:
# promtail-config.yml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: authgate
    static_configs:
      - targets:
          - localhost
        labels:
          job: authgate
          __path__: /var/log/authgate/*.log
Papertrail Example:
# Forward logs to Papertrail
sudo journalctl -u authgate -f | \
  nc logs.papertrailapp.com <your-port>
- 🚨 Health check fails for > 2 minutes
- 🚨 Error rate > 5% for > 5 minutes
- 🚨 Database connection failures
- 🚨 Critical severity audit events
- 🚨 > 100 failed login attempts from single IP in 10 minutes
- ⚠️ Health check intermittent failures
- ⚠️ Database size > 80% of available space
- ⚠️ Rate limit exceeded > 1000 times per hour
- ⚠️ Error severity audit events
- ⚠️ Unusual spike in authentication failures
- ℹ️ Daily summary of audit events
- ℹ️ Token issuance rate trends
- ℹ️ Active session count
- ℹ️ Database backup completion
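Threshold rules such as "> 100 failed login attempts from single IP in 10 minutes" amount to a sliding-window count per IP. The sketch below shows one way to evaluate that rule over a stream of failure events, for example ones read from the audit log. The class name `FailedLoginMonitor` is hypothetical, and timestamps are assumed to be seconds that arrive in non-decreasing order for each IP.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    def __init__(self, threshold=100, window_seconds=600):
        self._threshold = threshold
        self._window_seconds = window_seconds
        self._events = defaultdict(deque)  # IP -> timestamps inside the window

    def record_failure(self, ip, timestamp):
        """Register one failure; return True when the IP exceeds the threshold."""
        window = self._events[ip]
        window.append(timestamp)
        # Discard failures that are no longer inside the window.
        while window and window[0] <= timestamp - self._window_seconds:
            window.popleft()
        return len(window) > self._threshold
```

With the defaults, the 101st failure from one address within 10 minutes returns `True`, which is the point at which a critical alert would fire. Memory grows with the number of distinct IPs seen, so a long-running process would also need to evict idle entries.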
Alert Name: AuthGate Health Check
Monitor Type: HTTP(s)
URL: https://auth.yourdomain.com/health
Interval: 5 minutes
Alert Contacts: email, slack, pagerduty
Next Steps:
- Security Guide - Production security best practices
- Troubleshooting - Debug common issues
- Configuration Guide - Configure audit logging