-
Notifications
You must be signed in to change notification settings - Fork 631
Description
Description:
Seems that EnvoyGateway validation is only performed on control plane startup, and not when the config map is updated. This can cause systems to silently get into an invalid state and prevent the control plane from restarting.
Repro steps:
We recently configured the global rate limiting service in our clusters using the following helm chart config:
config:
envoyGateway:
rateLimit:
backend:
type: Redis
redis:
url: 0.0.0.0:6378 # actual IP redactedThis worked fine for several days, but this morning one of the EG control plane pod s restarted and went into a crash loop w/ the following error:
unknown ratelimit redis url format: parse "0.0.0.0:6378": first path segment in URL cannot contain colon
Which appears to be from this validation logic.
Seems we just need to add a redis:// prefix to the URL, but we would have been notified earlier about the validation error.
I suspect it's not easy to surface this via Helm (given the async nature of how the config map is watched/processed), but I would have expected the watcher to log a validation error and prevent the rate limit service from getting setup.
I think the config watching logic is here. Shouldn't that code also call r.cfg.Validate()?
Environment:
- chart: oci://docker.io/envoyproxy/gateway-helm
- version: v1.5.0-rc.2