## Current Behavior

**Bug Report: APISIX Ingress Controller v2.0.0 - Service Inline Upstreams Not Updated on Endpoint Changes**

## Summary
APISIX Ingress Controller v2.0.0 (with ADC sidecar) creates Services with inline upstreams, but does NOT update these inline upstreams when Kubernetes Endpoints change (e.g., pod restarts, rescheduling). This causes traffic to be routed to stale/non-existent pod IPs, resulting in 504 Gateway Timeout errors.
## Environment
- APISIX Gateway Version: 3.11.0
- APISIX Ingress Controller Version: 2.0.0 (stable release)
- Helm Chart Version: 1.1.0 (official Bitnami chart)
- Kubernetes Version: OVH Managed Kubernetes
- etcd: 3-node cluster
## Controller Configuration

```yaml
provider:
  type: apisix
  syncPeriod: 1m
  initSyncDelay: 30s
```
## Workaround
Manual update of etcd Service entries:
```bash
kubectl exec -n apisix-system apisix-etcd-0 -- \
  etcdctl put /apisix/services/<service-id> '<updated-json-with-correct-ip>'
```

Then send a HUP signal to the APISIX pods to reload the configuration:

```bash
kubectl exec -n apisix-system <apisix-pod> -- kill -HUP 1
```
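For repeated incidents, the two manual steps can be scripted. This is a minimal sketch under my own assumptions, not a vetted tool: it assumes `jq` is available locally, `etcdctl` inside `apisix-etcd-0` needs no client auth, the Bitnami label `app.kubernetes.io/name=apisix` selects the gateway pods, and the first ready endpoint address should replace the first upstream node.

```bash
#!/usr/bin/env bash
# Sketch: re-point a stale inline upstream at the current pod IP, then HUP APISIX.
set -euo pipefail

SERVICE_ID="f08f5c87"       # APISIX service id (see the evidence below)
APP_SVC="game-core-web"     # Kubernetes Service backing the route
APP_NS="default"
APISIX_NS="apisix-system"

# Current pod IP, taken from the Kubernetes Endpoints object
NEW_IP=$(kubectl get endpoints -n "$APP_NS" "$APP_SVC" \
  -o jsonpath='{.subsets[0].addresses[0].ip}')

# Read the stale Service JSON from etcd and rewrite the first node's host
NEW_JSON=$(kubectl exec -n "$APISIX_NS" apisix-etcd-0 -- \
  etcdctl get "/apisix/services/$SERVICE_ID" --print-value-only \
  | jq -c --arg ip "$NEW_IP" '.upstream.nodes[0].host = $ip')

# Write it back
kubectl exec -n "$APISIX_NS" apisix-etcd-0 -- \
  etcdctl put "/apisix/services/$SERVICE_ID" "$NEW_JSON"

# Reload every APISIX pod so the change is picked up immediately
for pod in $(kubectl get pods -n "$APISIX_NS" \
    -l app.kubernetes.io/name=apisix -o name); do
  kubectl exec -n "$APISIX_NS" "$pod" -- kill -HUP 1
done
```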
## Impact

- Severity: Critical for production use
- Impact: Complete service outage for affected routes when pods restart
- Affected: Any route using the ADC sync pattern with inline service upstreams
## Additional Observations

- **Hot reload not working:** Even though APISIX supports hot reload from etcd, the Service updates are never pushed to etcd in the first place (a quick etcdctl check is sketched after this list).
- **Pattern difference:** Routes created with the newer controller version use `upstream_id` references instead of inline upstreams. These DO get updated correctly. The issue affects routes/services created before this pattern change.
- **No errors in controller logs:** The controller doesn't log any errors about failing to update services. The sync appears to complete successfully but simply doesn't update inline upstreams.
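One way to confirm the divergence described above is to compare the `update_time` of the Service and of the separately-synced Upstream that share an id. A hedged one-off check, assuming the etcd key prefixes `/apisix/services` and `/apisix/upstreams`, no etcd client auth, and `jq` available in the shell:

```bash
# Print update_time for the Service (stale) and the Upstream (fresh) sharing id f08f5c87
for kind in services upstreams; do
  kubectl exec -n apisix-system apisix-etcd-0 -- \
    etcdctl get "/apisix/$kind/f08f5c87" --print-value-only | jq '.update_time'
done
```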
## Requested Action

- Investigate why Service inline upstreams are not updated on endpoint changes
- Either:
  - Fix the ADC sync to update inline upstreams in Services, OR
  - Change the sync pattern to always use `upstream_id` references instead of inline upstreams
- Document this limitation if it's expected behavior
## Related Information

- Controller logs show the sync completing with the correct service count
- `syncPeriod: 1m` is being respected (syncs every minute; see the log-follow command below)
- Separate Upstream objects are updated on each sync cycle
- Service objects are NOT updated after initial creation
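To watch the sync cadence yourself, following the controller logs works; the deployment name here is a guess based on common chart defaults, so adjust it to your release:

```bash
# Follow controller logs and surface sync-related lines (deployment name assumed)
kubectl logs -n apisix-system deploy/apisix-ingress-controller -f | grep -i sync
```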
## Contact
Happy to provide additional logs, configurations, or test scenarios to help debug this issue.
## Expected Behavior
When a Kubernetes pod restarts and gets a new IP address, the APISIX Ingress Controller should update the upstream nodes in APISIX to reflect the new pod IP. Traffic should continue flowing to the new pod IP without interruption.
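Stated concretely: after a pod restart, the host in the Service's inline upstream should equal the address in the Kubernetes Endpoints object. A small check along those lines, reusing the ids from the evidence below and assuming `jq` plus an auth-free `etcdctl`:

```bash
# Expected: both commands print the same IP shortly after a pod restart
kubectl get endpoints game-core-web \
  -o jsonpath='{.subsets[0].addresses[0].ip}{"\n"}'
kubectl exec -n apisix-system apisix-etcd-0 -- \
  etcdctl get /apisix/services/f08f5c87 --print-value-only \
  | jq -r '.upstream.nodes[0].host'
```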
## Actual Behavior

- Controller creates Services with inline upstreams containing pod IPs
- Controller also creates separate Upstream objects with the same pod IPs
- When pods restart and get new IPs:
  - The separate Upstream objects ARE updated with new IPs ✅
  - The inline upstreams inside Services are NOT updated ❌
- Routes reference Services (via `service_id`), not the separate Upstream objects
- Traffic continues to be routed to stale pod IPs that no longer exist
- Results in 504 Gateway Timeout errors
## Error Logs

### Evidence
**Service in etcd (NOT updated - shows stale IP):**

```json
{
  "id": "f08f5c87",
  "name": "default_beta-websocket-routes_0",
  "update_time": 1766750592,  // December 26 - 3 days old!
  "upstream": {
    "type": "roundrobin",
    "nodes": [
      {
        "host": "10.2.4.10",  // OLD IP - pod no longer exists!
        "port": 4000,
        "weight": 100
      }
    ]
  }
}
```

**Upstream object in etcd (updated correctly):**
```json
{
  "id": "f08f5c87",
  "name": "default_beta-websocket-routes_0",
  "update_time": 1767016352,  // Today - recently updated
  "nodes": [
    {
      "host": "10.2.16.3",  // CORRECT new IP
      "port": 4000,
      "weight": 100
    }
  ]
}
```

**Route configuration:**
```json
{
  "name": "default_beta-websocket-routes_beta-game-core-api",
  "service_id": "f08f5c87",  // References Service, not Upstream
  "upstream_id": null        // Not using separate upstream
}
```

**APISIX error logs:**
```
upstream timed out (110: Connection timed out) while connecting to upstream,
upstream: "http://10.2.4.10:4000/...", // Stale IP!
```
**Kubernetes endpoint (actual pod IP):**

```
NAME            ENDPOINTS        AGE
game-core-web   10.2.16.3:4000   25d
```
## Root Cause Analysis
The ADC (APISIX Declarative Configuration) sync mechanism appears to:
- Watch for EndpointSlice changes in Kubernetes
- Update the separate Upstream objects when endpoints change
- NOT update the inline upstream configuration inside Service objects
Since Routes reference Services (which have inline upstreams), the stale IPs persist even though the separate Upstream objects have correct IPs.
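Since only routes that reference a Service with an inline upstream are affected, listing each route's `service_id` versus `upstream_id` is a quick way to scope the blast radius. A rough sketch under the same assumptions as above (`jq` available; the grep guards against non-JSON values such as etcd init entries):

```bash
# For every route: name, service_id (affected pattern), upstream_id (safe pattern)
kubectl exec -n apisix-system apisix-etcd-0 -- \
  etcdctl get /apisix/routes --prefix --print-value-only \
  | grep '^{' \
  | jq -r '[.name, (.service_id // "-"), (.upstream_id // "-")] | @tsv'
```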
## Steps to Reproduce
1. Deploy APISIX Ingress Controller v2.0.0 with the ADC sidecar
2. Create an ApisixRoute resource pointing to a Kubernetes Service
3. Wait for the controller to sync (creates a Service with an inline upstream in APISIX)
4. Note the pod IP in the APISIX Service's inline upstream
5. Delete the pod (e.g., `kubectl delete pod <pod-name>`)
6. Wait for the new pod to start with a new IP
7. Check the APISIX Service - the inline upstream still has the OLD IP
8. Check the APISIX Upstream object - it has the correct NEW IP
9. Traffic fails with a 504 timeout to the old IP (these steps are condensed into commands below)
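A condensed command sketch of the steps above. The ApisixRoute spec and the `app=game-core-web` pod label are illustrative examples, not taken from the cluster; the etcd ids reuse the evidence above:

```bash
# Steps 1-2: create a minimal ApisixRoute (hypothetical example spec)
kubectl apply -f - <<'EOF'
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  name: beta-websocket-routes
  namespace: default
spec:
  http:
    - name: beta-game-core-api
      match:
        paths:
          - /*
      backends:
        - serviceName: game-core-web
          servicePort: 4000
EOF

# Steps 4-6: note the current pod IP, delete the pod, wait for the replacement
kubectl get endpoints game-core-web
kubectl delete pod -l app=game-core-web   # hypothetical pod label
kubectl get endpoints game-core-web       # new pod IP appears here

# Step 7: after the next sync (syncPeriod: 1m), the inline upstream still shows the old IP
kubectl exec -n apisix-system apisix-etcd-0 -- \
  etcdctl get /apisix/services/f08f5c87 --print-value-only | jq '.upstream.nodes'

# Step 8: ...while the separate Upstream object has been updated
kubectl exec -n apisix-system apisix-etcd-0 -- \
  etcdctl get /apisix/upstreams/f08f5c87 --print-value-only | jq '.nodes'
```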
## Environment

- APISIX Ingress controller version (run `apisix-ingress-controller version --long`)
- Kubernetes cluster version (run `kubectl version`)
- OS version if running APISIX Ingress controller in a bare-metal environment (run `uname -a`)