bug: APISIX Ingress Controller v2.0.0 - Service Inline Upstreams Not Updated on Endpoint Changes

### Current Behavior

# Bug Report: APISIX Ingress Controller v2.0.0 - Service Inline Upstreams Not Updated on Endpoint Changes

## Summary

APISIX Ingress Controller v2.0.0 (with ADC sidecar) creates Services with inline upstreams, but does NOT update these inline upstreams when Kubernetes Endpoints change (e.g., pod restarts, rescheduling). This causes traffic to be routed to stale/non-existent pod IPs, resulting in 504 Gateway Timeout errors.

## Environment

- **APISIX Gateway Version:** 3.11.0
- **APISIX Ingress Controller Version:** 2.0.0 (stable release)
- **Helm Chart Version:** 1.1.0 (official Bitnami chart)
- **Kubernetes Version:** OVH Managed Kubernetes
- **etcd:** 3-node cluster

## Controller Configuration

```yaml
provider:
  type: apisix
  syncPeriod: 1m
  initSyncDelay: 30s
```


## Workaround

Manually update the Service entries in etcd:
```bash
kubectl exec -n apisix-system apisix-etcd-0 -- \
  etcdctl put /apisix/services/<service-id> '<updated-json-with-correct-ip>'
```

Then send a HUP signal to the APISIX pods to reload the configuration:
```bash
kubectl exec -n apisix-system <apisix-pod> -- kill -HUP 1
```
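The `<updated-json-with-correct-ip>` value can be produced with a small script. This is a minimal sketch, assuming the Service record has the shape shown under Evidence and that endpoints are IPv4 `IP:PORT` strings as printed by `kubectl get endpoints`. `patch_service_nodes` is a hypothetical helper, not part of APISIX or ADC, and it leaves `update_time` untouched.

```python
import json


def patch_service_nodes(service: dict, endpoints: list, weight: int = 100) -> str:
    """Return `service` as compact JSON with its inline upstream nodes replaced.

    Hypothetical helper. `endpoints` holds IPv4 "IP:PORT" strings as printed by
    `kubectl get endpoints`; every node gets the same weight.
    """
    patched = json.loads(json.dumps(service))  # deep copy, input stays untouched
    nodes = []
    for ep in endpoints:
        host, _, port = ep.rpartition(":")
        nodes.append({"host": host, "port": int(port), "weight": weight})
    patched["upstream"]["nodes"] = nodes
    # Compact separators keep the value on one line for `etcdctl put`.
    return json.dumps(patched, separators=(",", ":"))


# Stale Service record from the Evidence section (update_time omitted for brevity).
stale = {
    "id": "f08f5c87",
    "name": "default_beta-websocket-routes_0",
    "upstream": {
        "type": "roundrobin",
        "nodes": [{"host": "10.2.4.10", "port": 4000, "weight": 100}],
    },
}
print(patch_service_nodes(stale, ["10.2.16.3:4000"]))
```

Because the controller never rewrites these Services, a value patched this way goes stale again the next time the pod IPs change.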

## Impact

- **Severity:** Critical for production use
- **Impact:** Complete service outage for affected routes when pods restart
- **Affected:** Any route using the ADC sync pattern with inline service upstreams

## Additional Observations

1. **Not a hot-reload problem:** APISIX does hot-reload configuration changes from etcd, but the Service updates are never written to etcd in the first place.

2. **Pattern difference:** Routes created with the newer controller version use `upstream_id` references instead of inline upstreams. These DO get updated correctly. The issue affects routes/services created before this pattern change.

3. **No errors in controller logs:** The controller doesn't log any errors about failing to update services. The sync appears to complete successfully but simply doesn't update inline upstreams.

## Requested Action

1. Investigate why Service inline upstreams are not updated on endpoint changes
2. Either:
   - Fix the ADC sync to update inline upstreams in Services, OR
   - Change the sync pattern to always use `upstream_id` references instead of inline upstreams
3. Document this limitation if it's expected behavior
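The second option amounts to the data transformation below. This is a sketch under stated assumptions: `split_inline_upstream` is a hypothetical helper, and reusing the Service's `id` and `name` for the new Upstream mirrors the paired objects observed in etcd for this issue rather than any documented controller behavior. With this shape, the Upstream object, which is already updated on every sync, would be the single place the route's nodes come from.

```python
import copy


def split_inline_upstream(service: dict) -> tuple:
    """Move a Service's inline upstream into a standalone Upstream object.

    Hypothetical helper. The new Upstream reuses the Service's id and name,
    mirroring the paired objects seen in etcd for this issue (an assumption).
    """
    svc = copy.deepcopy(service)
    inline = svc.pop("upstream")
    upstream = {"id": svc["id"], "name": svc["name"], **inline}
    svc["upstream_id"] = upstream["id"]  # Service now points at the shared object
    return svc, upstream


service = {
    "id": "f08f5c87",
    "name": "default_beta-websocket-routes_0",
    "upstream": {
        "type": "roundrobin",
        "nodes": [{"host": "10.2.4.10", "port": 4000, "weight": 100}],
    },
}
svc, upstream = split_inline_upstream(service)
print(svc)  # {'id': 'f08f5c87', 'name': 'default_beta-websocket-routes_0', 'upstream_id': 'f08f5c87'}
print(upstream["nodes"])
```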

## Related Information

- Controller logs show sync completing with correct service count
- `syncPeriod: 1m` is being respected (syncs every minute)
- Separate Upstream objects are updated on each sync cycle
- Service objects are NOT updated after initial creation

## Contact

Happy to provide additional logs, configurations, or test scenarios to help debug this issue.

### Expected Behavior


When a Kubernetes pod restarts and gets a new IP address, the APISIX Ingress Controller should update the upstream nodes in APISIX to reflect the new pod IP. Traffic should continue flowing to the new pod IP without interruption.

### Actual Behavior

1. Controller creates **Services with inline upstreams** containing pod IPs
2. Controller also creates **separate Upstream objects** with the same pod IPs
3. When pods restart and get new IPs:
   - The **separate Upstream objects ARE updated** with new IPs ✅
   - The **inline upstreams inside Services are NOT updated** ❌
4. Routes reference Services (via `service_id`), not the separate Upstream objects
5. Traffic continues to be routed to **stale pod IPs** that no longer exist
6. Results in **504 Gateway Timeout** errors

### Error Logs

### Evidence

**Service in etcd (NOT updated - shows stale IP):**
```jsonc
{
  "id": "f08f5c87",
  "name": "default_beta-websocket-routes_0",
  "update_time": 1766750592,  // December 26 - 3 days old!
  "upstream": {
    "type": "roundrobin",
    "nodes": [
      {
        "host": "10.2.4.10",   // OLD IP - pod no longer exists!
        "port": 4000,
        "weight": 100
      }
    ]
  }
}
```

**Upstream object in etcd (updated correctly):**
```jsonc
{
  "id": "f08f5c87",
  "name": "default_beta-websocket-routes_0",
  "update_time": 1767016352,  // Today - recently updated
  "nodes": [
    {
      "host": "10.2.16.3",    // CORRECT new IP
      "port": 4000,
      "weight": 100
    }
  ]
}
```
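The two `update_time` fields are Unix timestamps in seconds. Decoding them confirms the gap between the objects; the values are copied from the records above and nothing else is assumed.

```python
from datetime import datetime, timezone

# update_time values copied from the two etcd records above (Unix seconds).
SERVICE_UPDATE_TIME = 1766750592
UPSTREAM_UPDATE_TIME = 1767016352


def to_utc(ts: int) -> str:
    """Format a Unix timestamp as a UTC date-time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


print("Service :", to_utc(SERVICE_UPDATE_TIME))   # 2025-12-26 12:03:12 UTC
print("Upstream:", to_utc(UPSTREAM_UPDATE_TIME))  # 2025-12-29 13:52:32 UTC
lag = UPSTREAM_UPDATE_TIME - SERVICE_UPDATE_TIME
print(f"Service lags Upstream by {lag / 86400:.1f} days")  # 3.1 days
```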

**Route configuration:**
```jsonc
{
  "name": "default_beta-websocket-routes_beta-game-core-api",
  "service_id": "f08f5c87",   // References Service, not Upstream
  "upstream_id": null         // Not using separate upstream
}
```

**APISIX error logs:**
```
upstream timed out (110: Connection timed out) while connecting to upstream,
upstream: "http://10.2.4.10:4000/...",  // Stale IP!
```
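To see which addresses are failing across many log lines, the `upstream: "http://HOST:PORT/..."` field can be extracted and checked against the live endpoints. This is a sketch assuming the standard NGINX error-log layout shown above, with each entry on a single line; `stale_upstreams` is a hypothetical name.

```python
import re

# Matches the `upstream: "http://HOST:PORT/..."` field of an NGINX/APISIX error-log line.
UPSTREAM_RE = re.compile(r'upstream: "https?://([^/"]+)')


def stale_upstreams(log_lines, live_endpoints):
    """Return upstream addresses from error-log lines that are not live endpoints."""
    live = set(live_endpoints)
    stale = []
    for line in log_lines:
        m = UPSTREAM_RE.search(line)
        if m and m.group(1) not in live and m.group(1) not in stale:
            stale.append(m.group(1))
    return stale


log = [
    'upstream timed out (110: Connection timed out) while connecting to upstream, '
    'upstream: "http://10.2.4.10:4000/...",',
]
print(stale_upstreams(log, ["10.2.16.3:4000"]))  # ['10.2.4.10:4000']
```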

**Kubernetes endpoint (actual pod IP):**
```
NAME            ENDPOINTS        AGE
game-core-web   10.2.16.3:4000   25d
```
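The three pieces of evidence can be compared mechanically. A minimal sketch, assuming the Service and Upstream JSON have already been fetched (for example with `etcdctl get`) and parsed; `node_set` and `check_sync` are hypothetical helpers. APISIX accepts upstream nodes either as a list of `host`/`port`/`weight` objects or as a `{"host:port": weight}` map, so both forms are normalized.

```python
def node_set(nodes) -> set:
    """Normalize APISIX upstream nodes to a set of "host:port" strings.

    Handles both node formats: a list of {"host", "port", "weight"} dicts and
    a {"host:port": weight} mapping.
    """
    if isinstance(nodes, dict):
        return set(nodes)
    return {f'{n["host"]}:{n["port"]}' for n in nodes}


def check_sync(service: dict, upstream: dict, endpoints: list) -> dict:
    """Report which APISIX object matches the live Kubernetes endpoints."""
    live = set(endpoints)
    return {
        "service_inline_upstream": node_set(service["upstream"]["nodes"]) == live,
        "upstream_object": node_set(upstream["nodes"]) == live,
    }


# Values copied from the Evidence section.
service = {"upstream": {"nodes": [{"host": "10.2.4.10", "port": 4000, "weight": 100}]}}
upstream = {"nodes": [{"host": "10.2.16.3", "port": 4000, "weight": 100}]}
print(check_sync(service, upstream, ["10.2.16.3:4000"]))
# {'service_inline_upstream': False, 'upstream_object': True}
```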

## Root Cause Analysis

The ADC (APISIX Declarative Configuration) sync mechanism appears to:

1. Watch for EndpointSlice changes in Kubernetes
2. Update the **separate Upstream objects** when endpoints change
3. **NOT update the inline upstream configuration inside Service objects**

Since Routes reference Services (which have inline upstreams), the stale IPs persist even though the separate Upstream objects have correct IPs.
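The lookup chain that makes the stale IP win can be modeled in a few lines. This is a simplified sketch, not APISIX's actual code: the precedence (a route's own `upstream_id`/`upstream` first, then the bound Service's) is an assumption for illustration, and `effective_nodes` is a hypothetical name. With the route from the Evidence section, the fresh Upstream object is never consulted.

```python
def effective_nodes(route: dict, services: dict, upstreams: dict) -> list:
    """Return the upstream nodes a route would be proxied to.

    Simplified, assumed precedence: the route's own upstream_id/upstream
    wins; otherwise the bound Service's upstream_id/upstream is used.
    """
    service = services.get(route.get("service_id"), {})
    for obj in (route, service):
        if obj.get("upstream_id"):
            return upstreams[obj["upstream_id"]]["nodes"]
        if obj.get("upstream"):
            return obj["upstream"]["nodes"]
    raise LookupError("route has no upstream configured")


# Shapes copied from the Evidence section (weights omitted).
route = {"service_id": "f08f5c87", "upstream_id": None}
services = {"f08f5c87": {"upstream": {"nodes": [{"host": "10.2.4.10", "port": 4000}]}}}
upstreams = {"f08f5c87": {"nodes": [{"host": "10.2.16.3", "port": 4000}]}}
print(effective_nodes(route, services, upstreams))
# [{'host': '10.2.4.10', 'port': 4000}]
```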


### Steps to Reproduce

## Reproduction Steps

1. Deploy APISIX Ingress Controller v2.0.0 with ADC sidecar
2. Create an ApisixRoute resource pointing to a Kubernetes Service
3. Wait for controller to sync (creates Service with inline upstream in APISIX)
4. Note the pod IP in the APISIX Service's inline upstream
5. Delete the pod (e.g., `kubectl delete pod <pod-name>`)
6. Wait for new pod to start with a new IP
7. Check the APISIX Service - inline upstream still has OLD IP
8. Check the APISIX Upstream object - has correct NEW IP
9. Traffic fails with 504 timeout to the old IP

### Environment

- APISIX Ingress controller version (run `apisix-ingress-controller version --long`)
- Kubernetes cluster version (run `kubectl version`)
- OS version if running APISIX Ingress controller in a bare-metal environment (run `uname -a`)

