gRPC-gateway WebSocket /v3/* proxy allows unauthenticated memory exhaustion via oversized message amplification #21556
Description
Bug report criteria
- This bug report is not security related, security issues should be disclosed privately via security@etcd.io.
- This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
- You have read the etcd bug reporting guidelines.
- Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.
What happened?
(Duplicating my original security report below. Per the discussion on our email chain, this will be handled as a non-security bug report.)
Summary
A remote unauthenticated attacker can force etcd to allocate large amounts of memory by sending a single large websocket message to the default /v3/ websocket proxy (gRPC-gateway). Memory use grows by roughly 5x the size of the attacker's payload, which can lead to a memory-exhaustion DoS and a potential OOM kill or service outage.
Details
etcd wraps the gRPC-gateway handler under /v3/ with a websocket proxy (grpc-websocket-proxy). The proxy reads websocket messages with gorilla/websocket ReadMessage() without a configured read limit, which buffers whole messages in memory. As a result, an attacker-controlled websocket message size can directly drive etcd heap usage.
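For context, a websocket frame announces its payload length in the header, before any payload bytes arrive, so a reader can in principle reject an oversized frame without buffering it. The following is a minimal Python sketch of that header parsing per RFC 6455 section 5.2. The function names and the limit value are hypothetical and are not taken from etcd or grpc-websocket-proxy; a real fix in Go would more likely rely on gorilla/websocket's `Conn.SetReadLimit`.

```python
import struct


def declared_payload_length(header: bytes) -> int:
    """Return the payload length announced by a websocket frame header.

    Layout per RFC 6455 section 5.2: the low 7 bits of the second byte hold
    the length, with 126 and 127 signalling a 16-bit or 64-bit extended
    length field that follows immediately.
    """
    if len(header) < 2:
        raise ValueError("need at least 2 header bytes")
    length = header[1] & 0x7F  # the high bit of this byte is the MASK flag
    if length == 126:
        if len(header) < 4:
            raise ValueError("truncated 16-bit extended length")
        (length,) = struct.unpack("!H", header[2:4])
    elif length == 127:
        if len(header) < 10:
            raise ValueError("truncated 64-bit extended length")
        (length,) = struct.unpack("!Q", header[2:10])
    return length


def frame_allowed(header: bytes, limit: int) -> bool:
    """Hypothetical policy check: accept only frames within `limit` bytes."""
    return declared_payload_length(header) <= limit


# A masked binary frame announcing the 256 MiB payload used in the PoC.
huge = bytes([0x82, 0x80 | 127]) + struct.pack("!Q", 268435456)
print(declared_payload_length(huge))           # 268435456
print(frame_allowed(huge, limit=1024 * 1024))  # False
```

Because a message may be split across several frames, a per-frame check like this is not sufficient on its own; the limit has to apply to the cumulative size of the message.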
In the PoC below, a single unauthenticated 256 MiB websocket message to /v3/kv/range caused etcd RSS to increase by ~1.26 GiB (~5x the message size) and to remain elevated for at least 10 seconds after the request completed.
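The ~1.26 GiB and ~5x figures can be checked from the RSS values in the PoC's expected output. A small Python sketch of that arithmetic (variable names mirror the PoC output keys; nothing here is measured independently):

```python
KIB = 1024

# Values copied from the expected PoC output in this report.
rss_before_kib = 29980
rss_after_kib = 1348604
websocket_payload_bytes = 268435456  # 256 MiB

rss_delta_kib = rss_after_kib - rss_before_kib
rss_delta_gib = rss_delta_kib / (KIB * KIB)
amplification = (rss_delta_kib * KIB) / websocket_payload_bytes

print(f"rss_delta_kib={rss_delta_kib}")
print(f"rss_delta_gib={rss_delta_gib:.2f}")
print(f"amplification={amplification:.2f}x")
```

This gives a delta of 1318624 KiB, about 1.26 GiB, or about 5.03 times the payload size.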
Impact
Unauthenticated remote memory-exhaustion DoS against any etcd client endpoint reachable by the attacker. The impact is especially severe for containerized deployments with tight memory limits.
Suggested severity: High, CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H (7.5)
What did you expect to happen?
etcd should enforce request-size limits (e.g., --max-request-bytes) before fully reading/buffering the payload, so that an attacker can neither a) drive this much memory use directly through the request size nor b) amplify that usage ~5x.
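As an illustration of enforcing a limit before the payload is fully buffered, here is a minimal Python sketch of a bounded read. It is not etcd's implementation: `read_bounded`, `PayloadTooLarge`, and the 1 MiB limit are hypothetical names and values chosen for the example, and the in-memory stream stands in for a network connection.

```python
import io


class PayloadTooLarge(Exception):
    """Raised when the incoming payload exceeds the configured limit."""


def read_bounded(stream, limit: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read `stream` to EOF, but never buffer more than `limit + 1` bytes.

    The read aborts as soon as the limit is crossed, so memory use is bounded
    by the limit rather than by whatever size the sender chooses.
    """
    buf = bytearray()
    while True:
        # Ask for at most one byte past the limit; that is enough to detect overflow.
        chunk = stream.read(min(chunk_size, limit + 1 - len(buf)))
        if not chunk:
            return bytes(buf)
        buf += chunk
        if len(buf) > limit:
            raise PayloadTooLarge(f"payload exceeds {limit} bytes")


LIMIT = 1024 * 1024  # hypothetical limit for the example

small = read_bounded(io.BytesIO(b"x" * 100), LIMIT)
print(len(small))  # 100

try:
    read_bounded(io.BytesIO(b"x" * (LIMIT + 5000)), LIMIT)
    rejected = False
except PayloadTooLarge:
    rejected = True
print(rejected)  # True
```

The design point is that the check happens while reading, so an oversized message costs the server at most the limit plus one byte of buffer, not the full message size.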
How can we reproduce it (as minimally and precisely as possible)?
PoC
1. Adjust the attached PoC so that ETCD_BIN points at the right path (see the top-level const definitions for other tunables).
2. Run python ./repro-ws-huge.py
3. Expected output:
*** LAUNCH A FRESH SINGLE-NODE ETCD FOR THE WEBSOCKET MEMORY TEST ***
$ /usr/local/bin/vh-etcd --name vh1 --data-dir /tmp/vh-repro-etcd-f002-data --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 --listen-peer-urls http://127.0.0.1:2380 --initial-advertise-peer-urls http://127.0.0.1:2380 --initial-cluster vh1=http://127.0.0.1:2380 --initial-cluster-state new --log-level info
*** CAPTURE BASELINE RSS BEFORE THE OVERSIZED WEBSOCKET MESSAGE ***
rss_before_kib=29980
*** SEND ONE OVERSIZED WEBSOCKET MESSAGE TO /V3/KV/RANGE ***
target=ws://127.0.0.1:2379/v3/kv/range
websocket_payload_bytes=268435456
*** CAPTURE RSS 10 SECONDS AFTER THE WEBSOCKET MESSAGE ***
rss_after_kib=1348604
rss_delta_kib=1318624
Anything else we need to know?
No response
Etcd version (please run commands below)
Reproduced on etcd 3.6.7, 3.6.8, and main at commit c3aff56.
Etcd configuration (command line flags or environment variables)
No response
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
No response