Skip to content

Conversation

@api-Hypernova
Copy link
Collaborator

@api-Hypernova api-Hypernova commented Nov 15, 2025

Add web health check metrics to Hypernova/webmon by installing curl, copying web_healthcheck.sh to /usr/local/bin/web_healthcheck.sh, and invoking it each loop in newrunner.sh with MONITOR_LOOP_PAUSE control

Introduce a web_healthcheck.sh script that runs curl checks on configured endpoints and pushes Prometheus metrics, add curl to the runtime image, and switch loop control to MONITOR_LOOP_PAUSE in docker/Dockerfile and xmtp_debug/newrunner.sh.

📍Where to Start

Start with the health check flow in xmtp_debug/web_healthcheck.sh, then review its invocation inside the loop in xmtp_debug/newrunner.sh.


📊 Macroscope summarized f89c4e3. 2 files reviewed, 9 issues evaluated, 5 issues filtered, 3 comments posted

🗂️ Filtered Issues

newrunner.sh — 0 comments posted, 3 evaluated, 2 filtered
  • line 3: Contract/regression: the environment variable controlling loop pauses was renamed from XDBG_LOOP_PAUSE to MONITOR_LOOP_PAUSE (line 3) and the child-process overrides were changed accordingly at lines 22, 23, 24, 29, 33, and 37. If callers or the xdbg tool expect XDBG_LOOP_PAUSE, the new script will ignore that input and xdbg may also ignore the MONITOR_LOOP_PAUSE override, changing behavior (e.g., failing to suppress internal pauses). The visible contract for controlling pause duration and the per-command override has changed without guard or compatibility. [ Low confidence ]
  • line 22: No error handling for external commands: all xdbg invocations (lines 22–25, 29, 33, 37) and the health check bash "$(dirname "$0")/web_healthcheck.sh" (line 39) are executed without checking exit status. Failures silently proceed to subsequent steps, potentially leaving the environment partially reset or tests running against a bad state. Use guards such as set -e, || exit, or && chaining to ensure a defined terminal outcome on failure. [ Low confidence ]
web_healthcheck.sh — 3 comments posted, 6 evaluated, 3 filtered
  • line 35: Health status treats only HTTP 200 as success. Healthy endpoints frequently return other 2xx codes (e.g., 204), or require following redirects (e.g., 301/302). Without -L, many endpoints will be marked unhealthy despite being reachable. Consider treating any 2xx as healthy and using curl -L when appropriate or making this configurable. [ Low confidence ]
  • line 54: PUSHGATEWAY_URL is not validated. If empty or malformed, push_url becomes invalid (e.g., starting with /metrics/...), and curl will fail or target a local file path. Add a guard to ensure PUSHGATEWAY_URL is a non-empty valid URL before pushing metrics. [ Low confidence ]
  • line 58: push_http_code extraction is fragile: grep "HTTP_CODE:" | cut -d: -f2 will return multiple lines if the response body contains HTTP_CODE: anywhere, causing ambiguous comparison and potential false negatives in success detection. Safer to read only the last line or use tail -n1 to select the appended status line. [ Low confidence ]

@api-Hypernova api-Hypernova self-assigned this Nov 15, 2025
@api-Hypernova api-Hypernova merged commit fc3cd43 into main Nov 15, 2025
3 checks passed
log "Checking ${#ENDPOINTS[@]} endpoint(s)..."

for endpoint in "${ENDPOINTS[@]}"; do
endpoint=$(echo "$endpoint" | xargs)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty endpoints aren’t skipped after trim. This can call curl with no URL and create a blank endpoint_id (e.g., .../instance/). Consider continuing the loop when endpoint is empty.

Suggested change
endpoint=$(echo "$endpoint" | xargs)
endpoint=$(echo "$endpoint" | xargs)
if [ -z "$endpoint" ]; then
continue
fi

🚀 Reply to ask Macroscope to explain or update this suggestion.

👍 Helpful? React to give us feedback.


metrics_payload="# HELP ${metric_name} Health status of web endpoints (1 = healthy, 0 = unhealthy)
# TYPE ${metric_name} gauge
${metric_name}{endpoint=\"${endpoint}\",endpoint_id=\"${endpoint_id}\",http_code=\"${http_code}\"} ${health_status}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Label values in metrics_payload are not escaped, so an endpoint containing \, " or newlines can produce invalid Prometheus exposition and be rejected by Pushgateway. Consider escaping backslashes, double-quotes and newlines for endpoint (and endpoint_id) before building the metric label string.

-    metrics_payload="# HELP ${metric_name} Health status of web endpoints (1 = healthy, 0 = unhealthy)
-# TYPE ${metric_name} gauge
-${metric_name}{endpoint=\"${endpoint}\",endpoint_id=\"${endpoint_id}\",http_code=\"${http_code}\"} ${health_status}
-"
+    # escape label values per Prometheus exposition format
+    escaped_endpoint=$(printf '%s' "$endpoint" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e ':a;N;s/\n/\\n/;ta')
+    escaped_endpoint_id=$(printf '%s' "$endpoint_id" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e ':a;N;s/\n/\\n/;ta')
+    metrics_payload="# HELP ${metric_name} Health status of web endpoints (1 = healthy, 0 = unhealthy)
+# TYPE ${metric_name} gauge
+${metric_name}{endpoint=\"${escaped_endpoint}\",endpoint_id=\"${escaped_endpoint_id}\",http_code=\"${http_code}\"} ${health_status}
+"

🚀 Reply to ask Macroscope to explain or update this suggestion.

👍 Helpful? React to give us feedback.


: "${PUSHGATEWAY_URL:=http://localhost:9091}"
: "${HEALTHCHECK_TIMEOUT:=10}"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HEALTHCHECK_TIMEOUT isn’t validated. If it’s non‑numeric or out of range, curl -m fails and health is misreported. Consider validating it as a positive integer within a sane range and defaulting when invalid.

+if ! [[ "$HEALTHCHECK_TIMEOUT" =~ ^[0-9]+$ ]] || [ "$HEALTHCHECK_TIMEOUT" -le 0 ] || [ "$HEALTHCHECK_TIMEOUT" -gt 300 ]; then
+    log "Invalid HEALTHCHECK_TIMEOUT: $HEALTHCHECK_TIMEOUT; using 10"
+    HEALTHCHECK_TIMEOUT=10
+fi

🚀 Reply to ask Macroscope to explain or update this suggestion.

👍 Helpful? React to give us feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants