
Per-route health check / cache-only probe API (avoid triggering crawling when probing) #20768

@chesha1


What feature is it?

Hi RSSHub team,

There are multiple RSSHub instances (same codebase, but different runtime configs). Some of them are configured with anti-bot workarounds (e.g. proxy/cookies/puppeteer/etc.), so for certain strict websites only a subset of instances can successfully generate feeds for a given route.

I’d like to put an external load balancer / router in front of these instances and route traffic to an instance that actually has usable content for a specific route.

Problem

Currently, the only practical way to check whether an instance can serve a specific route is to request the route itself (e.g. GET /some/route). However, requesting the route may trigger the route handler’s fetching/refreshing logic (especially when cache is missing/expired), which means:

  • Probing multiple instances causes multiple upstream crawling attempts
  • This increases load and frequently triggers anti-bot protection (403 / captcha / bans)
  • In my case, I don’t want “health checks” to start crawling
  • I only want to find an instance that already has content (or at least has a known “last success” status) and skip the rest

In other words: I need a way to do a route-level health check without executing the crawler/fetching logic.

What I’m looking for

A lightweight API (or an option/header/query flag) to check the status of a specific route without triggering any upstream fetch. For example:

  • Does this instance currently have a cached result for /some/route?
  • When was it last successfully updated?
  • Was the last execution an error (and what error)?
  • Ideally: a “cache-only” check that never refreshes/fetches

This would allow a load balancer to:

  • pick an instance that already has cached content,
  • skip instances that don’t have content,
  • and avoid unnecessary scraping attempts.
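The selection step on the load balancer side could look roughly like the sketch below. It assumes the balancer has already collected one probe result per instance through some cache-only mechanism; the `InstanceStatus` shape and the `chooseInstance` name are hypothetical and not part of RSSHub.

```typescript
// Hypothetical per-instance probe result; field names are illustrative only.
interface InstanceStatus {
  instance: string;           // base URL of the RSSHub instance
  cached: boolean;            // does the instance hold a cached result for the route?
  lastSuccess: string | null; // ISO 8601 time of the last successful update, if known
}

// Pick the instance with cached content whose last success is most recent.
// Instances without cached content are skipped; returns null if none qualify.
function chooseInstance(statuses: InstanceStatus[]): string | null {
  // Unknown timestamps sort before any real timestamp.
  const time = (s: InstanceStatus): number =>
    s.lastSuccess === null ? 0 : Date.parse(s.lastSuccess);

  let best: InstanceStatus | null = null;
  for (const s of statuses) {
    if (!s.cached) continue; // never route to an instance that would have to crawl
    if (best === null || time(s) > time(best)) best = s;
  }
  return best === null ? null : best.instance;
}
```

Returning `null` when no instance has content leaves the fallback policy (fail fast, or let exactly one instance crawl) to the balancer.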

Possible API / behavior ideas

Any of the following would work (just examples):

  1. A dedicated status endpoint
    GET /api/route/status?path=/some/route
    returns JSON like:
    {
      "cached": true,
      "lastSuccess": "2025-12-27T00:00:00Z",
      "lastUpdated": "2025-12-27T00:00:00Z",
      "lastError": null
    }
    and does not trigger any fetch.

  2. A “cache-only” mode for existing routes
    GET /some/route?cacheOnly=1 (or a header similar to Cache-Control: only-if-cached)
      • returns 200 with cached feed if present
      • returns 404/204 if not cached
      • and never tries to refresh the feed
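To make the intended semantics of both ideas concrete, here is a minimal sketch over an in-memory store. Everything in it is an assumption for illustration: the `RouteRecord` shape, the `RouteStore` map, and the function names are hypothetical, and RSSHub's actual cache layer may be organised quite differently. The key property is that neither function ever calls a route handler.

```typescript
// Hypothetical per-route record; not RSSHub's real cache structure.
interface RouteRecord {
  body: string | null;        // cached feed, or null if nothing is cached
  lastSuccess: string | null; // ISO 8601 time of the last successful run
  lastUpdated: string | null; // ISO 8601 time of the last attempt
  lastError: string | null;   // message from the last failed run, if any
}

type RouteStore = Map<string, RouteRecord>;

// Idea 1: report status from stored metadata only.
function routeStatus(store: RouteStore, path: string) {
  const rec = store.get(path);
  return {
    cached: rec !== undefined && rec.body !== null,
    lastSuccess: rec?.lastSuccess ?? null,
    lastUpdated: rec?.lastUpdated ?? null,
    lastError: rec?.lastError ?? null,
  };
}

// Idea 2: serve the cached feed if present, otherwise 404; never refresh.
function cacheOnlyResponse(
  store: RouteStore,
  path: string,
): { status: number; body: string | null } {
  const body = store.get(path)?.body ?? null;
  return body !== null ? { status: 200, body } : { status: 404, body: null };
}
```

An unknown route and a known route with an empty cache both report `cached: false` here; a real implementation might want to tell them apart.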

What problem does this feature solve?

NOP


This is not a duplicated feature request or new RSS proposal

Metadata

Labels

RSS enhancement: New feature or request to existing RSS
