Description
What feature is it?
Hi RSSHub team,
There are multiple RSSHub instances (same codebase, but different runtime configs). Some of them are configured with anti-bot workarounds (e.g. proxy/cookies/puppeteer/etc.), so for certain strict websites only a subset of instances can successfully generate feeds for a given route.
I’d like to put an external load balancer / router in front of these instances and route traffic to an instance that actually has usable content for a specific route.
Problem
Currently, the only practical way to check whether an instance can serve a specific route is to request the route itself (e.g. GET /some/route). However, requesting the route may trigger the route handler’s fetching/refreshing logic (especially when the cache entry is missing or expired), which means:
- Probing multiple instances causes multiple upstream crawling attempts
- This increases load and frequently triggers anti-bot protection (403 / captcha / bans)
- In my case, I don’t want “health checks” to start crawling
- I only want to find an instance that already has content (or at least has a known “last success” status) and skip the rest
In other words: I need a way to do a route-level health check without executing the crawler/fetching logic.
What I’m looking for
A lightweight API (or an option/header/query flag) to check the status of a specific route without triggering any upstream fetch. For example:
- Does this instance currently have a cached result for /some/route?
- When was it last successfully updated?
- Was the last execution an error (and what error)?
- Ideally: a “cache-only” check that never refreshes/fetches
This would allow a load balancer to:
- pick an instance that already has cached content,
- skip instances that don’t have content,
- and avoid unnecessary scraping attempts.
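To make the load balancer side concrete, here is a minimal sketch of the selection step, assuming each instance could report a per-route status shaped like the JSON example in this issue. The `RouteStatus` and `InstanceReport` types and the `pickInstance` function are hypothetical names for illustration; nothing here exists in RSSHub today.

```typescript
// Hypothetical per-route status report; field names follow the JSON example
// in this issue, not an existing RSSHub API.
interface RouteStatus {
    cached: boolean;
    lastSuccess: string | null; // ISO 8601 timestamp, or null if never succeeded
    lastError: string | null;
}

interface InstanceReport {
    instance: string; // base URL of one RSSHub instance (assumed)
    status: RouteStatus;
}

// Pick the instance that has cached content and the most recent success.
// Returns null when no instance has cached content, so the caller can decide
// whether to fall back to a real (crawling) request.
function pickInstance(reports: InstanceReport[]): string | null {
    let best: InstanceReport | null = null;
    for (const report of reports) {
        if (!report.status.cached) continue; // skip instances without content
        const t = report.status.lastSuccess ? Date.parse(report.status.lastSuccess) : 0;
        const bestT = best?.status.lastSuccess ? Date.parse(best.status.lastSuccess) : 0;
        if (best === null || t > bestT) best = report;
    }
    return best ? best.instance : null;
}
```

With something like this, probing N instances would cost N cheap status lookups and no upstream crawl at all.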
Possible API / behavior ideas
Any of the following would work (just examples):
- A dedicated status endpoint
GET /api/route/status?path=/some/route
returns JSON like:
{
  "cached": true,
  "lastSuccess": "2025-12-27T00:00:00Z",
  "lastUpdated": "2025-12-27T00:00:00Z",
  "lastError": null
}
and does not trigger any fetch.
- A “cache-only” mode for existing routes
GET /some/route?cacheOnly=1 (or a header similar to Cache-Control: only-if-cached)
  - returns 200 with cached feed if present
  - returns 404/204 if not cached
  - and never tries to refresh the feed
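Both ideas boil down to reading the cache without ever calling the route handler. The sketch below shows that in isolation, assuming a simple key-value cache keyed by route path. `CacheEntry`, `ReadOnlyCache`, `ErrorLog`, `routeStatus` and `serveCacheOnly` are hypothetical names; RSSHub's real cache layout and middleware may look quite different.

```typescript
// Hypothetical cache entry; RSSHub's real cache layout may differ.
interface CacheEntry {
    body: string;        // rendered feed
    lastSuccess: string; // ISO 8601 time of the last successful fetch
    lastUpdated: string; // ISO 8601 time the entry was last written
}

// Minimal read-only view of a cache; a Map<string, CacheEntry> satisfies it.
interface ReadOnlyCache {
    get(path: string): CacheEntry | undefined;
}

// Last error per route, recorded by whatever runs the real handler (assumed).
interface ErrorLog {
    get(path: string): string | undefined;
}

// Idea 1: status lookup, e.g. behind GET /api/route/status?path=/some/route.
// Only reads the cache and the error log; never invokes the route handler.
function routeStatus(cache: ReadOnlyCache, errors: ErrorLog, path: string) {
    const entry = cache.get(path);
    return {
        cached: entry !== undefined,
        lastSuccess: entry?.lastSuccess ?? null,
        lastUpdated: entry?.lastUpdated ?? null,
        lastError: errors.get(path) ?? null,
    };
}

// Idea 2: cache-only serving, e.g. behind GET /some/route?cacheOnly=1.
// Returns the cached feed with 200, or an empty 204 on a miss.
function serveCacheOnly(cache: ReadOnlyCache, path: string): { status: number; body: string } {
    const entry = cache.get(path);
    return entry ? { status: 200, body: entry.body } : { status: 204, body: "" };
}
```

The 204 on a miss is just one of the options listed above. If the standard Cache-Control: only-if-cached request directive were used instead, note that HTTP caches conventionally answer a miss with 504 (Gateway Timeout), so a load balancer would need to treat that code as "not cached" rather than as a failure.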
What problem does this feature solve?
NOP
Additional description
No response
This is not a duplicated feature request or new RSS proposal
- I have searched existing issues to ensure this feature has not already been requested and this is not a new RSS proposal.