Lighthouse is a single Cloudflare Worker that provides a small, deterministic, privacy-first, aggregate-first metrics primitive with one narrow first-party JS-fired pageview ingestion path.
Architectural rule:
- Lighthouse is a standalone service and operationally independent.
- It is independently runnable and not hard-dependent on BUS Core or any other external service.
- BUS Core is a current observed client/traffic source, but Lighthouse core operation must remain independent.
- Integrations must remain optional, additive, and non-blocking.
Release authority:
- Shipped Lighthouse behavior is authorized by SOT.md, recorded in CHANGELOG.md, and versioned by package.json.
- No behavioral, contract, storage, configuration, auth, or scheduling change is considered released unless all three are updated together in the same change set.
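The release-authority rule can be enforced mechanically in CI. The sketch below is a hypothetical guard and is not part of Lighthouse. It assumes that SOT.md, CHANGELOG.md, and package.json sit at the repository root. It also assumes that changes under src/, migrations/, or wrangler.toml stand in for "behavioral, contract, storage, configuration, auth, or scheduling" changes. Both the function name and the path mapping are illustrative.

```typescript
// Hypothetical CI guard for the release-authority rule (not shipped Lighthouse code).
// The three files named in the README, assumed to live at the repository root.
const RELEASE_FILES = ["SOT.md", "CHANGELOG.md", "package.json"] as const;

// Assumption: these path prefixes approximate "behavioral/contract/storage/config/auth/scheduling" changes.
const BEHAVIOR_PREFIXES = ["src/", "migrations/", "wrangler.toml"];

interface ReleaseCheck {
  behaviorChanged: boolean;
  missing: string[];
  ok: boolean;
}

function checkReleaseAuthority(changedFiles: string[]): ReleaseCheck {
  const changed = new Set(changedFiles);
  const behaviorChanged = changedFiles.some((f) =>
    BEHAVIOR_PREFIXES.some((prefix) => f.startsWith(prefix)),
  );
  // Release files absent from this change set.
  const missing = RELEASE_FILES.filter((f) => !changed.has(f));
  // Docs-only change sets pass; behavioral ones must touch all three files together.
  return { behaviorChanged, missing, ok: !behaviorChanged || missing.length === 0 };
}
```

For example, checkReleaseAuthority(["src/index.ts", "CHANGELOG.md"]) reports ok: false with SOT.md and package.json listed as missing.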
- Aggregate-first: stores daily aggregate counters as the primary reporting model and retains only a narrow, short-lived raw pageview log for inspectability.
- Operationally independent: can run and serve core routes without requiring any other service to be available.
- Observed client: an external system that calls Lighthouse (for example BUS Core) without becoming a runtime dependency.
- Core operation: manifest serving, aggregate counting, first-party pageview ingestion, and protected on-demand reporting.
- Optional integration: an additive external integration that does not block core operation when unavailable.
- Shipped behavior: behavior currently implemented and documented as present reality.
- Future direction: planned or proposed behavior not yet shipped.
Lighthouse currently does six things:
- Serves the BUS Core manifest from R2.
- Increments fixed daily aggregate counters in D1.
- Accepts first-party site-emitted pageview events on POST /metrics/pageview.
- Accepts standardized multi-site events on POST /metrics/event.
- Exposes protected, on-demand aggregate reporting.
- Pulls one daily Buscore traffic snapshot from the Cloudflare GraphQL Analytics API into D1 on a scheduled cron.
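It does not implement retries, user identity, server-side session tracking, unload analytics, or a broad analytics warehouse. Anonymous continuity fields are only accepted as supplied by first-party payloads.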
| Method | Path | Behavior |
|---|---|---|
| GET | /manifest/core/stable.json | Return manifest JSON from R2 (no counting) |
| GET | /update/check | Return manifest JSON and increment update_checks unless request IP matches IGNORED_IP |
| GET | /download/latest | Increment downloads unless request IP matches IGNORED_IP, then 302 redirect to latest release URL from manifest |
| GET | /releases/:filename | Serve release artifact from R2 key releases/:filename (no counting) |
| POST | /metrics/pageview | Accept first-party JS-fired pageview JSON, always return 204, and persist/aggregate best-effort in D1 |
| POST | /metrics/event | Accept standardized multi-site event JSON, always return 204, and persist/aggregate best-effort in D1 |
| GET | /report | Return protected aggregate report (requires X-Admin-Token) |
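The route table can be read as a small dispatch function plus a counting rule. The sketch below assumes plain string matching on method and pathname. It also assumes that a :filename containing "/" is rejected. The route identifiers are hypothetical names and are not taken from src/index.ts. Only /update/check and /download/latest map to a metrics_daily counter, and a match on IGNORED_IP suppresses counting without changing the response.

```typescript
// Illustrative route ids for the table above (names are hypothetical).
type RouteId =
  | "manifest"
  | "update_check"
  | "download_latest"
  | "release_artifact"
  | "pageview"
  | "event"
  | "report";

interface RouteMatch {
  route: RouteId;
  r2Key?: string; // only set for /releases/:filename
}

function matchRoute(method: string, pathname: string): RouteMatch | null {
  if (method === "GET") {
    if (pathname === "/manifest/core/stable.json") return { route: "manifest" };
    if (pathname === "/update/check") return { route: "update_check" };
    if (pathname === "/download/latest") return { route: "download_latest" };
    if (pathname === "/report") return { route: "report" };
    // Assumption: a filename is a single path segment (no nested slashes).
    const m = /^\/releases\/([^\/]+)$/.exec(pathname);
    if (m) return { route: "release_artifact", r2Key: `releases/${m[1]}` };
  }
  if (method === "POST") {
    if (pathname === "/metrics/pageview") return { route: "pageview" };
    if (pathname === "/metrics/event") return { route: "event" };
  }
  return null;
}

// Which metrics_daily column a request increments, if any.
function dailyCounterFor(
  route: RouteId,
  connectingIp: string | null,
  ignoredIp?: string,
): "update_checks" | "downloads" | null {
  // IGNORED_IP suppresses counting only; the caller still serves the normal response.
  if (ignoredIp && connectingIp === ignoredIp) return null;
  if (route === "update_check") return "update_checks";
  if (route === "download_latest") return "downloads";
  return null; // manifest, releases, and everything else never count here
}
```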
Notes:
- /manifest/core/stable.json and /releases/:filename never increment counters.
- /update/check does not require X-BUS-Update-Source: core for counting.
- If IGNORED_IP is configured and matches CF-Connecting-IP, counting is suppressed while normal responses are still returned.
- POST /metrics/pageview is unauthenticated by design. It parses the raw request text, then JSON, and still returns 204 for malformed, invalid, or rate-limited submissions.
- Valid accepted payloads follow the deployed BUS Core site emitter contract:
  - type = "pageview"
  - required fields: client_ts, path, url, referrer, the utm object, device, viewport, lang, and tz
  - optional fields, which may be omitted: src, utm.{source,medium,campaign,content}, anon_user_id, session_id, and is_new_user
- Empty-string values for referrer, lang, and tz are accepted and preserved as empty strings in raw storage.
- POST /metrics/pageview and its OPTIONS preflight only grant browser CORS access to https://buscore.ca and https://www.buscore.ca. Lighthouse does not use a wildcard allow-origin on that route.
- The deployed site emitter contract is accepted as-is: page-load-only, beacon-first, fetch(..., { keepalive: true }) fallback, no retries, and no session logic.
- POST /metrics/event is site-aware through the tracked-site registry in src/index.ts. Each site entry defines site_key, production_hosts, allowed_origins, staging_hosts, and production_only_default.
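The pageview notes describe three rules: a CORS allow-list, a parse of the raw text followed by JSON, and field validation whose outcome never changes the 204 response. The sketch below shows those rules under stated assumptions and is not the shipped parser. The function names are hypothetical. It assumes that required strings other than referrer, lang, and tz must be non-empty. It also assumes that is_new_user arrives as a boolean, because the wire type is not stated here.

```typescript
// Origins allowed browser CORS access on POST /metrics/pageview (from the notes above).
const PAGEVIEW_ALLOWED_ORIGINS: ReadonlySet<string> = new Set([
  "https://buscore.ca",
  "https://www.buscore.ca",
]);

// Value for Access-Control-Allow-Origin, or null to omit it. Never returns "*".
function pageviewAllowOrigin(origin: string | null): string | null {
  return origin !== null && PAGEVIEW_ALLOWED_ORIGINS.has(origin) ? origin : null;
}

interface PageviewPayload {
  type: "pageview";
  client_ts: string;
  path: string;
  url: string;
  referrer: string;
  utm: { source?: string; medium?: string; campaign?: string; content?: string };
  device: string;
  viewport: string;
  lang: string;
  tz: string;
  src?: string;
  anon_user_id?: string;
  session_id?: string;
  is_new_user?: boolean; // assumption: emitted as a boolean
}

type ParseResult = { ok: true; payload: PageviewPayload } | { ok: false; reason: "invalid" };

const REQUIRED_STRINGS = ["client_ts", "path", "url", "referrer", "device", "viewport", "lang", "tz"] as const;
// Documented as accepted when empty; assumption: all other required strings must be non-empty.
const EMPTY_ALLOWED: ReadonlySet<string> = new Set(["referrer", "lang", "tz"]);
const OPTIONAL_STRINGS = ["src", "anon_user_id", "session_id"] as const;
const UTM_KEYS = ["source", "medium", "campaign", "content"] as const;
const INVALID: ParseResult = { ok: false, reason: "invalid" };

// Parse raw request text, then validate. The handler answers 204 regardless; this result
// only decides between recording an accepted pageview and counting dropped_invalid.
function parsePageview(rawText: string): ParseResult {
  let body: unknown;
  try {
    body = JSON.parse(rawText);
  } catch {
    return INVALID; // malformed JSON
  }
  if (typeof body !== "object" || body === null || Array.isArray(body)) return INVALID;
  const o = body as Record<string, unknown>;
  if (o.type !== "pageview") return INVALID;
  for (const key of REQUIRED_STRINGS) {
    const v = o[key];
    if (typeof v !== "string") return INVALID;
    if (v === "" && !EMPTY_ALLOWED.has(key)) return INVALID;
  }
  for (const key of OPTIONAL_STRINGS) {
    if (o[key] !== undefined && typeof o[key] !== "string") return INVALID;
  }
  if (o.is_new_user !== undefined && typeof o.is_new_user !== "boolean") return INVALID;
  const utm = o.utm;
  if (typeof utm !== "object" || utm === null || Array.isArray(utm)) return INVALID;
  const u = utm as Record<string, unknown>;
  for (const key of UTM_KEYS) {
    if (u[key] !== undefined && typeof u[key] !== "string") return INVALID;
  }
  return { ok: true, payload: o as unknown as PageviewPayload };
}
```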
GET /report returns:
{
"today": { "update_checks": 0, "downloads": 0, "errors": 0 },
"yesterday": { "update_checks": 0, "downloads": 0, "errors": 0 },
"last_7_days": { "update_checks": 0, "downloads": 0, "errors": 0 },
"month_to_date": { "update_checks": 0, "downloads": 0, "errors": 0 },
"trends": {
"downloads_change_percent": 0,
"update_checks_change_percent": 0,
"weekly_downloads_change_percent": 0,
"weekly_update_checks_change_percent": 0,
"conversion_ratio": 0
},
"traffic": {
"latest_day": {
"day": "2026-03-22",
"visits": null,
"requests": 0,
"captured_at": "2026-03-23T00:05:02.123Z"
},
"last_7_days": {
"visits": null,
"requests": 0,
"avg_daily_visits": null,
"avg_daily_requests": 0,
"days_with_data": 1
}
},
"human_traffic": {
"today": {
"pageviews": 0,
"last_received_at": null
},
"last_7_days": {
"pageviews": 0,
"days_with_data": 0,
"top_paths": [],
"top_referrers": [],
"top_sources": []
},
"observability": {
"accepted": 0,
"dropped_rate_limited": 0,
"dropped_invalid": 0,
"last_received_at": null
}
},
"identity": {
"today": {
"new_users": 0,
"returning_users": 0,
"sessions": 0
},
"last_7_days": {
"new_users": 0,
"returning_users": 0,
"sessions": 0,
"return_rate": 0
},
"top_sources_by_returning_users": []
}
}
Contract note:
- /report is treated as an operator contract.
- Field additions/removals or semantic changes must be deliberate and documented in SOT/changelog, not ad-hoc.
- Existing top-level fields today, yesterday, last_7_days, month_to_date, and trends remain intact and semantically unchanged.
- Existing top-level traffic remains the Cloudflare-derived traffic summary and is not renamed or reinterpreted by pageview ingestion.
- Additive top-level human_traffic is JS-fired first-party pageview telemetry, not verified-human analytics.
- On each authenticated /report request, Lighthouse performs one best-effort refresh capture for the previous completed UTC day before assembling the report.
- This refresh reuses the same traffic capture logic as the scheduled path and does not replace cron-based capture.
- If this refresh fails, /report still returns successfully with traffic fields based only on currently stored data.
- traffic.latest_day is the most recent completed UTC day snapshot stored in D1 and includes captured_at.
- traffic.last_7_days aggregates stored traffic rows within the last seven UTC days and includes days_with_data, avg_daily_visits, and avg_daily_requests.
- human_traffic.today reports accepted JS-fired pageviews for the current UTC day and the latest observed received_at value for that day.
- human_traffic.last_7_days.top_paths entries use { path, pageviews }.
- human_traffic.last_7_days.top_referrers entries use { referrer_domain, pageviews }.
- human_traffic.last_7_days.top_sources entries use { source, pageviews } with precedence src -> utm.source -> (direct).
- human_traffic.observability is cumulative across stored pageview aggregate rows. It reports accepted, dropped-rate-limited, and dropped-invalid counts, plus the latest observed received_at.
- Additive top-level identity summarizes anonymous continuity using accepted pageviews only.
- Additive top-level site_events is populated only when site_key is provided on /report.
- /report supports standardized-event scope flags:
  - site_key, required for site events
  - exclude_test_mode, default true
  - production_only, defaulting to the tracked site's production_only_default
- Unknown site_key on /report returns 400 with {"ok":false,"error":"invalid_site_key"}.
- identity.last_7_days.return_rate is returning_users / distinct_users over non-null anon_user_id values in the same 7-day window.
- If a traffic window has no stored data, its traffic fields return null instead of synthetic zeroes.
- Average daily traffic values divide by days_with_data (rows that exist), not blindly by 7.
- requests come from the daily request count on Cloudflare httpRequestsAdaptiveGroups.
- visits come from sum.visits on the same single-query path when provided, and remain nullable when absent.
When the final Star Map production domain is known, update the star_map_generator entry in src/index.ts TRACKED_SITES:
- production_hosts: canonical production host(s) used in event url values.
- allowed_origins: browser origin(s) allowed for CORS on POST /metrics/event.
- staging_hosts: non-production hosts used for launch testing.
- production_only_default: keep true for production-clean operator reporting by default.
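A tracked-site entry might look like the sketch below. The interface mirrors the fields listed above. Every host and origin in it is a placeholder on example.com, because the real Star Map domain is not yet known. The helper functions are hypothetical. They illustrate one plausible way to use production_hosts and staging_hosts: classify an event by the hostname of its url. The helpers also check whether a browser origin is allowed.

```typescript
// Shape of one TRACKED_SITES entry, per the fields listed above.
interface TrackedSite {
  site_key: string;
  production_hosts: string[];
  allowed_origins: string[];
  staging_hosts: string[];
  production_only_default: boolean;
}

// PLACEHOLDER values only: replace once the Star Map production domain is known.
const starMapEntry: TrackedSite = {
  site_key: "star_map_generator",
  production_hosts: ["starmap.example.com"], // hypothetical
  allowed_origins: ["https://starmap.example.com"], // hypothetical
  staging_hosts: ["staging.starmap.example.com"], // hypothetical
  production_only_default: true,
};

type HostClass = "production" | "staging" | "unknown";

// Hypothetical helper: classify an event by the hostname of its url value.
function classifyEventUrl(site: TrackedSite, eventUrl: string): HostClass {
  let host: string;
  try {
    host = new URL(eventUrl).hostname;
  } catch {
    return "unknown"; // not a parseable absolute URL
  }
  if (site.production_hosts.includes(host)) return "production";
  if (site.staging_hosts.includes(host)) return "staging";
  return "unknown";
}

// Hypothetical helper: exact-match CORS check against allowed_origins (no wildcard).
function isAllowedOrigin(site: TrackedSite, origin: string | null): boolean {
  return origin !== null && site.allowed_origins.includes(origin);
}
```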
Operator launch-readiness report calls:
- /report?site_key=star_map_generator
- /report?site_key=star_map_generator&exclude_test_mode=true&production_only=true
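The two report calls above differ only in their scope flags. The sketch below shows how the query string could resolve into a scope. exclude_test_mode defaults to true, and production_only defaults to the site's production_only_default. An unknown site_key yields the documented 400 body. The sketch assumes that only the literal strings "true" and "false" override a default. It also assumes that a request without site_key is simply unscoped and omits site_events. The function names and the Map-based registry are hypothetical.

```typescript
interface ReportScope {
  site_key: string;
  exclude_test_mode: boolean;
  production_only: boolean;
}

type ScopeResult =
  | { kind: "unscoped" } // no site_key: site_events omitted
  | { kind: "scoped"; scope: ReportScope }
  | { kind: "invalid"; status: 400; body: { ok: false; error: "invalid_site_key" } };

// Assumption: only the literal strings "true"/"false" override the default.
function parseFlag(value: string | null, fallback: boolean): boolean {
  if (value === "true") return true;
  if (value === "false") return false;
  return fallback;
}

// productionOnlyDefaults: hypothetical site_key -> production_only_default view of TRACKED_SITES.
function parseReportScope(search: string, productionOnlyDefaults: ReadonlyMap<string, boolean>): ScopeResult {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  const siteKey = params.get("site_key");
  if (siteKey === null) return { kind: "unscoped" };
  const productionDefault = productionOnlyDefaults.get(siteKey);
  if (productionDefault === undefined) {
    return { kind: "invalid", status: 400, body: { ok: false, error: "invalid_site_key" } };
  }
  return {
    kind: "scoped",
    scope: {
      site_key: siteKey,
      exclude_test_mode: parseFlag(params.get("exclude_test_mode"), true),
      production_only: parseFlag(params.get("production_only"), productionDefault),
    },
  };
}
```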
Expected Star Map event names:
- Lighthouse accepts any non-empty event_name and does not enforce a fixed taxonomy.
- Before go-live, the Star Map owner must provide and freeze the launch event-name list so /report top event-name and source/campaign readouts can be interpreted consistently.
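The acceptance rule and the frozen launch list serve different purposes. Lighthouse enforces only a non-empty event_name at ingestion, while the frozen list exists for operators reading /report. The sketch below makes that split explicit. The event names in the list are placeholders, because the real list must come from the Star Map owner. It also assumes that names are compared as-is, with no trimming.

```typescript
// PLACEHOLDER launch list: the real names must be provided and frozen by the Star Map owner.
const STAR_MAP_LAUNCH_EVENTS: ReadonlySet<string> = new Set(["example_event_a", "example_event_b"]);

type EventNameStatus = "rejected" | "launch_listed" | "unlisted";

function eventNameStatus(eventName: unknown): EventNameStatus {
  // Ingestion rule: any non-empty string is accepted (assumption: no trimming).
  if (typeof eventName !== "string" || eventName.length === 0) return "rejected";
  // Interpretation only: unlisted names are still accepted and counted.
  return STAR_MAP_LAUNCH_EVENTS.has(eventName) ? "launch_listed" : "unlisted";
}
```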
CREATE TABLE IF NOT EXISTS metrics_daily (
day TEXT PRIMARY KEY,
update_checks INTEGER NOT NULL DEFAULT 0,
downloads INTEGER NOT NULL DEFAULT 0,
errors INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS buscore_traffic_daily (
day TEXT PRIMARY KEY,
visits INTEGER NULL,
requests INTEGER NOT NULL,
captured_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS pageview_events_raw (
id TEXT PRIMARY KEY,
received_at TEXT NOT NULL,
received_day TEXT NOT NULL,
client_ts TEXT NULL,
path TEXT NULL,
url TEXT NULL,
referrer TEXT NULL,
referrer_domain TEXT NULL,
src TEXT NULL,
utm_source TEXT NULL,
utm_medium TEXT NULL,
utm_campaign TEXT NULL,
utm_content TEXT NULL,
device TEXT NULL,
viewport TEXT NULL,
lang TEXT NULL,
tz TEXT NULL,
anon_user_id TEXT NULL,
session_id TEXT NULL,
is_new_user INTEGER NOT NULL DEFAULT 0,
country TEXT NULL,
js_fired INTEGER NOT NULL DEFAULT 1,
ip_hash TEXT NULL,
user_agent_hash TEXT NULL,
accepted INTEGER NOT NULL DEFAULT 1,
drop_reason TEXT NULL,
request_id TEXT NULL,
ingest_version TEXT NULL
);
CREATE TABLE IF NOT EXISTS pageview_daily (
day TEXT PRIMARY KEY,
pageviews INTEGER NOT NULL DEFAULT 0,
accepted INTEGER NOT NULL DEFAULT 0,
dropped_rate_limited INTEGER NOT NULL DEFAULT 0,
dropped_invalid INTEGER NOT NULL DEFAULT 0,
last_received_at TEXT NULL
);
CREATE TABLE IF NOT EXISTS pageview_daily_dim (
day TEXT NOT NULL,
dim_type TEXT NOT NULL,
dim_value TEXT NOT NULL,
count INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY(day, dim_type, dim_value)
);
CREATE TABLE IF NOT EXISTS pageview_rate_limit (
minute_bucket TEXT NOT NULL,
ip_hash TEXT NOT NULL,
count INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY(minute_bucket, ip_hash)
);
Pageview ingestion notes:
- pageview_events_raw is retained for about 30 UTC days for inspectability and validation.
- IP and user-agent values are stored as SHA-256 hashes when present; Lighthouse does not store raw IPs.
- Anonymous continuity fields (anon_user_id, session_id, is_new_user) are accepted from first-party payloads only and used for aggregate retention reporting.
- pageview_daily_dim only tracks accepted dimensions for path, referrer_domain, src, and utm_source.
- pageview_rate_limit enforces approximate per-IP minute buckets, and stale buckets are pruned during the existing daily scheduled run.
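The rate-limit and dimension tables suggest two small pieces of per-request logic: a minute-bucket key and the set of dimension rows derived from an accepted pageview. The sketch below covers both, with these assumptions:
- Buckets use the "YYYY-MM-DDTHH:MM" UTC format.
- The per-minute limit is an invented placeholder.
- referrer_domain is the referrer URL's hostname as-is.
- Empty src and utm.source values are skipped.

The upsert uses standard SQLite ON CONFLICT syntax against the pageview_daily_dim schema above. The helper names are hypothetical.

```typescript
// UTC minute bucket key, e.g. "2026-03-22T10:15" (assumed format).
function minuteBucket(d: Date): string {
  return d.toISOString().slice(0, 16);
}

// HYPOTHETICAL threshold: the real per-IP limit is not documented in this README.
const MAX_PER_MINUTE = 30;

// currentCount = events already recorded for this ip_hash in the current bucket.
function isRateLimited(currentCount: number, limit: number = MAX_PER_MINUTE): boolean {
  return currentCount >= limit;
}

// Assumption: referrer_domain is the referrer's hostname; empty or unparseable -> none.
function referrerDomain(referrer: string): string | null {
  if (referrer === "") return null;
  try {
    return new URL(referrer).hostname || null;
  } catch {
    return null;
  }
}

interface DimRow {
  dim_type: "path" | "referrer_domain" | "src" | "utm_source";
  dim_value: string;
}

// Dimensions tracked for one accepted pageview.
function dimRowsFor(p: { path: string; referrer: string; src?: string; utm?: { source?: string } }): DimRow[] {
  const rows: DimRow[] = [{ dim_type: "path", dim_value: p.path }];
  const ref = referrerDomain(p.referrer);
  if (ref) rows.push({ dim_type: "referrer_domain", dim_value: ref });
  if (p.src) rows.push({ dim_type: "src", dim_value: p.src });
  if (p.utm?.source) rows.push({ dim_type: "utm_source", dim_value: p.utm.source });
  return rows;
}

// One upsert per DimRow; ?1..?3 are D1 ordered parameters (day, dim_type, dim_value).
const UPSERT_DIM_SQL = `INSERT INTO pageview_daily_dim (day, dim_type, dim_value, count)
VALUES (?1, ?2, ?3, 1)
ON CONFLICT(day, dim_type, dim_value) DO UPDATE SET count = count + 1`;
```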
Required bindings/secrets:
- DB
- MANIFEST_R2
- ADMIN_TOKEN
- IGNORED_IP (optional)
- CF_API_TOKEN (required for scheduled Buscore traffic capture)
- CF_ZONE_TAG (required for scheduled Buscore traffic capture)
No new bindings or secrets are introduced by pageview ingestion.
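A quick way to catch missing configuration is to check the Worker environment against the list above. The sketch below is a hypothetical helper and is not part of Lighthouse. It types env loosely, because real D1 and R2 bindings need Cloudflare's type definitions. It also assumes that a missing CF_API_TOKEN or CF_ZONE_TAG should only disable traffic capture, not core routes.

```typescript
// Loose view of the Worker env; real DB/MANIFEST_R2 bindings are typed Cloudflare objects.
type WorkerEnv = Record<string, unknown>;

const REQUIRED_CORE = ["DB", "MANIFEST_R2", "ADMIN_TOKEN"] as const;
const REQUIRED_FOR_TRAFFIC_CAPTURE = ["CF_API_TOKEN", "CF_ZONE_TAG"] as const;
// IGNORED_IP is optional and deliberately not checked.

function missingConfig(env: WorkerEnv): { core: string[]; trafficCapture: string[] } {
  const absent = (key: string): boolean => env[key] === undefined || env[key] === null || env[key] === "";
  return {
    core: REQUIRED_CORE.filter(absent),
    trafficCapture: REQUIRED_FOR_TRAFFIC_CAPTURE.filter(absent),
  };
}
```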
Lighthouse reporting is on-demand only; scheduled work is limited to data capture and pruning:
- Daily cron trigger captures one previous completed UTC day Buscore traffic snapshot from the Cloudflare GraphQL Analytics API.
- The same scheduled execution also prunes raw pageview events older than about 30 UTC days and stale rate-limit buckets older than about 2 days.
- No outbound Discord posting.
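The scheduled run relies on three date calculations: the previous completed UTC day, the raw-event cutoff, and the rate-limit bucket cutoff. The sketch below shows them as pure functions. It assumes exact 30-day and 2-day windows where the text above says "about". It also assumes that minute buckets are stored as "YYYY-MM-DDTHH:MM", so that string order matches time order. The function names are hypothetical. The DELETE statements use the table and column names from the schema above.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// README says "about 30 UTC days" and "about 2 days"; these exact values are assumptions.
const RAW_RETENTION_DAYS = 30;
const RATE_LIMIT_RETENTION_DAYS = 2;

// "YYYY-MM-DD" for the UTC day containing d.
function utcDay(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Previous completed UTC day; UTC has no DST, so subtracting 24h always lands on it.
function previousUtcDay(now: Date): string {
  return utcDay(new Date(now.getTime() - DAY_MS));
}

// Rows strictly older than these values are deleted.
function pruneCutoffs(now: Date): { rawDayBefore: string; bucketBefore: string } {
  return {
    rawDayBefore: utcDay(new Date(now.getTime() - RAW_RETENTION_DAYS * DAY_MS)),
    bucketBefore: new Date(now.getTime() - RATE_LIMIT_RETENTION_DAYS * DAY_MS).toISOString().slice(0, 16),
  };
}

// ?1 is bound to the matching cutoff value above.
const PRUNE_RAW_SQL = "DELETE FROM pageview_events_raw WHERE received_day < ?1";
const PRUNE_RATE_LIMIT_SQL = "DELETE FROM pageview_rate_limit WHERE minute_bucket < ?1";
```

For a run at 2026-03-23T00:05Z, this captures 2026-03-22, deletes raw events with received_day before 2026-02-21, and deletes buckets before 2026-03-21T00:05.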
Traffic capture notes:
- The cron always queries the previous completed UTC day. It never queries the current UTC day and never stores rolling-window snapshots.
- Each scheduled run executes one GraphQL query only.
- Successful captures upsert one final row per UTC day, so reruns converge to one row for that day.
- If the Cloudflare pull fails or returns GraphQL errors, Lighthouse skips the row for that day rather than writing synthetic zeroes.
- If the query returns no daily row for the selected day/hostname, Lighthouse treats the run as failed and skips the row.
- Lighthouse validates that the response includes a numeric daily request count field. If that field is missing, undefined, or non-numeric, the run is treated as failed and the row is skipped.
- Authenticated /report also performs one best-effort refresh capture for the previous completed UTC day before report assembly, using the same per-day capture logic.
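The capture rules above map naturally onto a function that returns either one row or nothing. It returns nothing when any of these holds:
- the response carries GraphQL errors
- no daily group is present
- count is missing or non-numeric

The sketch below assumes a single-zone response shaped like data.viewer.zones[0].httpRequestsAdaptiveGroups[0], with count and sum.visits on that group. Verify this shape against the actual query in src/index.ts. The upsert makes reruns converge to one row per day using SQLite's excluded values. The function name is hypothetical.

```typescript
interface TrafficRow {
  day: string;
  visits: number | null;
  requests: number;
  captured_at: string;
}

// Returns null when the run must be treated as failed (no row written, no synthetic zeroes).
// ASSUMED shape: { data: { viewer: { zones: [{ httpRequestsAdaptiveGroups: [{ count, sum: { visits } }] }] } }, errors }
function toTrafficRow(json: unknown, day: string, capturedAt: string): TrafficRow | null {
  const body = json as any; // untrusted JSON, navigated defensively below
  if (Array.isArray(body?.errors) && body.errors.length > 0) return null; // GraphQL errors
  const groups = body?.data?.viewer?.zones?.[0]?.httpRequestsAdaptiveGroups;
  if (!Array.isArray(groups) || groups.length === 0) return null; // no daily row for day/hostname
  const count = groups[0]?.count;
  if (typeof count !== "number" || !Number.isFinite(count)) return null; // missing/non-numeric count
  const visits = groups[0]?.sum?.visits;
  return {
    day,
    requests: count,
    visits: typeof visits === "number" ? visits : null, // nullable when absent
    captured_at: capturedAt,
  };
}

// One final row per UTC day: reruns overwrite the same day.
const UPSERT_TRAFFIC_SQL = `INSERT INTO buscore_traffic_daily (day, visits, requests, captured_at)
VALUES (?1, ?2, ?3, ?4)
ON CONFLICT(day) DO UPDATE SET visits = excluded.visits, requests = excluded.requests, captured_at = excluded.captured_at`;
```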
- Node.js >= 18
- Wrangler CLI (installed as a dev dependency)
- A Cloudflare account
npm install
npx wrangler d1 create buscore-lighthouse
Copy the database_id from the output and replace YOUR_D1_DATABASE_ID in wrangler.toml.
# local (for wrangler dev)
npx wrangler d1 migrations apply buscore-lighthouse --local
# remote (production)
npx wrangler d1 migrations apply buscore-lighthouse --remote
npx wrangler secret put ADMIN_TOKEN
npx wrangler secret put CF_API_TOKEN
Add CF_ZONE_TAG to your Worker environment configuration before deploying scheduled traffic capture.
Ensure existing bindings are configured for your environment (DB and MANIFEST_R2).
Also configure CF_ZONE_TAG and ensure the scheduled traffic pull is authorized with CF_API_TOKEN.
npm run deploy
npm run dev
npm run typecheck