Persist and expose multi-battle station state #353

Open

Mygod wants to merge 15 commits into UnownHash:main from Mygod:multibattle

Conversation

Mygod (Contributor) commented Mar 31, 2026

Why

Max Battles can temporarily override a station's normal local battle. Golbat previously stored only one flat battle_* payload on station, so a temporary event battle would overwrite the local one and the original state would be lost for planning and downstream filtering.

This PR adds first-class multi-battle support for stations while keeping the existing flat station/webhook fields for compatibility.

Ground rule

Latest observed game data is treated as ground truth.

For a given station:

  • newly observed battles are persisted independently
  • if a new battle's battle_end is later than or equal to that of another known battle on the same station, the older/equal-ending row is treated as obsolete and removed
  • a shorter battle does not remove a longer known battle

This keeps temporary overrides and later event rotations aligned with the most recent game state instead of waiting for old rows to expire naturally.
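The merge rule above can be sketched as follows. This is a minimal illustration only: the struct, field, and function names are made up for the example, not Golbat's actual identifiers.

```go
package main

import "fmt"

// Battle is a stand-in for a persisted station battle row (illustrative only).
type Battle struct {
	Seed      int64
	BattleEnd int64 // unix seconds
}

// mergeBattle applies the ground rule: the newly observed battle is always
// kept, and any known battle on the same station whose battle_end is earlier
// than or equal to the new battle's end is treated as obsolete and dropped.
// A known battle that ends later than the new one is never removed.
func mergeBattle(known []Battle, observed Battle) []Battle {
	out := known[:0:0]
	for _, b := range known {
		if b.Seed == observed.Seed {
			continue // replaced by the fresh observation
		}
		if b.BattleEnd <= observed.BattleEnd {
			continue // obsolete: ends no later than the new battle
		}
		out = append(out, b)
	}
	return append(out, observed)
}

func main() {
	known := []Battle{{Seed: 1, BattleEnd: 100}, {Seed: 2, BattleEnd: 300}}
	merged := mergeBattle(known, Battle{Seed: 3, BattleEnd: 200})
	// Seed 1 is dropped (ends earlier), seed 2 survives (ends later),
	// and the observed seed 3 is added alongside it.
	fmt.Println(len(merged)) // → 2
}
```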

What this PR adds

  • a new station_battle table via migration 54_station_battle.up.sql
  • station battle persistence separate from the parent station row
  • a compatibility projection that still fills station.battle_*
  • a battles array in station API and max_battle webhook payloads
  • multi-battle-aware station filtering instead of single-battle-only filtering
  • hydration of station_battle rows in both preload and lazy station reads
  • expired station_battle cleanup support
  • canonical-battle-based metric dedup

Canonical battle behavior

A station can now know about multiple non-expired battles, but one battle is still chosen as the canonical representative for:

  • flat station.battle_*
  • top-level max_battle webhook fields
  • battle metric dedup

Selection rule:

  • prefer an active battle over an upcoming one
  • if none are active, use the best upcoming battle
  • later battle_end wins among surviving rows

The full battles array remains available for planning use cases.
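The selection rule can be sketched like this, assuming the input slice contains only non-expired battles. The struct and function names are illustrative, not the decoder's real code.

```go
package main

import "fmt"

// battle is illustrative; times are unix seconds. Callers are assumed to
// pass only non-expired battles (End > now).
type battle struct {
	Seed       int64
	Start, End int64
}

// canonical picks the representative battle: an active battle beats an
// upcoming one, and within the same tier the latest End wins.
func canonical(battles []battle, now int64) (battle, bool) {
	var best battle
	found, bestActive := false, false
	for _, b := range battles {
		active := b.Start <= now && now < b.End
		switch {
		case !found, // first candidate always taken
			active && !bestActive, // active beats upcoming
			active == bestActive && b.End > best.End: // later end wins
			best, bestActive, found = b, active, true
		}
	}
	return best, found
}

func main() {
	now := int64(1000)
	battles := []battle{
		{Seed: 1, Start: 1100, End: 1200}, // upcoming, later end
		{Seed: 2, Start: 900, End: 1050},  // active now
	}
	b, _ := canonical(battles, now)
	fmt.Println(b.Seed) // active battle wins despite the earlier end → 2
}
```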

Identity / persistence

bread_battle_seed is used as the persisted battle identity.

Live validation before implementation showed:

  • no observed seed reuse across multiple stations
  • seeds changed when battles rotated

That made seed the best available battle-instance identifier for persistence.

API / webhook impact

Backward compatibility is preserved:

  • existing top-level battle_* station fields remain
  • existing top-level max_battle webhook fields remain

New additive fields:

  • station API: battles
  • max_battle webhook: battles

This allows downstream consumers to keep working unchanged while giving planning-oriented consumers the full set.

Deployment notes

  • migration 54_station_battle.up.sql is included and runs automatically on startup
  • no manual backfill from legacy station.battle_* into station_battle is included
  • population of station_battle depends on normal rescans after deploy
  • optional cleanup support is available via cleanup.station_battles

Practical rollout expectation:

  • after deploy, rescans repopulate the new table quickly
  • during the current non-overlap period, most stations will still have exactly one battle row

Validation

Tested with:

  • go test ./decoder -run 'StationBattle|StationWebhooks|BuildStationResult|FortLookup|SyncStation|Preload'
  • go test ./... -run '^$'

Post-deploy live DB checks on one environment showed:

  • migration applied cleanly at version 54
  • no active legacy-only stations
  • no active projection mismatches between station and station_battle
  • no active station_battle rows missing boss data
  • currently one active battle row per station, which matches the present daily-rotation state

Notes

This PR is intended to preserve battles for planning, especially around Monday / Max Battle event overlaps, while keeping existing API and webhook consumers stable.

Fixes #352.

Mygod marked this pull request as draft on April 1, 2026 at 00:00
jfberry (Collaborator) commented Apr 1, 2026

I'll leave others to comment on whether this is desirable functionality; I don't really care much one way or the other. I do have a few comments on the way it's implemented, as it doesn't follow commonly held patterns within the codebase, or even the guidelines in CLAUDE.md.

An automated review highlights some quite significant variances and performance observations. I haven't bothered with a human review until others chime in about whether the capability is one we want to have or not.

Database Write Impact — Significant Change

Before: Station updates go through the write-behind queue (batched, coalesced, deferred).

After: Every station GMO update now also does a synchronous transactional write outside the write-behind queue:

BEGIN
INSERT INTO station_battle ... ON DUPLICATE KEY UPDATE (upsert)
DELETE FROM station_battle WHERE station_id=? AND seed<>? AND battle_end<=? (prune)
COMMIT

That's 4 DB round-trips per station per GMO, executed synchronously while holding the station lock. This blocks other goroutines waiting
on the same station for the duration of the DB transaction. During daily rotation windows when many stations update simultaneously, this
could become a bottleneck.

The write goes to GeneralDb (shared connection pool), so it competes with all other non-pokemon writes.

Performance Concerns

  1. Synchronous DB in hot path: syncStationBattlesFromProto is called from UpdateStationBatch (the GMO decode loop) while the station lock
    is held. Every other entity in this codebase defers DB writes through the write-behind queue — station battles are the exception.
  2. Repeated getKnownStationBattles calls: A single station save triggers this function ~4-5 times (snapshot, sync, rtree update, webhook,
    API build). Each call does a cache load + clone + sort. Not expensive per-call but multiplied across all stations per GMO.
  3. Side-effecting reads: getKnownStationBattles prunes expired battles from the cache as a side effect of reading. Other code paths in
    this codebase don't mutate cache state on read.
  4. String-based change detection: stationBattleSignatureFromSlice builds a signature string via fmt.Sprintf per battle. This is computed
    multiple times per update cycle. Minor, but a hash or struct comparison would be cheaper.
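As a sketch of point 4, one cheaper alternative to a fmt.Sprintf-built signature string is an FNV-1a hash over the relevant fields in a fixed order. The struct and field names below are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// battleRow is illustrative; the real struct lives in the decoder package.
type battleRow struct {
	Seed, End int64
	BossID    int32
}

// battleListHash folds the change-relevant fields of a (pre-sorted) battle
// list into a single uint64. Comparing two uint64s replaces comparing two
// fmt.Sprintf-built signature strings, with no per-call allocations beyond
// the hasher itself.
func battleListHash(battles []battleRow) uint64 {
	h := fnv.New64a()
	var buf [8]byte
	for _, b := range battles {
		binary.LittleEndian.PutUint64(buf[:], uint64(b.Seed))
		h.Write(buf[:])
		binary.LittleEndian.PutUint64(buf[:], uint64(b.End))
		h.Write(buf[:])
		binary.LittleEndian.PutUint32(buf[:4], uint32(b.BossID))
		h.Write(buf[:4])
	}
	return h.Sum64()
}

func main() {
	a := []battleRow{{Seed: 7, End: 100, BossID: 3}}
	b := []battleRow{{Seed: 7, End: 100, BossID: 3}}
	fmt.Println(battleListHash(a) == battleListHash(b)) // deterministic → true
}
```

The fixed field order matters: hashing the same slice twice must yield the same value, so any sorting has to happen before the hash is computed.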

Pattern Adherence

Follows patterns well:

  • xsync.MapOf for concurrent cache (matches fortLookupCache)
  • Phased preload with correct ordering (battles after stations, parallel with incidents)
  • Cleanup uses the same ticker pattern as incidents/pokemon/tappables
  • Backward-compatible API additions (flat fields remain, new battles array is additive)
  • Deadlock retry with exponential backoff (matches write-behind queue pattern)
  • Test mocking via function variable (upsertStationBattleRecordFunc)

Deviates from patterns:

  1. No write-behind queue — every other persisted entity uses TypedQueue[K, T] for batched, coalesced, rate-limited DB writes. Station
    battles do direct synchronous transactional writes. This is the most significant deviation.
  2. No TTL-based cache — uses plain xsync.MapOf instead of ttlcache.Cache or ShardedCache. Expiry is handled by read-time pruning + hourly
    cleanup. If a station is never re-read, its expired battles stay in memory until the cleanup routine runs.
  3. Ad-hoc control flow flags — forceSave and skipWebhook on Station are new patterns. Other entities use dirty tracking and old/new value
    comparison exclusively. skipWebhook in particular is a hidden side-channel that short-circuits the webhook path.
  4. Transaction-based writes — storeStationBattleRecord is the only entity using explicit BEGIN/COMMIT. Other entities use single INSERT
    ... ON DUPLICATE KEY UPDATE statements.
  5. hydrateStationBattlesForStation is called without station lock in GetStationRecordReadOnly (before GetOrSetFunc). If another goroutine
    is processing a GMO for the same station concurrently, both could race on stationBattleCache.Store. xsync.MapOf won't crash, but one
    write could overwrite the other.

jfberry (Collaborator) commented Apr 1, 2026

Looking at the CLAUDE.md instructions, let me check which PR patterns violate the documented rules.

Code Review — Pattern Compliance with Project Architecture

The project's CLAUDE.md documents specific architectural patterns that all entities are expected to follow. This PR
introduces StationBattleData as a new persisted entity but deviates from several of these documented patterns.

1. Write-behind queue for persistence

Each entity type has a TypedQueue[K, T] that batches and coalesces writes
— CLAUDE.md, "Write-Behind Queues" section

Every other persisted entity (pokemon, gym, pokestop, station, spawnpoint) uses the write-behind queue for DB persistence.
station_battle bypasses this entirely with direct synchronous transactional writes (BEGIN / INSERT ... ON DUPLICATE KEY UPDATE /
DELETE / COMMIT) executed in the GMO decode hot path while holding the station lock.

This has practical consequences: 4 DB round-trips per station per GMO, no coalescing of repeated updates, and the station lock is held
for the full duration of the DB transaction.

2. Entity struct pattern

type Pokestop struct {
    mu TrackedMutex[string] `db:"-"`  // Entity-level mutex
    PokestopData                       // Embedded — copied for queue snapshots
    dirty     bool     `db:"-"`        // Needs DB write
    newRecord bool     `db:"-"`        // INSERT vs UPDATE
    oldValues PokestopOldValues `db:"-"` // Snapshot for webhook comparison
}

— CLAUDE.md, "Struct Pattern" section

StationBattleData is a plain struct with none of these: no entity-level mutex, no embedded data struct for queue snapshots, no dirty
flag, no newRecord flag, no oldValues snapshot. This is the foundational pattern for all entities in the codebase.

3. Record access pattern

Four access patterns, from lightest to heaviest:

  1. Peek*Record — Cache-only lookup
  2. get*RecordReadOnly — Cache lookup with DB fallback
  3. get*RecordForUpdate — ReadOnly + snapshot
  4. getOrCreate*Record — Atomic create-if-absent

The caller MUST call the returned unlock function.
— CLAUDE.md, "Record Access Patterns" section

Station battles have no record access functions. The stationBattleCache is read and written directly (via
xsync.MapOf.Load/Store/Delete) without acquiring an entity lock. Notably, hydrateStationBattlesForStation in
GetStationRecordReadOnly mutates stationBattleCache before the station lock is acquired, which could race with
syncStationBattlesFromProto on another goroutine.

4. Setter methods with dirty tracking

Setter methods (SetName, SetLat, etc.) track dirty state and optionally log field changes when dbDebugEnabled is true.
— CLAUDE.md, "Struct Pattern" section

StationBattleData fields are assigned directly with no setters and no dirty tracking. To work around this, the PR adds a forceSave
flag on Station that bypasses the dirty check in saveStationRecord. A skipWebhook flag is also added as a side-channel to suppress
webhooks on DB failure. Neither pattern exists on any other entity.

Mygod marked this pull request as ready for review on April 6, 2026 at 05:42
Mygod (Contributor, Author) commented Apr 6, 2026

ReactMap PR is done. From my testing, everything seems ok. Feel free to complain now. :)

I'm also planning to do another review pass in two days.

jfberry (Collaborator) commented Apr 8, 2026

I haven't done a manual review of it yet, but here is the Claude first-pass review, which picks up some issues (mainly around the database write not being batched like all the other writers; without looking, it's likely to be a misunderstanding of the way sqlx can construct a statement from an array).

PR #353 Review — Persist multi-battle station state

The PR has clearly been revised since my earlier comments — forceSave/skipWebhook, the synchronous-in-hot-path writes, and the unprotected xsync.Store race in GetStationRecordReadOnly are all gone. It's much closer to project conventions now. A stationBattleQueue TypedQueue[string, StationBattleWrite] exists; hydration runs under the station lock; the rtree update is fed from the same snapshot used by save/webhook.

Record access semantics

  • Hydration in both branches of GetStationRecordReadOnly now happens after station.Lock(caller), so the previous race on
    stationBattleCache.Store is gone.
  • getOrCreateStationRecord only hydrates when loading a fresh record from DB. A station that was created via proto-only path (newRecord=false
    after the initial save) never has its battles hydrated until somebody calls GetStationRecordReadOnly, but hasHydratedStationBattles is set the
    first time a proto carries battles via syncStationBattlesFromProto, so this is fine in practice.
  • Cache-hit hydration error in GetStationRecordReadOnly is only Debug-logged and still returns the station (station_state.go ~L83). The DB-load
    branch returns the error. Inconsistent — and a transient SELECT failure leaves the station permanently un-hydrated for that session because
    nothing flips hasHydratedStationBattles. Recommend marking hydrated even on empty/failure or returning the error. Minor.

Deadlocks / lock ordering

  • No new cross-entity lock acquisitions. All battle state is keyed by stationId and accessed under the station mutex (or via lock-free
    xsync.MapOf for the read paths). Consistent with CLAUDE.md "Lock Ordering" rules.
  • hydrateStationBattlesForStation does a synchronous DB SELECT while holding the station lock. That's the same pattern loadStationFromDatabase
    uses, so it's not a regression, but it does mean a slow station_battle query stalls every other goroutine waiting on that station. Fine in
    steady state because hydration only runs once per station per process lifetime.

Write-behind correctness

  • stationBattleQueue is a real TypedQueue keyed by StationId, so repeated updates within a flush window coalesce — good.
  • However, flushStationBattleBatch just loops the snapshots and calls storeStationBattleSnapshot one-at-a-time. Each call does BEGIN →
    bulk-INSERT … ON DUPLICATE KEY UPDATE → DELETE … WHERE seed NOT IN (…) → COMMIT. So a flush of 50 stations = 50 transactions × 4 round-trips =
    ~200 round-trips against GeneralDb. The BatchSize knob doesn't actually batch across stations the way other entities' flushes do. Not broken,
    but the queue is providing coalescing only, not true batching.
  • The DELETE … NOT IN (…) strategy is correct for pruning stale seeds without losing newly-inserted rows in the same tx, and the per-snapshot
    tx scope means a flush failure for one station doesn't poison others. Good.
  • Coalescing risk: when a snapshot is enqueued, the Battles slice is cloned (stationBattleWriteFromSlice → cloneStationBattles), so the queue
    holds a stable snapshot even if the in-memory cache mutates afterwards. Good.

Database write volume

  • For active stations the write trigger is battleListChanged (signature mismatch). The signature is computed from the non-expired battle list,
    so the signature changes whenever:
    a. a new battle arrives (expected),
    b. a known battle expires across now boundary (NOT a real change in DB-relevant data, but still triggers an enqueue).
  • Effect: every station with an active battle will write at least once when that battle ends, even if no proto for it arrives. Bounded — one
    extra write per battle expiry. Acceptable.
  • The 15-min debounce is still bypassed by battleListChanged, so high-churn stations can write more often than before. Should be fine given the
    queue coalesces, but worth knowing.

Memory

  • Per-station: a []StationBattleData (typically 1–3 entries, ~200 B each) in stationBattleCache + an empty struct entry in
    stationBattleHydratedCache. For ~100k stations, well under 100 MB worst-case. No leak — clearStationBattleCaches is wired into the station
    cache eviction callback. Good.
  • The eviction callback is now installed unconditionally (not gated on FortInMemory), which is necessary for battle-cache cleanup and
    incidentally fixes a small consistency issue.

CLAUDE.md / pattern adherence

  • ✅ Write-behind queue used.
  • ✅ Cache eviction callback drops auxiliary state.
  • ✅ Hydration done under entity lock.
  • ✅ Webhook payload built from a single snapshot, not by re-reading mutable state.
  • ⚠️ StationBattleData still doesn't follow the standard "embedded Data + dirty + oldValues" entity pattern — but it's a child collection of an
    existing entity, not a standalone entity, so the deviation is justified. The BattleListSignature in StationOldValues is the dirty-tracking
    equivalent.
  • ⚠️ stationBattleSignatureFromSlice is awkward — fields are written in a strange order with Valid flags interleaved between value writes
    (station_battle.go ~L865). It works, but a bytes.Buffer/fnv hash or struct equality would be cheaper and less error-prone. Nit.
  • ⚠️ Webhook regression: createStationWebhooksWithSnapshot returns early if currentSignature == "". That suppresses a webhook in the "last
    battle just ended" transition. Previously a station whose BattleEnd rolled past now would still get a MaxBattle webhook on the EDIT. May or may
    not be intentional — worth confirming with the author.

Bugs / smells

  1. Cache-hit hydration error swallowed (above) — minor.
  2. fortRtreeUpdateStationOnGet no longer called inside GetOrSetFunc — moved outside. If two goroutines race, only the winner's loadedFromDb is
    true; the loser sees loadedFromDb=false and won't update the rtree. The winner does update it. Correct, though subtle.
  3. StartStationBattleExpiry cleanup interval is 1h13m (stats.go). The other expiry tickers use similarly arbitrary primes — consistent with
    existing convention.
  4. Signature-driven save bypasses 15-min debounce — see above. Worth a one-line comment in the code.
  5. stationBattleSignatureFromSlice ordering — see above. Nit, not a bug.

Recommendation

  1. Confirm that the "battle ended → no battles left" webhook suppression is intentional (or restore it).
  2. Either log the cache-hit hydration error at Warn/Error or treat it like the DB-load path.

Mygod (Contributor, Author) commented Apr 8, 2026

I identified some overengineering in this branch. Once I fix that I will take a closer look at these.

Mygod (Contributor, Author) commented Apr 8, 2026

Codex:

Addressed the batching concern on the current head.

station_battle now flushes like the other typed write-behind queues instead of only coalescing by station_id: flushStationBattleBatch does one bulk upsert across the flushed rows and one bulk prune across the flushed station ids in a single transaction. I intentionally kept the failure semantics aligned with the existing TypedQueue behavior used elsewhere in Golbat, so batch-level failure isolation is the same as the other batched writers rather than preserving a station_battle-specific special case.

The earlier notes about the old hydrated-cache/canonical helper paths are stale after the recent cleanup and no longer apply to the current code.

I did not take the separate seed-collision/schema concern in this branch.

I believe the battle seed is unique (it appears to be a random int64), so a defensive primary key scoped by station id is most likely unneeded.

jfberry (Collaborator) commented Apr 8, 2026

Claude is generally a fan (with some minor comments). I'll probably have some time to look myself in a few days.

PR #353 — Re-review of the latest revision (f8e40cd)

This revision is a substantial cleanup. Most of my prior concerns are resolved.

What got better

  1. Real batched flush. flushStationBattleBatch is now a single transaction per flush: one bulk INSERT … ON DUPLICATE KEY UPDATE across every
    station in the batch, one DELETE … LEFT JOIN keep_rows to prune obsolete seeds, one commit. The previous "loop and call
    storeStationBattleSnapshot per station" pattern is gone, and with it the concern that the queue was providing only coalescing and no batching.
    DB round-trip volume now scales with BatchSize like the other entities.
  2. Single cache instead of two. stationBattleCache is now xsync.MapOf[string, stationBattleState] carrying {Battles, Loaded}. The parallel
    stationBattleHydratedCache is gone, eliminating the small window where the two maps could disagree. storeStationBattles is the one entry point.
  3. No more "canonical vs projection" duality. stationBattleFromStationProjection, canonicalStationBattleFromSlice,
    stationBattleMatchesProjection, canonicalBattleSeed, restoreStationBattleProjectionFromOldValues — all deleted. The "top" battle is now just
    snapshot.Battles[0] after sortStationBattlesByEnd. StationOldValues shrinks to {EndTime, HasTopBattle, TopBattleSeed, BattleListSignature}.
    Much easier to reason about.
  4. Single source of truth for battle fields on Station. updateFromStationProto no longer pokes battle fields directly — they flow exclusively
    through syncStationBattlesFromProto → applyStationBattleProjection. The previous risk of the proto path and the cache path disagreeing is gone.
  5. Tests consolidated into a table-driven TestUpsertCachedStationBattleOrdering covering the merge edge cases (drops earlier end, replaces
    equal end, drops conflicting future battle, etc.).

Remaining items

  1. Cache-hit hydration error is still Debug-logged, not returned (station_state.go ~L81). I noted this last round. The author appears to have
    intentionally kept it: TestGetStationRecordReadOnlyRetriesHydrationOnCachedStation exists specifically to assert that a transient hydration
    failure is retried on the next read. So the behavior is deliberate — a transient SELECT failure doesn't poison the station, it just retries.
    I'd bump it to Warn so failures are visible in logs, but this is a nit not a blocker.
  2. Webhook suppression on "all battles just expired." createStationWebhooksWithSnapshot still returns early if currentSignature == "". Net
    effect: when the last active battle expires, no MaxBattle webhook fires for that transition. Likely intentional (the next station-update GMO
    will produce one once is_battle_available toggles), but I'd ask the author to confirm.
  3. Signature-driven save still bypasses the 15-min debounce. Same as last round, but now harmless because the queue coalesces writes, so a
    station whose battles tick across an expiry boundary at most produces one extra coalesced enqueue, not extra DB traffic.
  4. mergeStationBattles is subtle. The "observed is current but not yet active, keep it iff no active battle covers it" branch reads correctly
    against the test cases but is the trickiest function in the file. It's well-tested. Just flagging that future maintenance should be wary of
    edits here.
  5. getKnownStationBattles still does a write-on-read when all battles are expired (collapses to {Loaded:true, Battles:nil}). Bounded and tested
    (TestGetKnownStationBattlesDoesNotMutateCacheOnRead shows it leaves the cache alone when at least one battle is current). Fine.

Lock semantics, deadlocks, memory

  • All hydration is still under the station mutex, no cross-entity locks introduced.
  • Per-station memory cost unchanged (small slice of battles, typically 1–3 entries).
  • Cache eviction callback still wired correctly via clearStationBattleState.

Verdict

A small ask that the author confirm the "all battles expired" webhook suppression is intentional and consider bumping
the cache-hit hydration error to Warn. The architectural deviations from CLAUDE.md that I flagged previously are gone or are now defensible:
the entity uses the write-behind queue, batching is real, hydration is lock-protected, and the projection/canonical machinery has collapsed
into "first element of a sorted slice." The code is meaningfully simpler than the prior revision while doing more.

Mygod (Contributor, Author) commented Apr 8, 2026

I reworked the station_battle path again and the current head should address the earlier architectural concerns:

  • station_battle now flushes as a real write-behind batch again, not just coalescing by station_id
  • battle cache state is now a single {Battles, Loaded} map instead of separate cache + hydrated maps
  • the old canonical/projection split is gone; the top battle is just snapshot.Battles[0]
  • station battle projection is now owned only by syncStationBattlesFromProto

On the two remaining behavioral points:

  • The currentSignature == "" early return in createStationWebhooksWithSnapshot is intentional for now. That means we do not emit a max_battle webhook on the “last known battle expired” transition.
  • Cache-hit hydration failure in GetStationRecordReadOnly is also intentional for now: on a transient station_battle read failure we still serve the cached station and retry hydration on the next read rather than poisoning the station state. I can bump that log from Debug to Warn later if we want more visibility, but I left the behavior unchanged on purpose.
