feat: listening graph — ingestion pipeline, analysis, and interactive visualization #63

Merged
cdxker merged 77 commits into main from graphs
Mar 1, 2026
Conversation

cdxker (Owner) commented Feb 28, 2026

Summary

Full listening graph feature built across 7 phases:

Phase 1 — Data Ingestion & Storage

  • Last.fm scrobble fetcher with pagination and checkpointing
  • Spotify OAuth2 flow + recently played / playlist fetcher
  • Graph builder that normalizes and merges cross-source listening data into directed edges
  • SQLite persistence layer via better-sqlite3
  • Hono API server serving graph data with pipeline endpoints

Phase 2 — Analysis

  • Weighted PageRank algorithm on the listening graph
  • Per-node and graph-level statistics with rankings
  • Louvain community detection for clustering related tracks
  • Enriched export combining all analysis data

Phase 3 — Visualization

  • Sigma.js + graphology graph rendering (dark theme)
  • Interactive features: hover tooltips, click-to-focus, neighbor highlighting
  • Cluster color-coding with legend, toggle, and focus mode
  • Search by song/artist, filter by play count / PageRank / source / edge weight
  • Path explorer: shortest and strongest paths between two songs

Phase 4 — Polish

  • Codebase cleanup: consolidated 5 redundant files, DRYed up pipeline handlers
  • Always-focused node view: graph always has one node selected with contrast highlighting

Phase 5 — Visual Refresh

  • Monochrome brightness hierarchy: removed hue-based coloring, encode importance via brightness
  • Node artwork support: album art on graph nodes (frontend + ingestion data contract)
  • Search Enter key navigation fix
  • Three-layer neighborhood depth view with weight-based fading

Phase 6 — PR Review Triage

  • Triaged 13 BugBot comments into actionable tickets
  • Closed 3 as invalid/not-planned with justification
  • Applied 2 trivial "Do Now" fixes (typo rename, deleted stale file)

Phase 7 — Bug Fixes (from Triage)

  • Pipeline idempotency: clear graph tables before save to prevent data doubling on re-run
  • Spotify double fetch: pass pre-fetched dump to exportToJson to eliminate redundant API calls
  • Stack overflow fixes: replaced Math.min/max spreads with loops in build-graph, lastfm-fetcher, and toGraphology (handles >65k items)
  • PathPanel stale async: functional state updates in async callbacks to prevent stale closure overwrites
  • PathPanel loading guard: reset loading state on cleanup so cancelled fetches don't permanently block refetch
  • Source breakdown semantics: track per-source play counts (scrobbles, not nodes)
  • Modularity formula: divide by 2m instead of m; account for non-adjacent same-community pairs
  • Dead code removal: removed unused sigmaIn array from Louvain
  • Data path consistency: unified CWD-relative and import.meta.dirname paths
  • Source plays merge: merge source_plays additively on incremental save
  • Search early exit: replaced forEachNode with breakable for...of loop (stops after 20 matches)
  • Playlist name collision: key grouping by playlist ID instead of name
  • Self-loop double weight: skip reverse assignment when i === j in cluster adjacency
  • Unused parameter cleanup: removed dead keyIndex param from computeClusterStats
  • Depth-layer optimization: per-node edge iteration instead of full-graph scan
  • Bootstrap reducer race: preserve bootstrap reducers when selection is in-flight

Open issues (2 remaining)

  • 05-FixPathPanelCleanupRace — cleanup effect races with just-resolved promise
  • 05-FixSourcePlaysSqlMerge — source_plays SQL upsert overwrites instead of merging additively

Architecture

  • Backend (graph-pipeline/): Node + Hono + SQLite — ingestion, graph construction, analysis, API
  • Frontend (site/): Astro + React + Sigma.js — interactive graph visualization

Test plan

  • Backend tests pass: 132 tests across 13 files
  • Frontend tests pass: 48 tests across 4 files
  • Start backend (pnpm run serve), run POST /pipeline/fetch/lastfm then POST /pipeline/build
  • Verify graph loads at localhost:4321/graph with a node auto-focused
  • Click nodes to switch focus — should never lose focus
  • Test search, filters, cluster legend, and path explorer
  • Full re-ingest required after deploy (new playlistId field)

🤖 Generated with Claude Code

cdxker and others added 30 commits February 28, 2026 02:13
Independent TypeScript/Node project with ESLint, Prettier, Vitest.
Folder structure: src/ingestion, src/graph, src/analysis, src/server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…efineGraphSchema)

SongKey, GraphNode, ListeningGraph types in src/graph/types.ts.
Raw ingestion types (RawScrobble, RawSpotifyRecentTrack, RawSpotifyPlaylistTrack)
in src/ingestion/types.ts. Includes toSongKey() helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…1/02-DataStorage)

Evaluated Neo4j, SQLite, PostgreSQL, TinyBase. Recommending SQLite for
zero infrastructure, right-sized capacity, and deployment simplicity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…MAuth)

- Config loader reads LASTFM_API_KEY and LASTFM_USERNAME from .env
- Fails fast with clear error messages if either is missing
- LastfmClient class wraps authenticated API requests (key as query param)
- verifyAuth() method calls user.getInfo to validate credentials
- 8 tests covering config validation and client behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…yOAuth)

Add SpotifyAuth class with full OAuth2 Authorization Code flow for
CLI/server-side usage. Includes local HTTP callback server, token
persistence to disk, automatic refresh, and browser auto-open.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ESM fixes

- Delete package-lock.json and add to .gitignore (project uses Yarn)
- Add vitest.config.ts scoped to src/**/*.test.ts to avoid running dist/
- Add Spotify env vars to .env.example
- Fix DEFAULT_TOKEN_PATH to use import.meta.url instead of process.cwd()
- Replace require("node:child_process") with top-level static import
- Fix pre-existing lint issues in touched files (unused import, let→const)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pLastFM)

Paginated fetcher for user.getRecentTracks with:
- 200 tracks/page with 1 req/sec rate limiting
- Resume support via checkpoint file (saves last timestamp)
- Edge case handling: skips now-playing, deduplicates, skips missing fields
- Progress logging callback
- Output as sorted JSON matching RawScrobble type
- 9 tests covering pagination, edge cases, resume, and dedup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tify)

Add SpotifyClient class that fetches recently played tracks and all
playlist track orderings from Spotify API. Includes pagination,
429 rate limit handling with Retry-After, and edge case handling
for local files, podcast episodes, and empty playlists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ntAndFormat)

- Fix no-explicit-any in lastfm-fetcher.test.ts (use ReturnType cast)
- Fix unused _init param in spotify-client.test.ts
- Run prettier on unformatted files
- All lint, format, and tests now pass cleanly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…1/04-BuildGraph)

Core graph construction algorithm:
- Normalizes tracks to SongKey (case-insensitive artist::track)
- Last.fm: chronological scrobble pairs create weighted edges
- Spotify recent: same approach, sorted by playedAt
- Spotify playlists: consecutive tracks per playlist create edges
- Cross-source merge: same SongKey sums edge weights, unions sources
- Metadata: totalScrobbles, dateRange, exportTimestamp, usernames
- Edge cases: skips invalid tracks, handles single-track sessions
- 20 tests covering all acceptance criteria

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
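
As a rough illustration of the consecutive-pair edge construction described in this commit, here is a minimal sketch. It is not the project's code: the `Play` shape, the exact normalization in `toKey` (trim plus lower-case), and the `source->target` edge key format are assumptions made for the example. Only the `artist::track` key shape comes from the commit message.

```typescript
// Hypothetical input shape for illustration; the real types live in src/ingestion/types.ts.
interface Play {
  artist: string;
  track: string;
  playedAt: number; // unix seconds
}

// Assumed normalization: trimmed, lower-cased "artist::track".
function toKey(p: Play): string {
  return `${p.artist.trim().toLowerCase()}::${p.track.trim().toLowerCase()}`;
}

// Counts each transition between chronologically consecutive plays
// as one unit of weight on a directed edge.
function buildEdges(plays: Play[]): Map<string, number> {
  const sorted = [...plays].sort((a, b) => a.playedAt - b.playedAt);
  const edges = new Map<string, number>();
  let prev: Play | undefined;
  for (const cur of sorted) {
    if (prev) {
      const edgeKey = `${toKey(prev)}->${toKey(cur)}`;
      edges.set(edgeKey, (edges.get(edgeKey) ?? 0) + 1);
    }
    prev = cur;
  }
  return edges;
}
```

A single-track input produces no edges, which matches the "single-track sessions" edge case listed above. Merging across sources would then amount to summing the weights of maps like this one.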
…ookUpExportToDatabase)

GraphDatabase class using better-sqlite3 with:
- nodes, edges, metadata tables (auto-created)
- Upsert logic merging play counts, edge weights, and sources
- Incremental update support (second save merges with existing)
- WAL mode + transactions for large graph efficiency
- Round-trip tested (6 tests, all passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Standalone API server serving graph data from SQLite via Hono:
- GET /graph (full or paginated with limit/offset)
- GET /graph/node/:songKey (single node with edges)
- GET /graph/neighbors/:songKey (neighbor nodes with weights)
- GET /graph/stats (node count, edge count, metadata)
CORS enabled, input validation, 404/400 error responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Guide)

Rewrote README.md with:
- Project overview, prerequisites, setup instructions
- Step-by-step usage guide (auth, fetch, build, store, serve)
- API server endpoints table (graph, node, neighbors, stats)
- npm scripts reference
- Full project structure with file descriptions
- Architecture decisions summary (SQLite, SongKey, Hono)
- Troubleshooting section (rate limits, token expiry, resume)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…s (phase2/01-BasicStats)

computeStats() returns per-node stats (in/out degree, weighted degree),
graph-level summary (total nodes/edges, source breakdown, avg/median degree),
and configurable top-N rankings (most played, most connected, highest in/out degree).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Iterative PageRank on the directed listening graph:
- Configurable damping factor (0.85), convergence threshold (0.0001), max iterations (100)
- Weighted: outgoing edge weights normalized as transition probabilities
- Dangling nodes redistribute rank evenly to all nodes
- Handles disconnected components via teleportation
- Mutates graph in place, setting pageRank on each GraphNode
- getTopByPageRank() helper for sanity checking top-ranked songs
- Added optional pageRank field to GraphNode type
- 13 tests covering convergence, weights, dangling nodes, disconnected components

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
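
The iteration described in this commit can be sketched as follows. This is an illustrative version under stated assumptions, not the committed implementation: nodes are assumed to be numbered `0..nodeCount-1`, the `OutEdges` shape is hypothetical, convergence is measured as the L1 change between iterations (the commit does not say which norm is used), and the function returns an array instead of mutating `GraphNode` objects in place.

```typescript
// Hypothetical input shape: out.get(from) maps each target id to an edge weight.
// Node ids are assumed to be valid indices in 0..nodeCount-1.
type OutEdges = Map<number, Map<number, number>>;

function weightedPageRank(
  nodeCount: number,
  out: OutEdges,
  damping = 0.85,
  tolerance = 1e-4,
  maxIterations = 100,
): number[] {
  if (nodeCount === 0) return [];

  // Total outgoing weight per node; zero marks a dangling node.
  const outSum = new Array<number>(nodeCount).fill(0);
  for (const [from, targets] of out) {
    for (const w of targets.values()) outSum[from] = (outSum[from] ?? 0) + w;
  }

  let rank = new Array<number>(nodeCount).fill(1 / nodeCount);
  for (let iter = 0; iter < maxIterations; iter++) {
    let dangling = 0;
    for (let i = 0; i < nodeCount; i++) {
      if ((outSum[i] ?? 0) === 0) dangling += rank[i] ?? 0;
    }
    // Teleportation share plus the evenly redistributed dangling rank.
    const base = (1 - damping + damping * dangling) / nodeCount;
    const next = new Array<number>(nodeCount).fill(base);
    for (const [from, targets] of out) {
      const total = outSum[from] ?? 0;
      if (total === 0) continue;
      // Outgoing weights are normalized into transition probabilities.
      for (const [to, w] of targets) {
        next[to] = (next[to] ?? 0) + damping * (rank[from] ?? 0) * (w / total);
      }
    }
    let delta = 0;
    for (let i = 0; i < nodeCount; i++) {
      delta += Math.abs((next[i] ?? 0) - (rank[i] ?? 0));
    }
    rank = next;
    if (delta < tolerance) break;
  }
  return rank;
}
```

Because the dangling mass and the teleportation term are both spread over all nodes, the ranks keep summing to 1 on every iteration, which gives a cheap sanity check alongside `getTopByPageRank()`.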
…Detection)

Louvain method for weighted community detection:
- Optimizes modularity by iteratively moving nodes to best neighboring community
- Treats edges as undirected (sum of next + previous weights)
- Assigns contiguous clusterId (0, 1, 2...) to each GraphNode
- Returns ClusterResult: count, per-cluster stats (size, top songs, inter-cluster edges), modularity
- Added optional clusterId field to GraphNode type
- 11 tests covering separation, merging, stats, edge counting, modularity

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…2-EnrichedExport)

- enrichGraph() orchestrates all analysis (PageRank, clusters, stats)
- exportEnrichedGraph() writes enriched graph + summary as JSON
- DB schema adds page_rank/cluster_id columns, persisted through save/load
- New GET /graph/analysis endpoint serves full analysis summary
- Enriched schema documented for Phase 3 handoff

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ASCII wireframes and design spec for the graph view page:
- New /graph Astro page with React GraphView component
- Left sidebar: search, filters (play count, edge weight, source), cluster legend
- Main canvas: force-directed layout, nodes sized by plays, colored by cluster
- Bottom panel: node details with PageRank, neighbors, metrics on click
- Stats overlay panel
- Cluster colors mapped to chart-1 through chart-5 CSS variables
- Desktop-first, mobile degrades to list view

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e3/01-PickVisualizationLibrary)

WebGL-native rendering handles 50k+ nodes, @react-sigma v5 provides modern
React 19 hooks API, and graphology ecosystem includes ForceAtlas2 layout,
shortest-path algorithms, and community detection utilities needed by
downstream tickets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…endering)

New /graph page with force-directed graph visualization:
- SigmaContainer with WebGL rendering (Sigma.js + graphology)
- ForceAtlas2 layout via Web Worker (auto-stops after 5s)
- Nodes sized by play count (log scale), colored by cluster ID
- Edges with weight-based thickness and opacity
- Dark theme: #0B0B0B background, DM Mono labels
- useGraphData hook fetches from API, falls back to mock data
- Header with back navigation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… caching (phase3/02-GraphDataLayer)

- graph-types.ts: frontend types mirroring graph-pipeline API
- graph-api.ts: fetchGraph with in-memory cache, filterGraph (by source/plays/cluster),
  toGraphology transform with cluster colors and log-scale sizing
- GraphContext.tsx: React context with loading/error states, filter support, refresh
- Updated useGraphData.ts to use shared data layer, preserved mock fallback
- Installed sigma, graphology, @react-sigma/core, @react-sigma/layout-forceatlas2
- Fixed vitest alias path
- 19 new tests (27 total in site/)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ghbor highlighting (phase3/03-InteractiveFeatures)

- GraphEvents: Sigma event handling with nodeReducer/edgeReducer for
  neighbor highlighting (dims non-neighbors, hides unconnected edges)
- NodeTooltip: hover shows song name, artist, play count, PageRank
- EdgeTooltip: hover shows transition count between two songs
- NodeDetailPanel: right sidebar with stats, cluster color, and clickable
  neighbor list (outgoing/incoming sorted by weight)
- GraphNavigator: camera animation when clicking neighbors in detail panel
- Deselect: click empty space or press Escape

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ase3/03-ClusterView)

- ClusterLegend component: shows clusters sorted by size with color dots,
  top songs preview, eye toggle to hide/show, and focus button to isolate
- useClusterInfo hook: extracts cluster summaries (size, label by most
  common artist, top songs by play count) from graphology graph
- Composed cluster-based nodeReducer/edgeReducer with existing hover/select
  highlighting in GraphEvents
- Inter-cluster edges styled thinner and more transparent than intra-cluster
  edges in both toGraphology and mock data
- Focus mode: isolates cluster, hides inter-cluster edges, dims other nodes
- Toggle mode: completely hides nodes and connected edges for hidden clusters

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ase3/04-SearchAndFilter)

- SearchBar: autocomplete search by song/artist, arrow key navigation,
  selecting a result centers camera and selects the node
- FilterPanel: source toggles, min plays slider, top PageRank percentile
  slider, min edge weight slider, active filter summary with reset
- GraphFilters: applies filters via graphology hidden attribute on nodes
  and edges, composes with neighbor-highlight reducers via AND logic
- Filter toggle button in top-left, search bar in top-right

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

cdxker and others added 6 commits February 28, 2026 17:16
Ticket: phase7-bugfixes/02-FixSidebarNavigationHighlighting

Staged files:
- site/src/components/GraphView.tsx
- tickets/TICKETS.md

GraphNavigator now emits Sigma's clickNode event after camera animation so sidebar/search navigation follows the same selection + reducer pipeline as direct canvas clicks. Mark ticket complete in TICKETS.md.
Prevents RangeError when newScrobbles exceeds V8's ~65k argument limit.
Adds test with 70k mock scrobbles to verify no stack overflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces CWD-relative paths with import.meta.dirname-based resolution
so fetch and build endpoints resolve to the same data/ directory
regardless of working directory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces edge-pair modularity with per-community formula
Q = Σ_c [L_c/m − (d_c/(2m))²] to correctly account for non-adjacent
same-community pairs. Removes unused sigmaIn array from Louvain.
Updates tests with exact expected modularity values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
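
To make the per-community formula concrete, here is a small sketch that evaluates Q = Σ_c [L_c/m − (d_c/(2m))²] on an undirected weighted edge list. It reads the symbols in the usual way, which is an assumption about the commit's intent: L_c is the total weight of edges inside community c, d_c is the summed weighted degree of its nodes, and m is the total edge weight. The `UEdge` shape and the `community` array (node index to community id) are hypothetical.

```typescript
// Hypothetical undirected edge: endpoints are node indices.
interface UEdge {
  a: number;
  b: number;
  weight: number;
}

function modularity(edges: UEdge[], community: number[]): number {
  let m = 0;
  const intra = new Map<number, number>(); // L_c: weight of edges inside community c
  const degree = new Map<number, number>(); // d_c: summed weighted degree of community c
  for (const { a, b, weight } of edges) {
    const ca = community[a];
    const cb = community[b];
    if (ca === undefined || cb === undefined) continue; // skip edges with unknown nodes
    m += weight;
    // Each endpoint contributes to its community's degree (a self-loop counts twice).
    degree.set(ca, (degree.get(ca) ?? 0) + weight);
    degree.set(cb, (degree.get(cb) ?? 0) + weight);
    if (ca === cb) intra.set(ca, (intra.get(ca) ?? 0) + weight);
  }
  if (m === 0) return 0;
  let q = 0;
  for (const [c, d] of degree) {
    const frac = d / (2 * m);
    q += (intra.get(c) ?? 0) / m - frac * frac;
  }
  return q;
}
```

For two triangles joined by a single bridge edge and split into their natural two communities, this gives m = 7, L_c = 3 and d_c = 7 per community, so Q = 2 × (3/7 − 1/4) = 5/14. Putting every node in one community gives Q = 0, which is the kind of exact value the updated tests can assert on.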
Reads existing source_plays from DB and merges per-source counts
instead of replacing the entire JSON blob. Adds test verifying
cross-source merge on successive saves.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
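
The additive merge described here might look roughly like the sketch below. The `SourcePlays` type and the `mergeSourcePlays` name are hypothetical, and the sketch assumes the stored JSON blob has already been parsed into a plain object of per-source counts.

```typescript
// Hypothetical shape of the parsed source_plays blob, e.g. { lastfm: 12, spotify: 3 }.
type SourcePlays = Record<string, number>;

// Adds incoming per-source counts onto the existing ones instead of replacing them.
function mergeSourcePlays(existing: SourcePlays | null, incoming: SourcePlays): SourcePlays {
  const merged: SourcePlays = { ...(existing ?? {}) };
  for (const [source, count] of Object.entries(incoming)) {
    merged[source] = (merged[source] ?? 0) + count;
  }
  return merged;
}
```

Sources present only in the existing row are preserved, and the inputs are left unmodified, so the function can be called inside a save transaction without side effects on the caller's data.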
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cdxker (Owner, Author) commented Mar 1, 2026

Bugbot run

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

cdxker and others added 2 commits February 28, 2026 18:08
- 03-FixSearchEarlyExit: forEachNode return doesn't break iteration
- 03-FixPathPanelLoadingGuard: cancelled fetch leaves loading stuck
- 03-FixToGraphologySpreadOverflow: Math.max spread crashes on >65k nodes
- 03-FixPlaylistNameCollision: same-name playlists merged incorrectly

Also corrects 01-Triage-SearchNoEarlyExit disposition (was wrongly
closed as invalid).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds graphDebug logging and sigma.refresh() calls to trace the
reducer pipeline during sidebar/search navigation. Updates ticket
with debugging log, evidence from browser logs, and next experiments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

cdxker and others added 10 commits February 28, 2026 18:17
forEachNode callback return is a no-op — it doesn't break iteration.
SearchBar and PathPanel now use for...of over graph.nodes() with break
after 20 matches, avoiding full graph scans on every keystroke.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
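
A minimal sketch of the breakable search loop follows. To stay self-contained it does not import graphology: `nodeKeys` stands in for the result of `graph.nodes()` and `getLabel` for an attribute lookup, and both names, along with the `SongLabel` shape, are assumptions for illustration.

```typescript
// Hypothetical label shape; the real node attributes may differ.
interface SongLabel {
  title: string;
  artist: string;
}

function searchNodes(
  nodeKeys: Iterable<string>,
  getLabel: (key: string) => SongLabel,
  query: string,
  limit = 20,
): string[] {
  const q = query.trim().toLowerCase();
  const matches: string[] = [];
  if (q === "") return matches;
  for (const key of nodeKeys) {
    const { title, artist } = getLabel(key);
    if (title.toLowerCase().includes(q) || artist.toLowerCase().includes(q)) {
      matches.push(key);
      // `break` ends the scan here; a `return` inside a forEachNode callback
      // would only end that single callback invocation.
      if (matches.length >= limit) break;
    }
  }
  return matches;
}
```

With a plain loop, the work per keystroke is bounded by how quickly the first `limit` matches turn up rather than by the size of the graph, although a query with few matches still scans every node.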
…ckout

When a path fetch was cancelled mid-flight (algorithm or node change),
the cleanup only set cancelled=true but never reset loading to false.
The if(state.loading) guard then permanently blocked new fetches.
Now the cleanup resets loading so the next effect run can proceed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Math.max(...entries.map(...)) spreads the entire node array as function
arguments, hitting RangeError on graphs with >65k nodes. Replaced with
a simple loop. Added test verifying 70k nodes without crash.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
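
The loop-based replacement can be sketched as a small helper. `maxBy` and its `fallback` parameter are hypothetical names, and returning a fallback for empty input is a choice made for this example (`Math.max()` with no arguments returns `-Infinity`). The exact argument count at which a spread call fails depends on the engine and available stack, so the sketch avoids the spread entirely.

```typescript
// Finds the maximum of select(item) without spreading the array into
// function arguments, so the input size is not limited by the call stack.
function maxBy<T>(items: readonly T[], select: (item: T) => number, fallback = 0): number {
  if (items.length === 0) return fallback;
  let max = -Infinity;
  for (const item of items) {
    const value = select(item);
    if (value > max) max = value;
  }
  return max;
}
```

A call such as `maxBy(entries, (e) => e.plays)` would take the place of `Math.max(...entries.map((e) => e.plays))`, and it also skips allocating the intermediate mapped array.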
processSpotifyPlaylists grouped tracks by playlistName, merging distinct
playlists with the same name into one edge group. Added playlistId field
to RawSpotifyPlaylistTrack and keyed the grouping map by ID instead.
Requires full re-ingest since old dumps lack the playlistId field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
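
A sketch of grouping by playlist ID instead of name is shown below. The `PlaylistTrack` shape is a simplified stand-in for `RawSpotifyPlaylistTrack`: only `playlistId` and `playlistName` are taken from the commit message, while `position` and the other fields are assumptions for illustration.

```typescript
// Simplified, partly hypothetical stand-in for RawSpotifyPlaylistTrack.
interface PlaylistTrack {
  playlistId: string;
  playlistName: string;
  artist: string;
  track: string;
  position: number; // assumed ordering field
}

// Groups tracks by playlist ID so two playlists that share a name stay separate.
function groupByPlaylist(tracks: PlaylistTrack[]): Map<string, PlaylistTrack[]> {
  const groups = new Map<string, PlaylistTrack[]>();
  for (const t of tracks) {
    const group = groups.get(t.playlistId);
    if (group) {
      group.push(t);
    } else {
      groups.set(t.playlistId, [t]);
    }
  }
  // Order each playlist by position before deriving consecutive-track edges.
  for (const group of groups.values()) {
    group.sort((a, b) => a.position - b.position);
  }
  return groups;
}
```

Keying by name would have merged both playlists below into one group and created edges between tracks that were never adjacent in any playlist.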
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Self-loop double weight, unused keyIndex param, depth-layer edge
iteration optimization, and bootstrap reducer race condition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When i === j, both weights[i][j] and weights[j][i] write to the same
cell, doubling the self-loop weight. Skip the reverse assignment for
self-loops so weight is counted once, consistent with totalWeight.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
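
The self-loop guard can be illustrated with a small symmetric adjacency builder. The function name and the edge shape are hypothetical; the `weights[i][j]` matrix layout follows the commit message.

```typescript
// Builds a symmetric cluster-to-cluster weight matrix from weighted edges
// whose endpoints are cluster indices.
function buildClusterAdjacency(
  clusterCount: number,
  edges: { from: number; to: number; weight: number }[],
): number[][] {
  const weights = Array.from({ length: clusterCount }, () =>
    new Array<number>(clusterCount).fill(0),
  );
  for (const { from: i, to: j, weight } of edges) {
    const rowI = weights[i];
    const rowJ = weights[j];
    if (!rowI || !rowJ) continue; // ignore out-of-range indices
    rowI[j] = (rowI[j] ?? 0) + weight;
    // For a self-loop, [i][j] and [j][i] are the same cell, so adding the
    // reverse entry as well would count the weight twice.
    if (i !== j) rowJ[i] = (rowJ[i] ?? 0) + weight;
  }
  return weights;
}
```

With the guard in place, the diagonal holds each self-loop weight once, while off-diagonal entries remain mirrored.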
…terStats

keyIndex was passed but never referenced inside the function body.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…raph

The post-BFS edge collection scanned every edge in the graph (O(E)) to
find edges between neighborhood nodes. Now iterates only edges of nodes
already in the depth neighborhood, avoiding the full-graph scan on each
node click.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
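
The optimization can be sketched with a plain adjacency map in place of the graphology graph. The `Adjacency` type and the function name are assumptions for illustration; in the real component the per-node iteration would go through the graph library's own edge iterators.

```typescript
// Hypothetical adjacency: source key -> (target key -> weight).
type Adjacency = Map<string, Map<string, number>>;

// Collects edges whose endpoints are both inside the neighborhood by
// walking only the outgoing edges of neighborhood nodes.
function neighborhoodEdges(
  adj: Adjacency,
  neighborhood: Set<string>,
): Array<[string, string, number]> {
  const result: Array<[string, string, number]> = [];
  for (const node of neighborhood) {
    const targets = adj.get(node);
    if (!targets) continue;
    for (const [target, weight] of targets) {
      if (neighborhood.has(target)) result.push([node, target, weight]);
    }
  }
  return result;
}
```

The cost is proportional to the number of edges leaving the neighborhood's nodes instead of to every edge in the graph, and each directed edge is visited once, from its source side.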
…light

GraphEvents' reducer effect cleared all reducers when activeNode was
null, but on initial mount the bootstrap selection from GraphInner
hadn't propagated yet. Now skips clearing when externalSelectedKey is
set, preventing a brief flash of un-highlighted graph on load.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cdxker (Owner, Author) commented Mar 1, 2026

bugbot run

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

cdxker (Owner, Author) commented Mar 1, 2026

bugbot run

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

return () => {
cancelled = true
onStateChange({ ...stateRef.current, loading: false })
}

PathPanel cleanup effect resets loading state spuriously

Medium Severity

The useEffect cleanup function calls onStateChange({ ...stateRef.current, loading: false }) whenever the effect re-runs (due to from, to, or algorithm changing). This fires before the new effect body runs, potentially overwriting a freshly-set result captured in stateRef.current from a just-resolved promise. Since the cleanup uses the stale ref at teardown time and forcefully sets loading: false, it can race with the .then() callback and reset a valid result to loading: false with a stale or null result.

cdxker (Owner, Author) replied:

ticket has been made for this

page_rank = COALESCE(excluded.page_rank, nodes.page_rank),
cluster_id = COALESCE(excluded.cluster_id, nodes.cluster_id),
image_url = COALESCE(excluded.image_url, nodes.image_url),
source_plays = COALESCE(excluded.source_plays, nodes.source_plays)

Database source_plays upsert silently overwrites instead of merging

Low Severity

The SQL upsert for source_plays uses COALESCE(excluded.source_plays, nodes.source_plays), which simply picks the new value over the old one. The additive merge logic only works because the TypeScript code above pre-merges with the existing DB row. But sources on line 91 uses the same pattern (excluded.sources) and relies on the same pre-merge. If a future caller bypasses the TypeScript merge (e.g., batch insert), source_plays from the new row would silently overwrite the old one, losing per-source counts, rather than merging additively like total_plays and weight do at the SQL level.

cdxker (Owner, Author) replied:

ticket has been made for this

PathPanel cleanup race condition and source_plays SQL merge issue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cdxker (Owner, Author) commented Mar 1, 2026

I'm pretty happy with this state of polish. merging so I can have smaller pr's now

@cdxker cdxker merged commit 54d23cd into main Mar 1, 2026
1 check passed
3 participants