This document consolidates everything developed so far, including detailed technical implementation, product decisions, monetization logic, and the forward roadmap. It is intended to be downloadable later as a Markdown file and used as:
- an internal execution record
- a product/UX validation report
- a future investor / partner briefing document
Project Name: SingSync
Tagline: Any song → karaoke instantly
SingSync is a web‑based karaoke media platform that removes friction between the intent to sing and the act of singing.
SingSync is deliberately not:
- an AI product
- a music streaming service
- a vocal training or scoring app
- a karaoke hardware replacement
SingSync is:
a place where people sing.
The product’s success is not measured by accuracy, but by:
- how quickly users start singing
- how many songs they sing per session
- whether they want to sing again
These rules guided every technical and UX decision:
• Singing flow must never be interrupted
• Audio playback must never stop for ads
• Lyrics must never be blocked or obscured
• No forced login before the first song
• No setup complexity
If a feature violates any of the above, it is rejected or deferred.
The following end‑to‑end flow is fully implemented and stable:
- Open SingSync
- Search song (current MVP: local playlist)
- Select a song
- Preparing karaoke… (≈5 seconds)
  - stream preparation
  - lyrics loading
  - cache checks
  - ad slot allowed here
- Countdown (3‑2‑1)
  - ad slot may remain visible
- Singing Mode
  - karaoke‑ready video playback
  - lyrics panel with line highlighting
  - keyboard / click controls
  - no ads, ever
- Exit → next song
This flow defines SingSync v1 and is intentionally locked.
• Framework: Next.js (App Router)
• Client‑side state machine:
  – browse → preparing → countdown → singing
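The phase flow can be sketched as a tiny state machine. This is a hypothetical sketch, not the actual implementation; the names and shape are assumptions:

```typescript
// Hypothetical sketch of the client-side phase machine.
type Phase = "browse" | "preparing" | "countdown" | "singing";

// Each phase may only advance along the locked v1 flow.
const TRANSITIONS: Record<Phase, Phase[]> = {
  browse: ["preparing"],
  preparing: ["countdown"],
  countdown: ["singing"],
  singing: ["browse"], // exit → next song
};

function nextPhase(current: Phase, target: Phase): Phase {
  if (!TRANSITIONS[current].includes(target)) {
    throw new Error(`invalid transition: ${current} -> ${target}`);
  }
  return target;
}
```

Encoding the transitions explicitly makes it hard to bypass the locked flow by accident.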
• Node.js + Express
• Current API:
– GET /api/songs → local playlist
• Video: MP4 (karaoke‑ready assets)
• Lyrics: LRC (timestamp‑based)
• Keyboard:
  – Space / Enter → play / pause toggle
• Mouse:
  – click overlay on video → toggle
Native browser controls are fully disabled to avoid interference.
• LRC files are loaded from /public/lyrics/<videoFile>.lrc
• Lyrics are parsed into (timestamp, text) pairs
• Active lyric line is calculated from video.currentTime
• Highlighted line updates via requestAnimationFrame
• Auto‑scroll keeps the active line centered
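A minimal sketch of the parsing and active-line logic described above (the real implementation may differ; the tag format assumed here is the common `[mm:ss.xx]` LRC form):

```typescript
// Parse LRC text into (timestamp, text) pairs, sorted by time.
interface LyricLine { time: number; text: string; }

function parseLrc(lrc: string): LyricLine[] {
  const out: LyricLine[] = [];
  const re = /^\[(\d+):(\d+(?:\.\d+)?)\](.*)$/;
  for (const raw of lrc.split("\n")) {
    const m = raw.trim().match(re);
    if (!m) continue; // skip metadata tags like [ar:…] and blank lines
    out.push({ time: Number(m[1]) * 60 + Number(m[2]), text: m[3].trim() });
  }
  return out.sort((a, b) => a.time - b.time);
}

// Active line = last line whose timestamp has passed video.currentTime.
function activeLineIndex(lines: LyricLine[], currentTime: number): number {
  let idx = -1;
  for (let i = 0; i < lines.length && lines[i].time <= currentTime; i++) idx = i;
  return idx;
}
```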
Lyrics timing is treated as content independent of the video.
If the LRC is imperfectly aligned with the video, drift is accepted at the MVP stage. Precision is a later optimization, not a launch blocker.
Allowed ad placements:
• Preparing karaoke screen (highest value)
• Countdown screen (continuation)
• Post‑song screen (future)

Never allowed:
• Any ads during singing
• Any ad blocking lyrics or video
• Any audio ads
• Any forced interaction mid‑song

Current implementation:
• Preparing: rotating ad slot (dummy)
• Countdown: final ad slot (dummy)
• Singing: zero ads
This establishes a clean per‑song monetization model without UX damage.
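The placement rule is simple enough to encode as a single guard. A hypothetical helper, not the actual component code:

```typescript
// Ad eligibility derived purely from the current phase: zero ads while singing.
function adSlotAllowed(phase: "browse" | "preparing" | "countdown" | "singing"): boolean {
  return phase === "preparing" || phase === "countdown";
}
```

Deriving eligibility from the phase, rather than toggling ads imperatively, makes the "no ads during singing" rule hard to violate by accident.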
Problem:
• Autoplay with sound is blocked in most browsers

Solution:
• Best‑effort autoplay
• Guaranteed manual start via Space / Enter / Click
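A sketch of the best-effort pattern, with the media element reduced to just `play()` for illustration:

```typescript
// HTMLMediaElement.play() returns a promise that rejects when the browser
// blocks autoplay with sound; on rejection the UI waits for a manual start.
async function tryAutoplay(video: { play: () => Promise<void> }): Promise<boolean> {
  try {
    await video.play();
    return true;  // autoplay worked
  } catch {
    return false; // blocked → user starts via Space / Enter / click
  }
}
```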
Problem:
• videoRef was null when phase switched to singing
Solution:
• All video.src assignment moved into a useEffect that runs when phase === "singing", so the ref is guaranteed to be mounted
Problem:
• Space triggers browser scroll / button focus
Solution:
• keydown → preventDefault
• actual toggle on keyup
• click‑capture overlay above video
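A sketch of the split handling (the handler shapes are assumptions):

```typescript
// preventDefault on keydown stops page scroll / button-focus side effects,
// while the actual play/pause toggle fires only on keyup.
const TOGGLE_KEYS = new Set([" ", "Enter"]);

function onKeydown(e: { key: string; preventDefault: () => void }): void {
  if (TOGGLE_KEYS.has(e.key)) e.preventDefault(); // block scroll / focus
}

function onKeyup(e: { key: string }, toggle: () => void): void {
  if (TOGGLE_KEYS.has(e.key)) toggle(); // toggle exactly once, on release
}
```

Splitting the two phases also avoids key auto-repeat: holding Space fires repeated keydown events but only one keyup.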
Problem:
• setInterval caused jitter
Solution:
• requestAnimationFrame loop synced to video time
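The loop can be sketched like this, with `requestAnimationFrame` injected so the logic runs outside a browser:

```typescript
// Each frame reads the video clock directly, so highlighting stays locked
// to video.currentTime instead of drifting like a setInterval timer.
function startLyricLoop(
  video: { currentTime: number },
  raf: (cb: () => void) => void, // requestAnimationFrame, injected
  onTime: (t: number) => void,   // recompute the active lyric line
): void {
  const tick = () => {
    onTime(video.currentTime);
    raf(tick); // schedule the next frame
  };
  raf(tick);
}
```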
• Repository: SingSync
• Branch: main
• Status: MVP fully pushed
• Large dev‑only files committed with warnings (acceptable for now)
GitHub now acts as the canonical execution log.
Goal: prove people actually want to sing this way.
Key metrics:
• time to first song
• songs per session
• song completion rate
• repeat usage
Planned behavior:
• Multiple results per song query
• Thumbnail + metadata
• Inline preview playback (10–20s)
• One preview at a time
• Explicit “Sing this version” selection
Backend abstraction:
• GET /api/search?q=
• POST /api/prepare
Search source can be swapped later without touching UX.
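A sketch of the shapes this abstraction might use. The field names are assumptions for illustration, not the real schema:

```typescript
// Hypothetical result shape returned by GET /api/search?q=
interface SearchResult {
  sourceKey: string;   // stable key later handed to /api/prepare
  title: string;
  thumbnailUrl: string;
  previewUrl: string;  // 10–20s inline preview
}

// Minimal extraction of the ?q= parameter from a request path.
function parseSearchQuery(path: string): string | null {
  const q = new URL(path, "http://localhost").searchParams.get("q");
  return q && q.trim().length > 0 ? q.trim() : null;
}
```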
• Job‑based processing
• Cache hit → instant playback
• Cache miss → progress UI
• Persistent mapping: sourceKey → karaoke asset + lyrics
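The cache behavior above can be sketched as a lookup keyed by `sourceKey` (types are illustrative assumptions, and a real version would persist the map):

```typescript
interface KaraokeAsset { videoFile: string; lrcFile: string; }

const assetCache = new Map<string, KaraokeAsset>();

// Hit → playback can start immediately; miss → a prepare job is needed
// and the client polls for progress.
function checkPrepare(sourceKey: string):
  | { status: "ready"; asset: KaraokeAsset }
  | { status: "processing" } {
  const hit = assetCache.get(sourceKey);
  return hit ? { status: "ready", asset: hit } : { status: "processing" };
}
```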
• Replace dummy ad slots with AdSense
• Optional premium (ad‑free, recording)
• Monetize per song, not per user
• Implement /api/search (stub)
• Search results UI with preview
• Preview auto‑stop logic
• Select‑to‑sing flow
• /api/prepare job creation
• Progress polling
• Cache status indicators
• LRC offset slider
• Simple admin lyric editor
• Community corrections

• Ad slot componentization
• AdSense integration (loading only)
• Post‑song ad screen

• Event schema definition
• Local logging
• Later analytics integration
• 10–20 music‑loving users
• Laptop / desktop
• Speakers + headphones
Task A: Search → select → sing first song
Task B: Start second song
Task C: Observe lyrics usability
• Would you use this again?
• Best part?
• Most frustrating part?
• Would you pay to remove ads?
Primary:
• songs per session
• completion rate
• time to first song

Secondary:
• preview → sing conversion
• abandonment during preparing
SingSync is not a feature showcase.
It is a place.
If people leave having sung more than one song, the MVP succeeded.
If not, no amount of licensing, AI, or polish will save it.
This repository supports an AI-assisted development workflow used by the maintainers to accelerate small, well-scoped tasks.
The workflow is based on:
- A structured issue template (Linear)
- Strict development rules (AGENT_RULES.md)
- Human-reviewed pull requests (no autonomous merges)
- Tasks are written as small, explicit issues with acceptance criteria
- An AI agent may pick a single task from the Todo queue
- The agent works on a dedicated branch
- All changes are submitted via a pull request
- A human reviews and merges the PR
At no point does the agent:
- push directly to main
- merge its own work
- bypass repository protections
- Destructive actions are intentionally constrained
- All production changes require human approval
- This workflow is experimental and may change
For details, see AGENT_RULES.md.
When a song is prepared, SingSync now writes sync metadata to:
`server/cache/<videoId>/sync.json`
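The exact schema is internal; a hypothetical `sync.json` illustrating the kinds of data involved (all field names here are invented for illustration):

```json
{
  "onsets_ms": [1200, 3400, 5600],
  "phrase_gaps_ms": [[8000, 9500]],
  "lines": [
    { "line_id": "line_12", "start_ms": 12345 }
  ]
}
```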
To (re)generate vocal onsets + phrase gaps for an already cached song:
```shell
./scripts/extract-sync-markers.sh <videoId>
```

Force re-extraction even if markers already exist:

```shell
./scripts/extract-sync-markers.sh <videoId> --force
```

Generate rough per-line timings from cached markers + lyrics:

```shell
./scripts/extract-line-timings.sh <videoId>
```

Force regeneration:

```shell
./scripts/extract-line-timings.sh <videoId> --force
```

Submit a line timing correction event:

```shell
curl -X POST http://localhost:PORT/sync/<videoId>/correct \
  -H "Content-Type: application/json" \
  -d '{"line_id":"line_12","new_start_ms":12345,"source":"ui"}'
```

Apply a correction from CLI (without UI):

```shell
(cd server && npm exec tsx src/scripts/applyCorrection.ts <videoId> <line_id> <new_start_ms>)
```

(End of Document)