-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Infrastructure Cost Attribution Framework
Workstream: G — Cost & Resource Monitoring
Parent: #195 (Execution Governance Blueprint)
Classification: Optional / Premature
Effort: 2 days
Phase: Scale (Phase 5)
⚠️ Over-Engineering Warning
This issue is classified as Optional/Premature per the governance blueprint (#195).
The platform does not yet have enough operational scale to justify a dedicated cost attribution framework. This issue should NOT be started until:
- Monthly Supabase costs exceed $100+
- CI pipeline runtime exceeds 30 min/run
- Event ingestion volume exceeds 100k events/month
- At least 3 country expansions are live
Current recommendation: DEFER indefinitely. Revisit when any threshold above is breached.
If premature implementation is detected, this issue should be closed with a "deferred — premature" label.
Problem Statement (For Future Reference)
As the platform scales beyond Poland to multiple countries (#189-#193 multi-country support), infrastructure costs will grow across three dimensions:
- Database query costs — Supabase compute usage from complex scoring, search, and provenance queries
- CI/CD runtime costs — GitHub Actions minutes for expanded test suite and multi-country pipelines
- Event ingestion costs — Volume-based costs for analytics event storage ([ARCH] Event-Based Product Intelligence Layer — Behavioral Telemetry, ML Readiness & A/B Infrastructure #190)
Without attribution, cost increases cannot be traced to specific workstreams, making optimization decisions uninformed.
Scope (When Activated)
1. Query Cost Attribution
-- Track query cost by domain using pg_stat_statements
-- Map queryid → domain using naming conventions from #197
CREATE OR REPLACE VIEW v_query_cost_by_domain AS
SELECT
CASE
WHEN query ILIKE '%scoring%' OR query ILIKE '%health_score%' THEN 'scoring'
WHEN query ILIKE '%search%' OR query ILIKE '%tsvector%' THEN 'search'
WHEN query ILIKE '%provenance%' THEN 'provenance'
WHEN query ILIKE '%event%' OR query ILIKE '%analytics%' THEN 'events'
WHEN query ILIKE '%flag%' OR query ILIKE '%feature%' THEN 'flags'
ELSE 'other'
END AS domain,
count(*) AS query_count,
ROUND(SUM(total_exec_time)::NUMERIC, 2) AS total_exec_ms,
ROUND(SUM(total_exec_time)::NUMERIC / NULLIF(SUM(calls), 0), 2) AS avg_ms_per_call,
SUM(shared_blks_read) AS total_disk_reads,
SUM(shared_blks_hit) AS total_cache_hits
FROM pg_stat_statements
WHERE calls > 0
GROUP BY 1
ORDER BY total_exec_ms DESC;2. CI Runtime Tracking
# Concept: Add timing annotations to CI workflow
# Track per-job duration and map to workstreams
# Example reporting step:
- name: Report CI Timing
run: |
echo "## CI Runtime Report" >> $GITHUB_STEP_SUMMARY
echo "| Job | Duration | Workstream |" >> $GITHUB_STEP_SUMMARY
echo "|---|---|---|" >> $GITHUB_STEP_SUMMARY
echo "| SQL QA Tests | ${SQL_QA_DURATION}s | testing |" >> $GITHUB_STEP_SUMMARY
echo "| Frontend Tests | ${FRONTEND_DURATION}s | frontend |" >> $GITHUB_STEP_SUMMARY
echo "| E2E Tests | ${E2E_DURATION}s | testing |" >> $GITHUB_STEP_SUMMARY
echo "| Migration Check | ${MIGRATION_DURATION}s | migration |" >> $GITHUB_STEP_SUMMARY3. Event Ingestion Cost Monitoring
-- Track event volume by type for cost projection
CREATE OR REPLACE VIEW v_event_volume AS
SELECT
event_type,
DATE_TRUNC('day', created_at) AS event_date,
count(*) AS event_count,
pg_size_pretty(SUM(pg_column_size(payload))::BIGINT) AS payload_size
FROM events -- table from #190
GROUP BY 1, 2
ORDER BY event_date DESC, event_count DESC;
-- Monthly projection
CREATE OR REPLACE VIEW v_event_cost_projection AS
SELECT
DATE_TRUNC('month', created_at) AS month,
count(*) AS total_events,
pg_size_pretty(SUM(pg_column_size(payload))::BIGINT) AS total_payload_size,
-- Rough cost estimate: Supabase Pro = included storage, but compute matters
ROUND(count(*) / 1000000.0 * 5, 2) AS estimated_monthly_cost_usd -- $5 per million events (adjust)
FROM events
GROUP BY 1
ORDER BY 1 DESC;4. Cost Dashboard Card (for #206)
// Addition to Admin Health Dashboard (#206)
// Card: "Infrastructure Costs"
// Shows:
// - Query compute: XX ms total this week (by domain)
// - CI runtime: XX min this week (trend)
// - Event volume: XX events this month (projection)
// - Storage: XX GB used / XX GB plan limit
// Thresholds:
// Green: <50% of budget
// Amber: 50-80% of budget
// Red: >80% of budgetAcceptance Criteria (When Activated)
-
v_query_cost_by_domainview created and accurate - CI runtime tracking added to GitHub Actions workflow
-
v_event_volumeandv_event_cost_projectionviews created - Cost dashboard card added to admin health dashboard ([GOV-D2] Admin Governance Dashboard Suite #206)
- Monthly cost report template documented
- Budget thresholds defined and alerts configured
Activation Triggers
This issue transitions from "Deferred" to "Active" when ANY of these thresholds is breached:
| Trigger | Threshold | How to Detect |
|---|---|---|
| Monthly Supabase cost | >$100 | Supabase billing dashboard |
| CI pipeline runtime | >30 min/run | GitHub Actions summary |
| Event volume | >100k events/month | SELECT count(*) FROM events WHERE created_at > NOW() - INTERVAL '30 days' |
| Country count | ≥3 countries live | SELECT DISTINCT country FROM products |
| Database size | >5GB | SELECT pg_size_pretty(pg_database_size(current_database())) |
Dependencies
- [Hardening 1/7] Production Observability + Error Telemetry #183 (Observability) — infrastructure monitoring
- [ARCH] Event-Based Product Intelligence Layer — Behavioral Telemetry, ML Readiness & A/B Infrastructure #190 (Events) — event table and volume
- [GOV-D2] Admin Governance Dashboard Suite #206 (Admin Dashboard) — cost dashboard card host
- [GOV-F2] Alert Escalation & Query Regression Detection #211 (Alert Escalation) — cost alerts follow escalation policy