Reduce imported opts queries #6106

Open
ukutaht wants to merge 4 commits into master from reduce-imported-opts-queries
Contributor

@ukutaht ukutaht commented Feb 26, 2026

Currently, a call to Query.build results in redundant Repo.preload(site, :completed_imports) calls:

  • 2 times for queries with no comparisons
  • 3 times for queries with comparisons

This PR makes two changes to preloading completed_imports:

  1. Do not preload at all if imports will be skipped anyway due to :unsupported_interval or :unsupported_query. The biggest win is for daily stats (interval == "hour"), where we will do 0 preloads instead of 2 or 3 Postgres round trips.
  2. If preloading is necessary, hoist the preload a bit higher in the call tree so the data does not need to be fetched multiple times.
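
The two changes can be sketched in plain Elixir without a database. This is only an illustration under stated assumptions: `ImportedOptsSketch`, `skip_reason/1`, `main_opts/1` and `comparison_opts/1` are hypothetical names, `preload/1` is a toy stand-in for Repo.preload that counts simulated round trips, and treating "minute" and "hour" as the unsupported intervals is an assumption rather than the actual Query internals.

```elixir
defmodule ImportedOptsSketch do
  # Toy stand-in for Repo.preload(site, :completed_imports). It counts simulated
  # database round trips in the process dictionary and, like Ecto, skips the
  # query when the association is already loaded.
  def preload(%{completed_imports: :not_loaded} = site) do
    Process.put(:preload_count, preload_count() + 1)
    %{site | completed_imports: []}
  end

  def preload(site), do: site

  def preload_count, do: Process.get(:preload_count, 0)

  # Change 1: decide whether imports are skipped before touching the database.
  # Treating "minute" and "hour" as the unsupported intervals is an assumption.
  def skip_reason(%{interval: interval}) when interval in ["minute", "hour"],
    do: :unsupported_interval

  def skip_reason(_query), do: nil

  # Change 2: preload once near the top of the call tree and pass the
  # preloaded site down, so later steps find the association already loaded.
  def build(site, query) do
    case skip_reason(query) do
      nil ->
        site = preload(site)
        main = main_opts(site)
        comparison = comparison_opts(site)
        %{include_imported: true, main: main, comparison: comparison}

      reason ->
        %{include_imported: false, skip_imported_reason: reason}
    end
  end

  # Hypothetical downstream steps that each ask for the imports again.
  defp main_opts(site), do: preload(site).completed_imports
  defp comparison_opts(site), do: preload(site).completed_imports
end

site = %{id: 1, completed_imports: :not_loaded}

ImportedOptsSketch.build(site, %{interval: "hour"})
IO.inspect(ImportedOptsSketch.preload_count(), label: "after hourly query")

ImportedOptsSketch.build(site, %{interval: "day"})
IO.inspect(ImportedOptsSketch.preload_count(), label: "after daily query")
```

In this toy version the hourly query performs no simulated round trips, and the daily query performs one even though two downstream steps ask for the imports.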

/sites sparklines

I looked into this because the separate sparkline graph queries on /sites currently result in 5 identical preloads per site card: 3 preloads for query_24h_stats and 2 for query_24h_intervals. With the default page_size of 24, this results in 120 Postgres queries per page load, when one per site would suffice.
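
The per-page figure follows from simple multiplication. A minimal sketch, where `SitesPageMath` is a hypothetical helper used only for this illustration:

```elixir
defmodule SitesPageMath do
  # Preload queries per page load = preloads per site card * cards per page.
  def preloads_per_page(per_card, page_size), do: per_card * page_size
end

# 3 preloads from query_24h_stats plus 2 from query_24h_intervals.
per_card = 3 + 2

# Default page_size of 24 site cards.
IO.inspect(SitesPageMath.preloads_per_page(per_card, 24))
# prints 120
```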

This PR cuts that down from 5 redundant queries per site to 0, because with interval == "hour" the preloads are not needed. For the longer time ranges in the works by @aerosol, it will be 2 queries per site. It would be possible to cut that down to 1 per site at the cost of some complexity. For that, the Sparkline.overview_24h function would have to figure out whether imports are supported for the query and, if so, run the preload before calling query_24h_stats and query_24h_intervals.

However, this would require exposing some of the Query internals, and I felt it isn't worth it at the moment. One duplicated (surely cached on the Postgres side) query per site is not terrible.

@ukutaht ukutaht requested review from a team and RobertJoonas February 26, 2026 19:40
@ukutaht ukutaht force-pushed the reduce-imported-opts-queries branch from b8b2af8 to 24947d9 Compare February 26, 2026 20:01
@ukutaht ukutaht marked this pull request as draft February 26, 2026 20:12
@ukutaht ukutaht marked this pull request as ready for review February 27, 2026 01:14
Member

aerosol commented Feb 27, 2026

Interesting, I like where this is going, but I find the query building code a bit difficult to follow. How did you determine the number of preloads in your assessment?

I understand that for "time:minute" and "time:hour" we don't need them at all, but otherwise aren't preloads idempotent at Ecto level?

Member

aerosol commented Feb 27, 2026

Another minor optimization we could also do there is, we don't need to query for :visitors, :visits, :pageviews, :views_per_visit for regular site cards; just :visitors should be enough (as opposed to consolidated views). But that can be done as part of building those more complex queries at varying ranges.

Contributor Author

ukutaht commented Feb 27, 2026

how did you determine the number of preloads in your assessment?

I noticed that loading the /sites page resulted in a lot of duplicate DB queries in the local server logs, but it was pretty hard to parse out what was going on. I captured the logs from a single page refresh and asked Claude to figure out what was going on. My previous PR fixed the first issue it found; this PR fixes the second.

I manually verified this with the logs of a single sparkline request.

master:

iex(3)> Plausible.Stats.Sparkline.overview_24h(site)
12:38:08.454 [debug] QUERY OK source="teams" db=2.2ms idle=1965.2ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:38:08.455 [debug] QUERY OK source="subscriptions" db=0.9ms idle=1968.5ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:38:08.461 [debug] QUERY OK source="site_imports" db=1.4ms idle=1973.4ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.462 [debug] QUERY OK source="site_imports" db=0.4ms idle=1975.0ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.462 [debug] QUERY OK source="site_imports" db=0.3ms idle=1975.7ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.497 [debug] QUERY OK db=33.8ms idle=1968.4ms
SELECT s0."pageviews",s0."visitors",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-26 10:38:08], ~N[2026-02-27 10:38:08], 1, ~N[2026-02-19 10:38:08], ~N[2026-02-26 10:38:08], ~N[2026-02-27 10:38:08]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51
12:38:08.509 [debug] QUERY OK db=11.4ms idle=2.4ms
SELECT s0."pageviews",s0."visitors",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-25 10:38:08], ~N[2026-02-26 10:38:08], 1, ~N[2026-02-18 10:38:08], ~N[2026-02-25 10:38:08], ~N[2026-02-26 10:38:08]]
↳ Plausible.Stats.QueryRunner.execute_comparison_query/1, at: lib/plausible/stats/query_runner.ex:81
12:38:08.513 [debug] QUERY OK source="teams" db=0.5ms idle=1033.0ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:38:08.515 [debug] QUERY OK source="subscriptions" db=1.0ms idle=1033.9ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:38:08.516 [debug] QUERY OK source="site_imports" db=0.7ms idle=868.1ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.516 [debug] QUERY OK source="site_imports" db=0.6ms idle=62.3ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Imported.get_completed_imports/1, at: lib/plausible/imported.ex:209
12:38:08.524 [debug] QUERY OK source="sessions_v2" db=7.0ms idle=21.6ms
SELECT toUInt64(round(uniq(s0."user_id") * any(_sample_factor))) AS "visitors",toStartOfHour(f1) AS "time" FROM "sessions_v2" AS s0 ARRAY JOIN timeSlots(toTimeZone(s0."start", {$0:String}), toUInt32(timeDiff(s0."start", s0."timestamp")), toUInt32({$1:Int64})) AS f1 WHERE ((s0."site_id" = {$2:Int64}) AND (s0."start" >= {$3:DateTime}) AND (s0."timestamp" >= {$4:DateTime}) AND (s0."start" <= {$5:DateTime})) GROUP BY "time" ORDER BY "time" ["Etc/UTC", 900, 1, ~N[2026-02-19 10:38:08], ~N[2026-02-26 10:38:08], ~N[2026-02-27 10:38:08]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51

Note that there are 5 queries to site_imports.

this branch:

iex(4)> Plausible.Stats.Sparkline.overview_24h(site)
12:28:40.530 [debug] QUERY OK source="teams" db=3.1ms idle=1294.5ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:28:40.532 [debug] QUERY OK source="subscriptions" db=0.8ms idle=1298.6ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:28:40.532 [debug] QUERY OK source="site_imports" db=0.5ms idle=1299.6ms
SELECT s0."id", s0."start_date", s0."end_date", s0."label", s0."source", s0."status", s0."legacy", s0."has_scroll_depth", s0."site_id", s0."imported_by_id", s0."inserted_at", s0."updated_at", s0."site_id" FROM "site_imports" AS s0 WHERE ((s0."site_id" = $1) AND (s0."status" = $2)) ORDER BY s0."site_id" [1, :completed]
↳ Plausible.Stats.Query.put_imported_opts/2, at: lib/plausible/stats/query.ex:155
12:28:40.558 [debug] QUERY OK db=24.7ms idle=1316.4ms
SELECT s0."visitors",s0."pageviews",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-26 10:28:40], ~N[2026-02-27 10:28:40], 1, ~N[2026-02-19 10:28:40], ~N[2026-02-26 10:28:40], ~N[2026-02-27 10:28:40]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51
12:28:40.577 [debug] QUERY OK db=19.0ms idle=1341.8ms
SELECT s0."visitors",s0."pageviews",s0."visits",s0."pageviews" AS "pageviews",s0."visitors" AS "visitors",s0."visits" AS "visits",s1."views_per_visit" AS "views_per_visit" FROM (SELECT toUInt64(round(uniq(se0."user_id") * any(_sample_factor))) AS "visitors",toUInt64(round(countIf(se0."name" = 'pageview') * any(_sample_factor))) AS "pageviews",toUInt64(round(uniq(se0."session_id") * any(_sample_factor))) AS "visits" FROM "events_v2" AS se0 WHERE ((se0."site_id" = {$0:Int64}) AND (se0."timestamp" >= {$1:DateTime}) AND (se0."timestamp" <= {$2:DateTime}))) AS s0 LEFT JOIN (SELECT greatest(ifNotFinite(round(sum(ss0."sign" * ss0."pageviews") / sum(ss0."sign"), 2), 0), 0) AS "views_per_visit",toUInt32(greatest(sum(sign), 0)) AS "__internal_visits" FROM "sessions_v2" AS ss0 WHERE ((ss0."site_id" = {$3:Int64}) AND (ss0."start" >= {$4:DateTime}) AND (ss0."timestamp" >= {$5:DateTime}) AND (ss0."start" <= {$6:DateTime}))) AS s1 ON 1 ORDER BY "visitors" DESC [1, ~N[2026-02-25 10:28:40], ~N[2026-02-26 10:28:40], 1, ~N[2026-02-18 10:28:40], ~N[2026-02-25 10:28:40], ~N[2026-02-26 10:28:40]]
↳ Plausible.Stats.QueryRunner.execute_comparison_query/1, at: lib/plausible/stats/query_runner.ex:81
12:28:40.578 [debug] QUERY OK source="teams" db=0.6ms idle=1345.5ms
SELECT t0."id", t0."identifier", t0."name", t0."trial_expiry_date", t0."accept_traffic_until", t0."allow_next_upgrade_override", t0."locked", t0."setup_complete", t0."setup_at", t0."hourly_api_request_limit", t0."notes", t0."policy", t0."grace_period", t0."inserted_at", t0."updated_at", t0."id" FROM "teams" AS t0 WHERE (t0."id" = $1) [1]
↳ Plausible.Props.allowed_for/2, at: lib/plausible/props.ex:60
12:28:40.579 [debug] QUERY OK source="subscriptions" db=0.4ms idle=1346.6ms
SELECT s0."id", s0."paddle_subscription_id", s0."paddle_plan_id", s0."update_url", s0."cancel_url", s0."status", s0."next_bill_amount", s0."next_bill_date", s0."last_bill_date", s0."currency_code", s0."team_id", s0."inserted_at", s0."updated_at", s0."team_id" FROM "subscriptions" AS s0 WHERE (s0."team_id" = $1) ORDER BY s0."inserted_at" DESC, s0."id" DESC LIMIT 1 [1]
↳ Plausible.Teams.Billing.allowed_features_for/1, at: lib/plausible/teams/billing.ex:677
12:28:40.587 [debug] QUERY OK source="sessions_v2" db=6.8ms idle=1363.5ms
SELECT toUInt64(round(uniq(s0."user_id") * any(_sample_factor))) AS "visitors",toStartOfHour(f1) AS "time" FROM "sessions_v2" AS s0 ARRAY JOIN timeSlots(toTimeZone(s0."start", {$0:String}), toUInt32(timeDiff(s0."start", s0."timestamp")), toUInt32({$1:Int64})) AS f1 WHERE ((s0."site_id" = {$2:Int64}) AND (s0."start" >= {$3:DateTime}) AND (s0."timestamp" >= {$4:DateTime}) AND (s0."start" <= {$5:DateTime})) GROUP BY "time" ORDER BY "time" ["Etc/UTC", 900, 1, ~N[2026-02-19 10:28:40], ~N[2026-02-26 10:28:40], ~N[2026-02-27 10:28:40]]
↳ Plausible.Stats.QueryRunner.execute_main_query/1, at: lib/plausible/stats/query_runner.ex:51

There is 1 query to site_imports. In my PR description I said it would be 0, but I overlooked that query_24h_stats does not use an interval, so it still makes one query to site_imports.

aren't preloads idempotent at Ecto level?

They are, but only if the result is stored and reused. We often don't do that. For example:

site = ...
Repo.preload(site, :completed_imports) # Fires DB query. Returns site with completed_imports preloaded
Repo.preload(site, :completed_imports) # Fires DB query again because the preload from last line was discarded
site = Repo.preload(site, :completed_imports) # Preload completed imports and store the site with preloaded data in variable
Repo.preload(site, :completed_imports) # Does not fire DB query since the site variable now has preloaded data
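
The same sequence can be run without a database to count the queries. This is a toy stand-in and not Ecto itself: `PreloadCounter` is a hypothetical module, and the `:not_loaded` atom stands in for the placeholder Ecto uses for an association that has not been loaded.

```elixir
defmodule PreloadCounter do
  # Simulates a preload: fires a "query" only when the association
  # has not been loaded on the struct passed in.
  def preload(%{completed_imports: :not_loaded} = site) do
    Process.put(:queries, queries() + 1)
    %{site | completed_imports: []}
  end

  def preload(site), do: site

  def queries, do: Process.get(:queries, 0)
end

site = %{id: 1, completed_imports: :not_loaded}

PreloadCounter.preload(site)        # query 1, result discarded
PreloadCounter.preload(site)        # query 2, site is still not preloaded
site = PreloadCounter.preload(site) # query 3, result stored by rebinding site
PreloadCounter.preload(site)        # no query, association already loaded

IO.inspect(PreloadCounter.queries())
# prints 3
```

Rebinding `site` is what makes the later call free; without it every call repeats the query.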

Another minor optimization we could also do there is, we don't need to query for :visitors, :visits, :pageviews, :views_per_visit for regular site cards

Nice! I hadn't considered that. I don't think it's that minor because it means we'll avoid JOINing with sessions in clickhouse queries for site cards. That would be a significant win.

Member

aerosol commented Feb 27, 2026

Nice! I hadn't considered that. I don't think it's that minor because it means we'll avoid JOINing with sessions in clickhouse queries for site cards. That would be a significant win.

We can include it in this PR if you like via #6109
