Skip to content

Conversation

@mikewilli
Copy link
Contributor

@mikewilli mikewilli commented Oct 7, 2025

This doc describes an approach to resolving bigquery resource/timeout errors stemming from the metrics query. The proposed approach is two-phased: first we will work to resolve issues with discrete metrics, and monitor the results to determine if further action is necessary. If necessary, we will follow-on with a second phase of aggregating weekly and overall metrics from daily metrics, which would reduce the amount of data being queried, and eliminate the need to re-query telemetry tables for weekly and overall periods.

@mikewilli mikewilli force-pushed the daily-metric-aggregations-proposal branch from 3caefa0 to 47d57fe Compare October 22, 2025 19:59
@mikewilli mikewilli marked this pull request as ready for review October 22, 2025 19:59
ORDER BY client_level_daily_active_users_v2 DESC
```
- If the above does not cover what we need, then some alternative options:
- add a new metric-hub parameter for metrics to specify how Jetstream should aggregate across days
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about metrics that just can't really be aggregated across days. Like COUNT DISTINCT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea there's definitely some loss of fidelity for certain metrics with the proposed approach. As Daniel suggested I'll flesh out the various options here so we can decide whether this is the best approach and what the alternatives are.

- `MIN` enrollment and exposure dates is not necessary since we already do this when building enrollments
- `SUM` enrollment and exposure events
- `LOGICAL_OR` for boolean metrics
- `SUM` for int metrics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section needs to be more fleshed out. IMO, this is really the meat of the proposal so we should align on what options we'd consider, what their pros/cons are, etc. before we can make a decision.

- retain the original column names in the daily metrics query output, and then just run the original metric `select_expression` against the daily results tables
- possibly store all daily values in a histogram field so that we can do arbitrary aggregations/statistics without loss of fidelity

2. How to handle pre-enrollment analysis periods?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off the cuff I would expect pre-experiment periods to work the same as all other periods (aggregate to day level, the aggregate to pre-enrollment period level)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the biggest differences here are:

  • these aren't critical for results availability
  • we'd need to run all of these daily queries on the first day of analysis, whereas for post-enrollment periods we typically already have those computed

That said, I agree that it would be a little odd for these to have a special workflow. I'll update the proposal to clarify the proposed route for this though (likely by making them work the same as other periods).

@mikewilli mikewilli changed the title docs: proposal for daily metric aggregations docs: proposal for resolving metrics query failures Nov 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants