
MTE-4764 Querying new crashes that affect many users #292

Open

clarmso wants to merge 12 commits into master from cs/MTE-4764-sentry-new-crashes

Conversation


@clarmso clarmso commented Feb 2, 2026

This PR queries Sentry to see whether there are any new issues that affect more than 1000 users or occur more than 1000 times during the last 3 days (3 days rather than 1, to cover the weekend). If there are spikes, we send a Slack notification. If there are no spikes, we do not send any notification.

Here's a mockup of the Slack notification if new crashes are found:
[Screenshot: mockup of the Slack notification when new crashes are found]

I would like to run the query on preflight and staging.
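At a high level, the flow is: query Sentry for new issues in the current release, keep the ones above the threshold, and post to Slack only when something is found. A simplified sketch of that shape (the helper names and the SLACK_WEBHOOK_URL environment variable below are illustrative, not the exact code in this PR):

import os
import requests

THRESHOLD = 1000  # users or events

def find_spike_issues(issues, threshold=THRESHOLD):
    # Keep issues whose user count or event count exceeds the threshold.
    return [
        issue for issue in issues
        if int(issue.get('userCount', 0)) > threshold
        or int(issue.get('count', 0)) > threshold
    ]

def notify_slack(spike_issues):
    # Only post when there is something to report.
    if not spike_issues:
        return
    text = '\n'.join(
        '<{0}|{1}>'.format(issue['permalink'], issue['title'])
        for issue in spike_issues
    )
    requests.post(os.environ['SLACK_WEBHOOK_URL'], json={'text': text})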

@clarmso clarmso requested review from AaronMT and isabelrios February 2, 2026 22:29
if args.type == 'spike-issues':
    main_spike_issues(args.file, args.project)
else:
    main_rates(args.file, args.project)
@clarmso (Collaborator Author)

I'm considering splitting this file into two: one for rates and one for spike issues.

if int(issue.get('filtered', {}).get('userCount', 0)) > threshold
if int(issue.get('filtered', {}).get('count', 0)) > threshold
if int(issue.get('userCount', 0)) > threshold
if int(issue.get('count', 0)) > threshold
@clarmso (Collaborator Author)

For now, I check all of the userCount and count fields in the issue.

df_issues = pd.DataFrame()
for release_version in release_versions:
    short_release_version = release_version.split('+')[0]
    issues = self.sentry_top_new_issues(short_release_version, statsPeriod=3)
@clarmso (Collaborator Author)

Ideally, I would like to set statsPeriod to 1, but because of the weekend we may miss some issues.
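If we ever want statsPeriod=1 on weekdays, one option could be a weekday-aware window, roughly like this (just an idea, not part of this PR):

from datetime import date

def stats_period_days(today=None):
    # Use a 3-day window on Mondays so weekend issues are not missed,
    # otherwise a 1-day window (illustrative only).
    today = today or date.today()
    return 3 if today.weekday() == 0 else 1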


AaronMT commented Feb 3, 2026

A couple of questions:

spike_issues = [
  issue for issue in issues
  if lifetime.userCount > threshold
  if lifetime.count > threshold
  if filtered.userCount > threshold
  if filtered.count > threshold
  if userCount > threshold
  if count > threshold
]

This is a bit strict in that an issue must exceed the threshold in all 6 fields. In practice, that can easily filter out things you might still consider spikes (e.g., high count but lower userCount, or filtered vs lifetime not both huge).

If the intent is “either users OR count over threshold” the predicate probably wants OR, not AND across everything.
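For example, an OR-style version over the same fields might look something like this (rough sketch, untested; exceeds is just a helper name I made up here):

def exceeds(issue, key, threshold):
    # True when the top-level, lifetime, or filtered value for key exceeds the threshold.
    candidates = (
        issue.get(key, 0),
        (issue.get('lifetime') or {}).get(key, 0),
        (issue.get('filtered') or {}).get(key, 0),
    )
    return any(int(value or 0) > threshold for value in candidates)

spike_issues = [
    issue for issue in issues
    if exceeds(issue, 'userCount', threshold) or exceeds(issue, 'count', threshold)
]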

Re: querying new issues and then applying the threshold

Alerting only on is:new issues with >1000 users and >1000 events within 3 days is possible, but it's a narrow target. It might be fine if you only want "holy crap" regressions, but it's worth confirming with Winnie or whoever that that's the desired sensitivity.

Also, it will keep reporting the same spike issues every time the workflow runs, unless the underlying Sentry query stops returning them. What do we want to do there? Once something spikes, you might want to cache the values so the alert won't fire again while the cache entry exists, and then clear the cache after 48 hours or whatever. Since this is added to existing workflows that run multiple times a day, there's potential for repeat alerts.
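One rough way to do the caching (sketch only, untested; the file name and TTL are placeholders):

import json
import time
from pathlib import Path

CACHE_FILE = Path('spike_issue_cache.json')  # placeholder path
CACHE_TTL_SECONDS = 48 * 60 * 60             # e.g. clear entries after 48 hours

def load_cache():
    # Return {issue_id: first_alert_timestamp}, dropping entries older than the TTL.
    if not CACHE_FILE.exists():
        return {}
    now = time.time()
    cache = json.loads(CACHE_FILE.read_text())
    return {k: v for k, v in cache.items() if now - v < CACHE_TTL_SECONDS}

def filter_already_alerted(spike_issues):
    # Alert only on issues we haven't alerted on within the TTL, then update the cache.
    cache = load_cache()
    fresh = [issue for issue in spike_issues if str(issue['id']) not in cache]
    for issue in fresh:
        cache[str(issue['id'])] = time.time()
    CACHE_FILE.write_text(json.dumps(cache))
    return fresh

The catch is that the file has to persist between workflow runs, so in CI it would need something like a cache or artifact step, otherwise it resets on every run.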

@isabelrios (Collaborator)

I am thinking about new issues... or whether we should query issues in general to check if any of them pass the threshold on users or count. If we query new issues, we may miss an issue that starts appearing frequently. Would that be possible?
How often are we going to run this? We may need to run it not only daily but also when an issue has more than 1K users or counts.


clarmso commented Feb 3, 2026

I am thinking about new issues... or if we should query issues in general to check if any of those pass the number of Users or Count. If we query new issues we may miss a new issue that starts appearing frequently. Would that be possible?

is:new in api_sentry.py restricts the query to new issues. In addition, is:unassigned and is:unresolved ensure the issues are still open and the devs haven't assigned them to anyone yet. These parameters cut down the number of issues that need to be filtered.

    # API: New top issues
    def sentry_top_new_issues(self, release, statsPeriod=3):
        return self.client.http_get(
            (
                'organizations/{0}/issues/'
                '?project={1}'
                '&query=release.version:{2} is:unassigned is:unresolved is:new'
                '&sort=freq&statsPeriod={3}d'
            ).format(
                self.organization_slug, self.sentry_project_id, release,
                statsPeriod
            )
        )

How often are we going to run this? we may need to have this not only daily but also when an issue has more than 1K users or counts

I'm thinking of running the job daily from Monday to Friday and reporting only if there's an issue exceeding 1K users or counts.

@isabelrios (Collaborator)

I am thinking about new issues... or if we should query issues in general to check if any of those pass the number of Users or Count. If we query new issues we may miss a new issue that starts appearing frequently. Would that be possible?

is:new from api_sentry.py restricts the issues to be new ones. In addition, is:unassigned and is:unresolved ensures that the devs haven't assigned them to anyone yet. These parameters cut down on lots of issues to be filtered.

How often are we going to run this? we may need to have this not only daily but also when an issue has more than 1K users or counts

I'm thinking about running the job daily from Monday to Friday and report only if there's an issue exceeding 1K users or counts.

I am wondering if we may need this to run more often to really detect and alert when there is a spike in an issue pointing to a crash.
About new issues, I am not sure I understand the logic in Sentry for new issues... can a new issue start with several repetitions, or are those added as they appear?
We may need a cache mechanism, as Aaron mentioned, so that we store the new issues, update their count/users for x days, and alert if they go over the limit...


clarmso commented Feb 3, 2026

I am wondering if we may need this to run more often to really detect and alert when there is a spike in an issue pointing to a crash.

What frequency would you suggest? I thought the issues would surface within a day.

About new issues, I am not sure I understand the logic in Sentry for new issues... can a new issue start with several repetitions, or are those added as they appear? We may need a cache mechanism, as Aaron mentioned, so that we store the new issues, update their count/users for x days, and alert if they go over the limit...

On the various parameters for is: in Sentry, here are the options and their short descriptions. The options are resolved, unresolved, archived, escalating, new, ongoing, regressed, assigned, unassigned, for_review, linked and unlinked. I determined that the combination of unresolved, new and unassigned is the most appropriate for our work.

[Screenshots: Sentry documentation for the is: filter options]

Repeated occurrences of the same issue increment that issue's count rather than adding a new entry.


AaronMT commented Feb 3, 2026

Other questions:

  • Is culprit going to fit in a cell like that, or is it better to link out (e.g., could it be a massive signature?)

Nits:

  • Remove the alarm and exclamation emoji as the red attachment badge already signifies a critical issue
  • Issue Title -> Issue


clarmso commented Feb 4, 2026

Other questions:

* Is culprit going to fit in a cell like that, or is it better to link out (e.g., could it be a massive signature?)

Nits:

* Remove the alarm and exclamation emoji as the red attachment badge already signifies a critical issue

* Issue Title -> Issue

It looks cleaner with fewer columns. Just the issue link and version should be enough to prompt the developers to investigate.

[Screenshot: updated mockup of the Slack notification]

I'm not 100% sure how the developers use culprit in debugging, so it may be better to leave it out.
