MTE-4764 Querying new crashes that affect many users #292
Conversation
```python
if args.type == 'spike-issues':
    main_spike_issues(args.file, args.project)
else:
    main_rates(args.file, args.project)
```
I'm considering splitting this file into two: one for rates and one for spike issues.
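If it does get split, here is a minimal sketch of what one of the two entry points could look like. The file name and import path are assumptions; `main_spike_issues` and the `--file`/`--project` arguments come from the diff above, and the rates entry point would mirror this with `main_rates`.

```python
# spike_issues_main.py -- hypothetical entry point if the file is split in two.
import argparse

from api.sentry.api_sentry import main_spike_issues  # assumed import path


def parse_args():
    parser = argparse.ArgumentParser(description='Report Sentry spike issues')
    parser.add_argument('--file', required=True)
    parser.add_argument('--project', required=True)
    return parser.parse_args()


if __name__ == '__main__':
    args = parse_args()
    main_spike_issues(args.file, args.project)
```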
api/sentry/api_sentry.py (Outdated)
```python
if int(issue.get('filtered', {}).get('userCount', 0)) > threshold
if int(issue.get('filtered', {}).get('count', 0)) > threshold
if int(issue.get('userCount', 0)) > threshold
if int(issue.get('count', 0)) > threshold
```
For now, I check all of the userCount and count fields on the issue.
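For reference, here is that "all fields" check pulled into one helper. This is only a sketch: the field names (`count`, `userCount`, `filtered.count`, `filtered.userCount`) are taken from the diff above, and the lifetime fields mentioned in the review below would be handled the same way if they are present in the payload.

```python
def exceeds_threshold(issue: dict, threshold: int) -> bool:
    """True only when every count-like field on the issue is above the threshold."""
    counts = [
        int(issue.get('count', 0)),
        int(issue.get('userCount', 0)),
        int(issue.get('filtered', {}).get('count', 0)),
        int(issue.get('filtered', {}).get('userCount', 0)),
    ]
    # Note: a missing field defaults to 0 here, which makes this AND-style
    # check fail even if the other fields are huge.
    return all(c > threshold for c in counts)
```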
```python
df_issues = pd.DataFrame()
for release_version in release_versions:
    short_release_version = release_version.split('+')[0]
    issues = self.sentry_top_new_issues(short_release_version, statsPeriod=3)
```
Ideally, I would like to set statsPeriod to 1, but due to the weekend we may miss some issues.
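One possible compromise, if statsPeriod should normally be 1: widen the window only when the run happens on a Monday, so issues that appeared over the weekend are still covered. A sketch (the helper name is hypothetical):

```python
from datetime import date
from typing import Optional


def stats_period_days(today: Optional[date] = None) -> int:
    """Look back 1 day on most runs, but 3 days on Mondays to cover the weekend."""
    today = today or date.today()
    return 3 if today.weekday() == 0 else 1  # Monday == 0
```

The call above would then become `statsPeriod=stats_period_days()`.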
Couple of questions re: new issues and the threshold that's then applied:

```python
spike_issues = [
    issue for issue in issues
    if lifetime.userCount > threshold
    if lifetime.count > threshold
    if filtered.userCount > threshold
    if filtered.count > threshold
    if userCount > threshold
    if count > threshold
]
```

This is a bit strict in that an issue must exceed the threshold in all 6 fields. In practice, that can easily filter out things you might still consider spikes (e.g., high count but lower userCount, or filtered vs. lifetime not both huge). If the intent is "either users OR count over threshold", the predicate probably wants OR, not AND across everything.
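If the intended behaviour is "flag it when either users or count spikes", the predicate could use `any()` instead of requiring every field to pass. A sketch only, using the dict-style access from the actual diff rather than the attribute style quoted above:

```python
def looks_like_spike(issue: dict, threshold: int) -> bool:
    """Flag an issue if ANY count-like field exceeds the threshold."""
    counts = [
        int(issue.get('count', 0)),
        int(issue.get('userCount', 0)),
        int(issue.get('filtered', {}).get('count', 0)),
        int(issue.get('filtered', {}).get('userCount', 0)),
    ]
    return any(c > threshold for c in counts)


spike_issues = [issue for issue in issues if looks_like_spike(issue, threshold)]
```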
Also, it will keep reporting the same spike issues every time the workflow runs, unless the underlying Sentry query stops returning them. What do we want to do there? Once an issue spikes, you might want to cache it so the alert won't fire again while the cache entry exists, and then clear the cache after 48 hours or so. Since this is added to existing workflows that run multiple times a day, there's potential for repeat alerts.
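To avoid re-alerting on every run, one option is to remember which issue IDs were already reported and suppress them for a window. A sketch with a JSON file and a 48-hour TTL (the file name, the TTL, and the `id` field are assumptions); in CI the file would need to persist between workflow runs, e.g. via a cache step:

```python
import json
import time
from pathlib import Path

CACHE_FILE = Path('reported_spike_issues.json')  # assumed location
TTL_SECONDS = 48 * 3600                          # the 48-hour window mentioned above


def load_cache() -> dict:
    """Return {issue_id: reported_at} entries that are still within the TTL."""
    if not CACHE_FILE.exists():
        return {}
    cache = json.loads(CACHE_FILE.read_text())
    now = time.time()
    return {k: v for k, v in cache.items() if now - v < TTL_SECONDS}


def filter_unreported(spike_issues: list) -> list:
    """Keep only issues not alerted on within the TTL, and record the new ones."""
    cache = load_cache()
    fresh = [i for i in spike_issues if str(i.get('id')) not in cache]
    for issue in fresh:
        cache[str(issue.get('id'))] = time.time()
    CACHE_FILE.write_text(json.dumps(cache))
    return fresh
```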
I am thinking about new issues... or whether we should query issues in general and check if any of them pass the Users or Count threshold. If we only query new issues, we may miss an issue that is not new but starts appearing frequently. Would that be possible?
I'm thinking about running the job daily from Monday to Friday and reporting only if there's an issue exceeding 1K users or counts.
I am wondering if we may need this to run more often to really detect and alert when there is a spike in an issue pointing to a crash.
Other questions:
Nits:
This PR queries Sentry to see if there are any new issues that affect more than 1000 users or happen more than 1000 times during the last 3 days (due to the weekend). If there are spikes, we send a Slack notification. If there are no spikes, we do not send any notifications. Here's a mockup of the Slack notification if new crashes are found:

I would like to run the query on preflight and staging.
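For completeness, the notification itself can be a single incoming-webhook post; the PR presumably already has its own Slack wiring, so this is just a generic sketch. `SLACK_WEBHOOK_URL` and the `title`/`userCount`/`count` fields are assumptions.

```python
import os

import requests


def notify_slack(spike_issues: list) -> None:
    """Post one summary message; send nothing when there are no spikes."""
    if not spike_issues:
        return  # matches the behaviour described above: no spikes, no message
    lines = [
        f"- {issue.get('title', 'unknown')} "
        f"(users: {issue.get('userCount', '?')}, events: {issue.get('count', '?')})"
        for issue in spike_issues
    ]
    payload = {'text': 'New crash spikes found:\n' + '\n'.join(lines)}
    requests.post(os.environ['SLACK_WEBHOOK_URL'], json=payload, timeout=10)
```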