Add helper to identify / exclude / remove (duplicate) data from unsuccessful job retries #13

@motin

Description

openwpm/OpenWPM#468 introduced job retries, so that failed crawl jobs are retried n times. While this made running crawls much smoother, it also introduced data duplication: the data for a job / crawl record / site visit could be (possibly partially) written but not recorded as successful, so the job was retried, resulting in duplicated data for a site visit.

Feel free to close this if you think it would be better to solve the root cause of the data duplication (related: openwpm/OpenWPM#476), or to flag the final crawl_history entries as final instead of adding a helper.
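To illustrate the kind of helper this issue asks for, here is a minimal sketch of deduplication logic. It assumes (hypothetically; the actual OpenWPM schema and column names are not spelled out here) that each crawl attempt yields a record with a `visit_id`, the visited `site_url`, and a success flag. For each site it keeps the latest successful attempt (or, if none succeeded, the last attempt) and returns the set of `visit_id`s whose data should be excluded or removed:

```python
def visits_to_exclude(records):
    """Identify visit_ids of duplicated crawl data from unsuccessful retries.

    records: iterable of (visit_id, site_url, successful) tuples, in
    attempt order. For each site, keep the last successful attempt
    (or the last attempt overall if none succeeded); all other
    attempts for that site are returned as duplicates to exclude.
    """
    by_site = {}
    for visit_id, site_url, successful in records:
        by_site.setdefault(site_url, []).append((visit_id, successful))

    excluded = set()
    for site_url, attempts in by_site.items():
        successes = [vid for vid, ok in attempts if ok]
        # Prefer the last successful attempt; fall back to the last attempt.
        keep = successes[-1] if successes else attempts[-1][0]
        excluded.update(vid for vid, _ in attempts if vid != keep)
    return excluded
```

A downstream analysis could then filter its visit table with this set before aggregating, e.g. `df[~df.visit_id.isin(visits_to_exclude(records))]`. The same "keep the last successful attempt per site" rule could equally be expressed as a SQL window query against the crawl history.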
