Add helper to identify / exclude / remove (duplicate) data from unsuccessful job retries #13

@motin

Description

openwpm/OpenWPM#468 introduced job retries, so that failed crawl jobs are retried n times. While this made running crawls much smoother, it also introduced data duplication: the data for a job / crawl record / site visit could be (possibly partially) written but not recorded as successful, so the job was retried, resulting in duplicated data for a site visit.

Feel free to close this if you think it would be better to solve the root cause of the data duplication (related: openwpm/OpenWPM#476), or to flag the final crawl_history entries as final instead of adding a helper.
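To illustrate the kind of helper this issue asks for, here is a minimal sketch of deduplication logic. It assumes (hypothetically; the actual OpenWPM schema and column names are not spelled out here) that each crawl attempt yields a record with a `visit_id`, the visited `site_url`, and a success flag. For each site it keeps the latest successful attempt (or, if none succeeded, the last attempt) and returns the set of `visit_id`s whose data should be excluded or removed:

```python
def visits_to_exclude(records):
    """Identify visit_ids of duplicated crawl data from unsuccessful retries.

    records: iterable of (visit_id, site_url, successful) tuples, in
    attempt order. For each site, keep the last successful attempt
    (or the last attempt overall if none succeeded); all other
    attempts for that site are returned as duplicates to exclude.
    """
    by_site = {}
    for visit_id, site_url, successful in records:
        by_site.setdefault(site_url, []).append((visit_id, successful))

    excluded = set()
    for site_url, attempts in by_site.items():
        successes = [vid for vid, ok in attempts if ok]
        # Prefer the last successful attempt; fall back to the last attempt.
        keep = successes[-1] if successes else attempts[-1][0]
        excluded.update(vid for vid, _ in attempts if vid != keep)
    return excluded
```

A downstream analysis could then filter its visit table with this set before aggregating, e.g. `df[~df.visit_id.isin(visits_to_exclude(records))]`. The same "keep the last successful attempt per site" rule could equally be expressed as a SQL window query against the crawl history.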
