Open
Description
Currently, after each crawl, we verify the data through a largely manual process that requires a lot of notebook copying/cloning.
Ideally, it should be enough to run something like crawl_metrics(s3_bucket, crawl_directory) or similar to get relevant metrics, including those from https://github.com/citp/openwpm-data-release/blob/master/Crawl-Data-Metrics.ipynb and those in the notebook linked in openwpm/openwpm-crawler#30 (comment).
A companion crawl_metrics_summary(crawl_metrics) method could be included to print out the most relevant metrics in human-readable form.
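To make the proposal concrete, here is a minimal sketch of what the two functions could look like. It is not an existing OpenWPM API. The record keys (`site_url`, `command_status`, `n_requests`), the `CrawlMetrics` fields, and the injected `load_visits` callable are all hypothetical placeholders; a real implementation would read the crawl data from S3 and cover the metrics from the notebooks linked above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical record shape: one dict per attempted site visit.
# The keys used below are illustrative assumptions, not the OpenWPM schema.
VisitRecord = Dict[str, Any]


@dataclass
class CrawlMetrics:
    """Hypothetical container for a few basic crawl health metrics."""
    n_visits: int
    n_ok: int
    n_failed: int
    completion_rate: float
    mean_requests_per_visit: float


def crawl_metrics(
    s3_bucket: str,
    crawl_directory: str,
    load_visits: Callable[[str, str], Iterable[VisitRecord]],
) -> CrawlMetrics:
    """Compute basic metrics for one crawl.

    `load_visits` stands in for whatever reads the crawl data from S3.
    It is injected here so the metric logic can be exercised offline.
    """
    visits: List[VisitRecord] = list(load_visits(s3_bucket, crawl_directory))
    n_visits = len(visits)
    n_ok = sum(1 for v in visits if v.get("command_status") == "ok")
    total_requests = sum(int(v.get("n_requests", 0)) for v in visits)
    return CrawlMetrics(
        n_visits=n_visits,
        n_ok=n_ok,
        n_failed=n_visits - n_ok,
        # Guard against division by zero for an empty crawl.
        completion_rate=n_ok / n_visits if n_visits else 0.0,
        mean_requests_per_visit=total_requests / n_visits if n_visits else 0.0,
    )


def crawl_metrics_summary(metrics: CrawlMetrics) -> str:
    """Return the most relevant metrics as human-readable text."""
    return "\n".join([
        f"Visits attempted:        {metrics.n_visits}",
        f"Visits completed:        {metrics.n_ok}",
        f"Visits failed:           {metrics.n_failed}",
        f"Completion rate:         {metrics.completion_rate:.1%}",
        f"Mean requests per visit: {metrics.mean_requests_per_visit:.1f}",
    ])


if __name__ == "__main__":
    # In-memory stand-in for an S3 reader, for demonstration only.
    def example_loader(bucket: str, directory: str) -> List[VisitRecord]:
        return [
            {"site_url": "https://a.example", "command_status": "ok", "n_requests": 10},
            {"site_url": "https://b.example", "command_status": "timeout", "n_requests": 2},
        ]

    print(crawl_metrics_summary(
        crawl_metrics("example-bucket", "example-crawl", example_loader)
    ))
```

Returning a structured object from `crawl_metrics` rather than printing directly would let the same result feed both the summary and any programmatic comparison between crawls.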
Use cases:
- Include at the top of every crawl-analysis notebook to understand the nature of the gathered crawl dataset
- Easily set up notebooks that analyze crawl datasets longitudinally and/or compare individual crawl datasets
- Include in OpenWPM CI to spot regressions in crawl performance/health (related: https://github.com/mozilla/OpenWPM/issues/479)