@benjaminmah (Contributor) commented May 2, 2024

Script to generate a dataset of bug-inducing commits, their backout commits, and the subsequent fix commits.

Intended to include:

  • The hashes of the three commits.
  • Metadata of each commit (pushdate, desc).
  • The diff between the initial commit and the fix commit.
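One possible shape for a single dataset entry, sketched under assumptions: the class names, the `node` field for the commit hash, and the JSON-lines serialization are all hypothetical and not taken from this PR; only `pushdate` and `desc` come from the description above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CommitInfo:
    """Hash plus the metadata listed in the description (hypothetical layout)."""

    node: str  # commit hash; the field name is an assumption
    pushdate: str
    desc: str


@dataclass
class BackoutRecord:
    """One dataset entry: the three related commits and the diff."""

    inducing: CommitInfo  # bug-inducing commit
    backout: CommitInfo  # commit that backed it out
    fix: CommitInfo  # subsequent fix commit
    diff: str  # diff between the initial commit and the fix commit


def to_json_line(record: BackoutRecord) -> str:
    """Serialize one record as a single JSON line."""
    return json.dumps(asdict(record))
```

Writing one JSON object per line would let the dataset be appended to and read back incrementally, though the script in this PR may store it differently.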

@suhaibmujahid (Member) left a comment

Thank you, @benjaminmah! Please see my comments. Also, please fix the linting errors (you may want to consider installing pre-commit [1]).

Footnotes

  1. https://github.com/mozilla/bugbug#auto-formatting

def main():
    download_databases()

    commit_dict, bug_to_commit_dict, bug_dict = preprocess_commits_and_bugs()
Member

We may want to consider the space complexity when iterating over the whole dataset.

Contributor Author

I removed unused keys when constructing the dictionaries and added a cache: the generated dictionaries are saved as JSON files and reused on later runs instead of being rebuilt. Let me know if this needs additional changes or fixes!
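A minimal sketch of the two ideas in this reply, assuming nothing about the actual implementation: `slim_commit`, `load_or_build`, and the default key names are hypothetical helpers, not functions from this PR or from bugbug.

```python
import json
import os


def slim_commit(commit, keep=("node", "pushdate", "desc")):
    """Keep only the keys the dataset needs, to reduce memory use."""
    return {key: commit[key] for key in keep if key in commit}


def load_or_build(cache_path, build):
    """Return the cached dictionary if present, else build and cache it.

    `build` is any zero-argument callable returning a JSON-serializable dict.
    """
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as cache_file:
            return json.load(cache_file)

    data = build()
    with open(cache_path, "w", encoding="utf-8") as cache_file:
        json.dump(data, cache_file)
    return data
```

A cache like this never invalidates itself, so the JSON files would need to be deleted by hand whenever the underlying databases are refreshed.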

@benjaminmah benjaminmah requested a review from suhaibmujahid May 7, 2024 14:20
… found, and number of commits with multiple non backed out commits following it
…he dataset, separated by filename and split into `added_lines` and `removed_line`.
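Splitting a diff by filename into added and removed lines could look roughly like the following. This is a simplified sketch, not the code from this PR: `split_diff` is a hypothetical name, the input is assumed to be a unified diff with `---`/`+++` file headers, and the output key `removed_lines` is an assumed spelling.

```python
from collections import defaultdict


def split_diff(diff_text):
    """Group the changed lines of a unified diff by filename.

    Simplification: any line starting with "--- " or "+++ " is treated as a
    file header, so a changed line whose content begins with "-- " or "++ "
    would be misread.
    """
    result = defaultdict(lambda: {"added_lines": [], "removed_lines": []})
    current = None  # filename of the file section being read

    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].strip()
            # Drop the "b/" prefix that git-style diffs put on the new path.
            current = path[2:] if path.startswith("b/") else path
        elif line.startswith("--- "):
            continue  # old-path header; the "+++" line names the file
        elif current is None:
            continue  # preamble before the first file header
        elif line.startswith("+"):
            result[current]["added_lines"].append(line[1:])
        elif line.startswith("-"):
            result[current]["removed_lines"].append(line[1:])

    return dict(result)
```

Context lines and hunk headers (`@@ ... @@`) are ignored, so only the changed lines end up in the result.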
@benjaminmah
Contributor Author

Example diffs extracted:

Backout Data Collection Validation

@benjaminmah benjaminmah requested a review from suhaibmujahid May 22, 2024 19:18
@benjaminmah benjaminmah requested a review from suhaibmujahid June 3, 2024 15:09