Finalizing main overhaul steps #103
Merged
In this PR I've added one more important S3 class named bed_data. I also fixed a few bugs that I had introduced in the previous PR. A few (hopefully) inspiring examples that show off the new functionality:
Example 1: maf_data
get_coding_ssm_status will fail to run (by design) when given a MAF object that is from a different genome build than the one specified via the projection parameter:
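To illustrate the idea of a build check, here is a minimal, self-contained sketch. It does not use GAMBLR at all: `new_maf_sketch` and `check_build` are hypothetical stand-ins invented for this example, and storing the build in a `genome_build` attribute is an assumption about how such an object might carry that information, not a description of the actual maf_data internals.

```r
# Hypothetical stand-ins for illustration only; not the GAMBLR implementation.

# Tag a MAF-like data frame with the genome build it came from.
new_maf_sketch <- function(df, genome_build) {
  stopifnot(is.data.frame(df))
  attr(df, "genome_build") <- genome_build
  class(df) <- c("maf_sketch", class(df))
  df
}

# Stop with an error when the object's build differs from the requested projection.
check_build <- function(maf, projection) {
  build <- attr(maf, "genome_build")
  if (!identical(build, projection)) {
    stop("genome build mismatch: object is ", build,
         " but projection is ", projection, call. = FALSE)
  }
  invisible(TRUE)
}

maf <- new_maf_sketch(
  data.frame(Chromosome = "1", Start_Position = 100L, End_Position = 100L),
  genome_build = "grch37"
)

# Matching build passes silently.
ok <- check_build(maf, "grch37")

# Mismatched build raises an error; capture its message instead of halting.
err <- tryCatch(check_build(maf, "hg38"),
                error = function(e) conditionMessage(e))
```

Failing loudly here, rather than silently returning an empty or wrong result, is what makes the mismatch visible at the call site.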
Example 2: bed_data
We can now more consistently and easily make a bed_data object from bed-like data frames. The output will always have the first three columns with the same naming pattern and will ensure that the chromosome prefixing matches the genome build.
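As a rough sketch of the kind of normalization described above (not the real create_bed_data code), the snippet below renames the first three columns and adjusts the chromosome prefix. The function name `make_bed_sketch`, the column names `chrom`/`start`/`end`, and the convention that "hg38" uses a `chr` prefix while "grch37" does not are all assumptions made for this example.

```r
# Hypothetical helper for illustration only; not the GAMBLR implementation.
make_bed_sketch <- function(df, genome_build) {
  stopifnot(is.data.frame(df), ncol(df) >= 3)

  # Force a fixed naming pattern on the first three columns (assumed names).
  names(df)[1:3] <- c("chrom", "start", "end")

  # Strip any existing prefix, then re-add it only for builds assumed to use it.
  chrom <- sub("^chr", "", as.character(df$chrom))
  if (identical(genome_build, "hg38")) {
    chrom <- paste0("chr", chrom)
  }
  df$chrom <- chrom

  # Remember which build these coordinates belong to.
  attr(df, "genome_build") <- genome_build
  df
}

# A bed-like data frame with arbitrary column names and made-up coordinates.
regions <- data.frame(
  chr = c("chr8", "chr18"),
  s   = c(1000L, 5000L),
  e   = c(2000L, 6000L)
)

bed37 <- make_bed_sketch(regions, "grch37")
bed38 <- make_bed_sketch(regions, "hg38")
```

Note that this sketch only relabels chromosomes; it does not lift coordinates over between builds.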
Example 3: bed_data
The create_bed_data function has a few convenience features that minimize the risk that the user will do something wrong (e.g. it ensures unique names in the name column). This means downstream functions can rely on a consistent structure, which simplifies their code. These objects also know which genome build they belong to. For objects in GAMBLR.data, this is inferred from the genome build string that is embedded in the variable name. This protects against accidentally passing regions/coordinates from one genome build to functions that expect another. The example code below shows how this was incorporated into get_ssm_by_regions in preparation for cool_overlaps. Now duplicated "names", weirdly ordered columns, chr-prefix mismatches, or the lack of a column named "name" etc. can become a thing of the past!
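For readers curious what "ensures unique names" could look like in practice, here is a small base-R sketch. `ensure_unique_names` is a hypothetical helper written for this example, and both the fallback name format and the use of `make.unique` are assumptions; create_bed_data may well handle this differently.

```r
# Hypothetical helper for illustration only; not the GAMBLR implementation.
ensure_unique_names <- function(df) {
  # If there is no "name" column, build one from the coordinates (assumed format).
  if (!"name" %in% names(df)) {
    df$name <- paste0(df$chrom, ":", df$start, "-", df$end)
  }
  # Append a numeric suffix to repeated names so every row is identifiable.
  df$name <- make.unique(as.character(df$name), sep = "_")
  df
}

# Made-up regions with a duplicated name.
bed <- data.frame(
  chrom = c("8", "8", "18"),
  start = c(100L, 500L, 900L),
  end   = c(200L, 600L, 1000L),
  name  = c("MYC", "MYC", "BCL2")
)
fixed <- ensure_unique_names(bed)

# The same helper also fills in a missing name column.
unnamed <- ensure_unique_names(bed[, c("chrom", "start", "end")])
```

With unique names guaranteed up front, a downstream function can key its results on `name` without first checking for collisions.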