Description
PyHDX currently only directly accepts data formatted as 'state data' output from DynamX.
This issue continues the discussion opened by @tuttlelm in #348:
Related to coming from HDExaminer data (I can open a separate issue for that topic if that would be more appropriate): PyHDX currently does not allow duplicate measurements when creating the HDXMeasurement object. As far as I can tell, having replicates isn't an issue for any of the downstream calculations, but I wondered if you had thoughts on that. I was able to make some simple modifications to models.py so that I can leave replicates in my data and not have to average them first (basically just data.reset_index() in the __init__() function, plus adding "index" as a column wherever the code sorts or pivots on the columns).
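To make the change described above concrete, here is a minimal pandas sketch of why duplicate measurements get in the way of a pivot, and how carrying the row number along as an "index" column works around it. This is not the actual code in models.py; the column names (start, end, exposure, uptake) and the toy values are assumptions made for illustration only.

```python
import pandas as pd

# Toy peptide table with two replicate measurements per peptide/exposure.
# Column names and values are illustrative assumptions, not PyHDX internals.
data = pd.DataFrame(
    {
        "start": [1, 1, 1, 1],
        "end": [10, 10, 10, 10],
        "exposure": [30.0, 30.0, 60.0, 60.0],
        "uptake": [2.0, 2.4, 3.0, 3.2],
    }
)

# Pivoting on the peptide alone fails, because each
# (start, end) / exposure combination occurs more than once.
try:
    data.pivot(index=["start", "end"], columns="exposure", values="uptake")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# reset_index() turns the row number into an "index" column. Including it
# in the pivot keys makes every row unique again, so replicates survive.
indexed = data.reset_index()
wide = indexed.pivot(
    index=["index", "start", "end"], columns="exposure", values="uptake"
)
print(wide)
```

Each replicate ends up on its own row of the pivoted table, with NaN in the exposure columns it was not measured at. Whether that layout suits the downstream calculations is a separate question.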
It would be great to add support for other file formats such as HDExaminer data.
A couple of questions:
Why would you prefer to leave the replicates in the data rather than average them before creating the HDXMeasurement object? Do you want to perform downstream calculations on each replicate individually?
In the latter case, would it make sense to create one HDXMeasurement object per replicate?
Perhaps you could share your input script or make a pull request with your changes to models.py?
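For reference, the two alternatives raised in the questions above (averaging replicates first, or keeping one table per replicate) could look roughly like this in pandas. This is only a sketch: the replicate column and the other column names are hypothetical, and the resulting tables would still need to be converted into whatever format HDXMeasurement expects.

```python
import pandas as pd

# Toy data with an explicit, hypothetical "replicate" label column.
data = pd.DataFrame(
    {
        "start": [1, 1, 1, 1],
        "end": [10, 10, 10, 10],
        "exposure": [30.0, 30.0, 60.0, 60.0],
        "replicate": [1, 2, 1, 2],
        "uptake": [2.0, 2.4, 3.0, 3.2],
    }
)

# Columns that together identify one peptide at one exposure time.
keys = ["start", "end", "exposure"]

# Option A: average replicates up front, keeping the spread and the count.
averaged = data.groupby(keys, as_index=False).agg(
    uptake=("uptake", "mean"),
    uptake_sd=("uptake", "std"),
    n_replicates=("uptake", "count"),
)

# Option B: split into one table per replicate, so that each table
# contains no duplicate peptide/exposure rows.
per_replicate = {
    rep: df.drop(columns="replicate").reset_index(drop=True)
    for rep, df in data.groupby("replicate")
}
```

Option A discards the individual measurements but keeps their standard deviation; option B keeps everything at the cost of handling several objects downstream.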
To be honest, I think the current HDXMeasurement object has become rather clumsy to work with. I'm planning to change it in the future (probably in the form of a different project altogether).
There is also the hdxms-datasets package, which is still in a beta phase. Maybe you can share your thoughts on this as well. The idea there is to have a dataset format with a .yaml specification (example) containing all required metadata, so that downstream packages like PyHDX can load data from it directly. Ultimately, it would be nice to add support there for 1) cluster data (replicates), 2) HDExaminer output, and 3) other formats.
Again, there too only DynamX state data is currently supported, simply because that's the only example data I have at the moment.
Do you have any example datasets of HDExaminer data you can share and/or example scripts of how you load the data?