Prevent util.execute from reading output files#296
Prevent util.execute from reading output files#296anabiman wants to merge 11 commits intoMolSSI:masterfrom
Conversation
|
What about making it like Perhaps add a note that |
|
I like your suggestion. I'll refine this and improve the docstring. |
qcengine/util.py
Outdated
| infiles: Optional[Dict[str, str]] = None, | ||
| outfiles: Optional[List[str]] = None, | ||
| *, | ||
| outfiles_track: Optional[List[str]] = [], |
There was a problem hiding this comment.
Probably better to have None here, like as_binary. https://stackoverflow.com/questions/26320899/why-is-the-empty-dictionary-a-dangerous-default-value-in-python
There was a problem hiding this comment.
Good point but shouldn't matter here since outfiles_track should never be modified, but if you're concerned someone might I could change its type to an immutable object like a Tuple OR if you prefer a list for consistency that is by default None, I could add a single line to disk_files:
outfiles_track = outfiles_track or []that would make the current code work with no additional changes.
qcengine/util.py
Outdated
| except (OSError, FileNotFoundError): | ||
| if "*" in fl: | ||
| filename = lwd / fl | ||
| if fl not in outfiles_track: |
There was a problem hiding this comment.
are you handling the case when outfiles=["subdir/*"], outfiles_track=["subdir/bigfile"]?
There was a problem hiding this comment.
No, this will not work (bigfile will be loaded in memory). For outfiles_track to work, it has to be a subset of outfiles. So for the following input:
{
...
"outfiles": ["*"],
"outfiles_track": ["bigfile"],
...
}or the opposite:
{
...
"outfiles": ["bigfile", ...],
"outfiles_track": ["*"],
...
}outfiles_track will be ignored. There are also no sanity checks being done on outfiles_track so if one specifies a filename not part of outfiles, it will be ignored without any warnings.
I guess the main advantage in setting outfiles_load: bool is we don't have to worry about individual cases like these.
Description
Adds an extra keyword (
outfiles_load) to util.execute that instructs disk_files how to handle output files. Ifoutfiles_load=True(default), the output files are read and their contents returned inoutfiles. Whenoutfiles_load=False, only the file paths are returned instead. This is useful and can be necessary when output files are large so keeping track of the output files while retaining the scratch dir becomes a viable alternative:The default behavior is preserved (
outfiles_load=Trueinutil.execute), so by defaultoutfilesstores the file contents i.e.Changelog description
util.execute supports tracking output files without loading them in memory
Status