Description
This feature request comes out of the discussion on data curation at the 2021 Dataverse (DV) community meeting.
Current behavior
When an ingestible tabular file is deposited (e.g., .xlsx, .sav, .dta), the default download format (and the displayed file extension) is the ingested .tab version of the file. The original file format is available from the File access menu, together with the file-level metadata and the explorer tools.
Suggested behavior
I suggest that the deposited file format would be better suited as the default download format, with .tab (or .tsv, as it should be called ;) ) available through the File access menu.
Rationale
There are several reasons deposited file formats are preferable:
- The default display of the ingested file is confusing for depositors, as @amberleahey noted during the discussion.
- Deposited formats are frequently richer than the extracted .tab file. For example, Excel files may carry rich-text formatting that makes them easier to read than their plain-text counterparts.
- In some cases, ingest can cause data loss (e.g., for Excel files with multiple tabs, undesirable as those may be). Defaulting to the deposited format partly mitigates this, although the loss during ingest remains a problem.
On a more theoretical level, in the terminology of the OAIS reference model, we clearly have the SIP (Submission Information Package, i.e., the deposited file) and the AIP (Archival Information Package, i.e., the archived/preservation copy) defined, and the question is which of the two makes the better DIP (Dissemination Information Package). I would argue that it is the SIP, as the more commonly usable and often richer data format. That is true not just for Excel, but also for formats like .sav, which include rich metadata that reads nicely not only into SPSS but also into tools like R with the appropriate packages.
cc @sbarbosadataverse, who was also part of this discussion