Display deposited (rather than ingested) copy of tabular files  #7956

@adam3smith

This feature request comes out of the discussion on data curation at the 2021 DV community meeting.

Current behavior
When an ingestable tabular file is deposited (.xlsx, .sav, .dta), the default download format (and the displayed file extension) is the ingested .tab version of the file. The original file format is available from the File access menu, together with file-level metadata and the explorer tools.
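For illustration, the same split between the ingested and the deposited copy shows up when downloading programmatically. The sketch below assumes the Data Access API's `format=original` query parameter, which requests the file as deposited; the server URL and the file id 42 are placeholders, and the helper name `datafile_url` is hypothetical.

```python
from urllib.parse import urlencode


def datafile_url(base_url: str, file_id: int, original: bool = False) -> str:
    """Build a download URL for a data file (hypothetical helper).

    Without a format parameter, an ingested tabular file is served as
    the .tab version. Passing original=True appends format=original to
    ask for the deposited file instead (assumed API behavior).
    """
    url = f"{base_url.rstrip('/')}/api/access/datafile/{file_id}"
    if original:
        url += "?" + urlencode({"format": "original"})
    return url


# Placeholder server and file id, for illustration only.
print(datafile_url("https://demo.dataverse.org", 42))
print(datafile_url("https://demo.dataverse.org", 42, original=True))
```

Under the suggestion in this issue, the roles would be reversed in the UI: the deposited file would be the default, and the .tab version the explicit option.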

Suggested behavior
I suggest that the deposited file format is better suited as the default download format, with .tab (or .tsv as it should be called ;)) being available through the File access menu.

Rationale
There are several reasons deposited file formats are preferable:

  1. The default display of the ingested file is confusing for depositors, as @amberleahey noted during the discussion.
  2. Frequently, deposit formats are richer than the extracted .tab. E.g., Excel files may have additional rich text formatting, which makes them easier to read than their plain text counterparts.
  3. In some cases, ingest can cause data loss (e.g. for Excel files with multiple tabs, undesirable as those may be). Defaulting to the deposited format somewhat mitigates this, even though it is still problematic.

On a more theoretical level, in the terminology of the OAIS reference model, we clearly have the SIP (the deposited file) and the AIP (the archived/preservation copy) defined, and the question is which of the two is the better DIP. I would argue that it is the deposited file, as the more commonly usable and often richer data format. That is not just the case for Excel, but also for things like .sav files, which include rich metadata that reads nicely not just into SPSS but also into tools like R with appropriate packages.

cc @sbarbosadataverse who was also part of this discussion

Status: Implemented at QDR