Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions doc/release-notes/12122-archiving in sequence.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
This release introduces an additial setting related to archival bag creation, ArchiveOnlyIfEarlierVersionsAreArchived (default false).
If it is true, dataset versions must be archived in order. That is, all prior versions of a dataset must be archived before the latest version can be archived.
This is intended to support use cases where deduplication of files between dataset versions will be done (i.e. by a third-party service running at the archival copy location) and is a step towards supporting the Oxford Common File Layout (OCFL) as an archival format.
8 changes: 8 additions & 0 deletions doc/release-notes/12122-archiving updates.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
## Notifications

This release includes multiple updates to the process of creating archival bags including
- performance/scaling improvements for large datasets (multiple changes)
- bug fixes for when superusers see the "Submit" button to launch archiving from the dataset page version table
- new functionality to optionally suppress an archiving workflow when using the Update Current Version functionality and mark the current archive as out of date
- new functionality to support recreating an archival bag when Update Current Version has been used, which is available for archivers that can delete existing files
-
Original file line number Diff line number Diff line change
@@ -1,14 +1,21 @@
This release contains multiple updates to the OAI-ORE metadata export and archival Bag output:

OAI-ORE
- now uses URI for checksum algorithms
- a bug causing failures with deaccessioned versions when the deaccession note ("Deaccession Reason" in the UI) was null (which has been allowed via the API).
- the "https://schema.org/additionalType" is updated to "Dataverse OREMap Format v1.0.2" to indicate that the out has changed

Archival Bag
- for dataset versions with no files, the (empty) manifest-<alg>.txt file created will now use the default algorithm defined by the "FileFixityChecksumAlgorithm" setting rather than always defaulting to "md5"
- a bug causing the bag-info.txt to not have information on contacts when the dataset version has more than one contact has been fixed
- values used in the bag-info.txt file that may be multi-line (with embedded CR or LF characters) are now properly indented/formatted per the BagIt specification (i.e. Internal-Sender-Identifier, External-Description, Source-Organization, Organization-Address).
- the name of the dataset is no longer used as a subdirectory under the data directory (dataset names can be long enough to cause failures when unzipping)
- a new key, "Dataverse-Bag-Version" has been added to bag-info.txt with a value "1.0", allowing tracking of changes to Dataverse's arhival bag generation
- improvements to file retrieval w.r.t. retries on errors or throttling
This release contains multiple updates to the OAI-ORE metadata export and archival Bag output:

OAI-ORE
- now uses URI for checksum algorithms
- a bug causing failures with deaccessioned versions when the deaccession note ("Deaccession Reason" in the UI) was null (which has been allowed via the API).
- the "https://schema.org/additionalType" is updated to "Dataverse OREMap Format v1.0.2" to indicate that the out has changed

Archival Bag
- for dataset versions with no files, the (empty) manifest-<alg>.txt file created will now use the default algorithm defined by the "FileFixityChecksumAlgorithm" setting rather than always defaulting to "md5"
- a bug causing the bag-info.txt to not have information on contacts when the dataset version has more than one contact has been fixed
- values used in the bag-info.txt file that may be multi-line (with embedded CR or LF characters) are now properly indented/formatted per the BagIt specification (i.e. Internal-Sender-Identifier, External-Description, Source-Organization, Organization-Address).
- the name of the dataset is no longer used as a subdirectory under the data directory (dataset names can be long enough to cause failures when unzipping)
- a new key, "Dataverse-Bag-Version" has been added to bag-info.txt with a value "1.0", allowing tracking of changes to Dataverse's arhival bag generation
- improvements to file retrieval w.r.t. retries on errors or throttling
- retrieval of files for inclusion in the bag is no longer counted as a download by Dataverse
- the size of data files and total dataset size that will be included in an archival bag can now be limited. Admins can choose whether files above these limits are transferred along with the zipped bag (creating a complete archival copy) or are just referenced (using the concept of a "holey" bag and just listing the oversized files and the Dataverse urls from which they can be retrieved. In the holey bag case, an active service on the archiving platform must retrieve the oversized files (using appropriate credentials as needed) to make a complete copy

### New JVM Options (MicroProfile Config Settings)
dataverse.bagit.zip.holey
dataverse.bagit.zip.max-data-size
dataverse.bagit.zip.max-file-size
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/admin/big-data-administration.rst
Original file line number Diff line number Diff line change
Expand Up @@ -302,6 +302,7 @@ There are a broad range of options (that are not turned on by default) for impro
- :ref:`:DisableSolrFacetsWithoutJsession` - disables facets for users who have disabled cookies (e.g. for bots)
- :ref:`:DisableUncheckedTypesFacet` - only disables the facet showing the number of collections, datasets, files matching the query (this facet is potentially less useful than others)
- :ref:`:StoreIngestedTabularFilesWithVarHeaders` - by default, Dataverse stores ingested files without headers and dynamically adds them back at download time. Once this setting is enabled, Dataverse will leave the headers in place (for newly ingested files), reducing the cost of downloads
- :ref:`dataverse.bagit.zip.max-file-size`, :ref:`dataverse.bagit.zip.max-data-size`, and :ref:`dataverse.bagit.zip.holey` - options to control the size and temporary storage requirements when generating archival Bags - see :ref:`BagIt Export`


Scaling Infrastructure
Expand Down
Loading
Loading