4 changes: 2 additions & 2 deletions .readthedocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,9 @@ formats:
- pdf

build:
os: ubuntu-22.04
os: ubuntu-24.04
tools:
python: "3.10"
python: "3.12"
apt_packages:
- graphviz

Expand Down
4 changes: 3 additions & 1 deletion doc/sphinx-guides/requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -13,4 +13,6 @@ sphinx-tabs==3.4.5
sphinxcontrib-jquery

Sphinx-Substitution-Extensions==2025.1.2
semver>=3,<4
semver>=3,<4

sphinx-reredirects==1.1.0
10 changes: 10 additions & 0 deletions doc/sphinx-guides/source/_static/installation/cors/cors.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
{
"CORSRules": [
{
"AllowedOrigins": ["*"],
"AllowedHeaders": ["*"],
"AllowedMethods": ["PUT", "GET"],
"ExposeHeaders": ["ETag", "Accept-Ranges", "Content-Encoding", "Content-Range"]
}
]
}
13 changes: 13 additions & 0 deletions doc/sphinx-guides/source/_static/installation/cors/cors.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
<AllowedOrigin>*</AllowedOrigin>
<AllowedHeader>*</AllowedHeader>
<AllowedMethod>PUT</AllowedMethod>
<AllowedMethod>GET</AllowedMethod>
<ExposeHeader>ETag</ExposeHeader>
<ExposeHeader>Accept-Ranges</ExposeHeader>
<ExposeHeader>Content-Encoding</ExposeHeader>
<ExposeHeader>Content-Range</ExposeHeader>
</CORSRule>
</CORSConfiguration>
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/admin/big-data-administration.rst
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ Benefits: S3 offers several advantages over file storage:

Challenges:

- One additional step that is required to enable direct uploads via a Dataverse installation and for direct download to work with previewers and direct upload to work with DVWebloader (:ref:`folder-upload`) is to allow cross site (CORS) requests on your S3 store.
- One additional step that is required to enable direct uploads via a Dataverse installation and for direct download to work with previewers and direct upload to work with DVWebloader (:ref:`folder-upload`) is to allow :ref:`cross site (CORS) requests on your S3 store <cors-s3-bucket>`.
- Cost: S3 offers a pricing model that allows you to pay for the storage and transfer of data based on current usage (versus long term demand) but commercial
providers charge more per TB than the equivalent cost of a local disk (though commercial S3 storage is cheaper than commercial file storage).
There can also be egress and other charges. Overall, S3 storage is generally more expensive than local file storage but cheaper than cloud file storage.
Expand Down
8 changes: 6 additions & 2 deletions doc/sphinx-guides/source/api/external-tools.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,13 @@ Introduction
External tools are additional applications the user can access or open from your Dataverse installation to preview, explore, and manipulate data files and datasets. The term "external" is used to indicate that the tool is not part of the main Dataverse Software.

.. note::
Browser-based tools must have CORS explicitly enabled via :ref:`dataverse.cors.origin <dataverse.cors.origin>`. List every origin that will host your tool (or use ``*`` when a wildcard is acceptable). If an origin is not listed, the browser will block that tool's API requests even if the tool page itself loads.
Browser-based tools require CORS to be explicitly enabled in Dataverse. See :ref:`dataverse.cors` for details.

Once you have created the external tool itself (which is most of the work!), you need to teach a Dataverse installation how to construct URLs that your tool needs to operate. For example, if you've deployed your tool to fabulousfiletool.com your tool might want the ID of a file and the siteUrl of the Dataverse installation like this: https://fabulousfiletool.com?fileId=42&siteUrl=https://demo.dataverse.org
List every origin that will host your tool (or use ``*`` when a wildcard is acceptable and no authentication is required).
If an origin is not listed, the browser will block that tool's API requests even if the tool page itself loads.

Once you have created the external tool itself (which is most of the work!), you need to teach a Dataverse installation how to construct URLs that your tool needs to operate.
For example, if you've deployed your tool to *fabulousfiletool.com* your tool might want the ID of a file and the siteUrl of the Dataverse installation like this: *https://fabulousfiletool.com?fileId=42&siteUrl=https://demo.dataverse.org*

In short, you will be creating a manifest in JSON format that describes not only how to construct URLs for your tool, but also what types of files your tool operates on, where it should appear in the Dataverse installation web interfaces, etc.

Expand Down
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/api/intro.rst
Original file line number Diff line number Diff line change
Expand Up @@ -207,11 +207,11 @@ Please note that some APIs are only documented in other guides that are more sui
- Installation Guide

- :doc:`/installation/config`
- :doc:`/installation/big-data-support`

- Developer Guide

- :doc:`/developers/aux-file-support`
- :doc:`/developers/big-data-support`
- :doc:`/developers/dataset-migration-api`
- :doc:`/developers/dataset-semantic-metadata-api`
- :doc:`/developers/s3-direct-upload-api`
Expand Down
7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@
'myst_parser',
'sphinx_tabs.tabs',
'sphinx_substitution_extensions',
'sphinx_reredirects',
]

# Add any paths that contain templates here, relative to this directory.
Expand Down Expand Up @@ -78,6 +79,12 @@
# for a list of supported languages.
language = 'en'

# Redirects for pages that have been moved
# See https://documatt.com/sphinx-reredirects/usage for detailed information
redirects = {
'developers/big-data-support': '../installation/big-data-support.html',
}

# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
Expand Down
1 change: 0 additions & 1 deletion doc/sphinx-guides/source/developers/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,6 @@ Developer Guide
remote-users
geospatial
selinux
big-data-support
aux-file-support
s3-direct-upload-api
globus-api
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -49,47 +49,74 @@ The following features are disabled when S3 direct upload is enabled.
- Creation of NcML auxiliary files (See :ref:`netcdf-and-hdf5`.)
- Extraction of a geospatial bounding box from NetCDF and HDF5 files (see :ref:`netcdf-and-hdf5`) unless :ref:`dataverse.netcdf.geo-extract-s3-direct-upload` is set to true.


.. _cors-s3-bucket:

Allow CORS for S3 Buckets
~~~~~~~~~~~~~~~~~~~~~~~~~

**IMPORTANT:** One additional step that is required to enable direct uploads via a Dataverse installation and for direct download to work with previewers and direct upload to work with dvwebloader (:ref:`folder-upload`) is to allow cross site (CORS) requests on your S3 store.
The example below shows how to enable CORS rules (to support upload and download) on a bucket using the AWS CLI command line tool. Note that you may want to limit the AllowedOrigins and/or AllowedHeaders further. https://github.com/gdcc/dataverse-previewers/wiki/Using-Previewers-with-download-redirects-from-S3 has some additional information about doing this.
**IMPORTANT:** Allowing cross-site (CORS) requests to your S3 buckets is an additional step required for direct uploads via a Dataverse installation, for direct downloads to work with previewers, and for direct uploads to work with *dvwebloader* (:ref:`folder-upload`).

To successfully enable direct uploads (e.g. :ref:`folder-upload`) or direct downloads (e.g. consumed by previewers), you must both:

* Enable CORS in Dataverse (see :ref:`dataverse.cors`).
* Configure a matching/compatible CORS policy on each S3 bucket (and any CDN/proxy in front of it) that will be used.

Dataverse itself will only emit the necessary ``Access-Control-*`` headers to browsers when CORS has been explicitly enabled via the JVM/MicroProfile setting :ref:`dataverse.cors.origin <dataverse.cors.origin>`. You must both:
**NOTE:** Make sure the bucket's CORS configuration ``AllowedOrigins`` is at least as permissive as the origins you configure in :ref:`dataverse.cors.origin`.
If the bucket allows the wildcard ``*`` but the Dataverse application only allows a subset, the browser will still enforce the more restrictive application response!
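
The "at least as permissive" rule above can be checked mechanically before a bucket policy is applied. The following is a minimal sketch, not part of Dataverse: the function names are hypothetical, the origins are placeholders, and it assumes that ``dataverse.cors.origin`` is a comma-separated list and that an ``AllowedOrigins`` entry may contain ``*`` as a wildcard.

```python
import re


def parse_origins(setting: str) -> list[str]:
    """Split a comma-separated dataverse.cors.origin value into origins."""
    return [part.strip() for part in setting.split(",") if part.strip()]


def origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Return True if `origin` matches any AllowedOrigins entry.

    Assumption: '*' inside an entry matches any run of characters.
    """
    for pattern in allowed:
        regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
        if re.fullmatch(regex, origin):
            return True
    return False


def uncovered_origins(dataverse_setting: str, bucket_allowed: list[str]) -> list[str]:
    """List origins configured in Dataverse that the bucket policy would reject."""
    origins = parse_origins(dataverse_setting)
    if "*" in origins:
        # A wildcard on the Dataverse side is only covered by a wildcard bucket rule.
        return [] if "*" in bucket_allowed else ["*"]
    return [o for o in origins if not origin_allowed(o, bucket_allowed)]
```

An empty result suggests the bucket is at least as permissive as the application; any returned origin would likely be blocked by the browser when it talks to the bucket.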

* Configure an appropriate ``dataverse.cors.origin`` value (single origin, comma-separated list, or ``*``) on the Dataverse application server; and
* Configure a matching/compatible CORS policy on each S3 bucket (and any CDN/proxy in front of it) that will be used for direct upload or for redirect (download-redirect) operations consumed by previewers.
Detailed CORS documentation for the most common S3 admin tools:

If you specify multiple origins in ``dataverse.cors.origin`` Dataverse will echo back the requesting origin (when it matches) and will include ``Vary: Origin`` so that shared caches do not serve one origin's response to another. If you configure ``*`` Dataverse will respond with ``Access-Control-Allow-Origin: *`` (note that browsers will not allow credentialed requests with a wildcard).
- `AWS <https://docs.aws.amazon.com/AmazonS3/latest/userguide/enabling-cors-examples.html>`_
- `Minio mc <https://docs.min.io/enterprise/aistor-object-store/reference/cli/mc-cors>`_
- `s3cmd <https://servicedesk.surf.nl/wiki/spaces/WIKI/pages/215253125/Object+Store+S3+CORS+Policies>`_

Make sure the bucket CORS configuration ``AllowedOrigins`` is at least as permissive as the origins you configure in ``dataverse.cors.origin``. If the bucket allows ``*`` but the Dataverse application only allows a subset, the browser will still enforce the more restrictive application response.
Get Current CORS Policy on Bucket
+++++++++++++++++++++++++++++++++

If you'd like to check the CORS configuration on your bucket before making changes:

``aws s3api get-bucket-cors --bucket <BUCKET_NAME>``
.. tabs::
.. group-tab:: AWS CLI
:code:`aws s3api get-bucket-cors --bucket <BUCKET_NAME>`

.. group-tab:: Minio Client (mc)
:code:`mc cors get <STORE_NAME>/<BUCKET_NAME>`

Set CORS Policy on Bucket
+++++++++++++++++++++++++

The examples below show how to enable CORS rules (to support upload and download) on a bucket.

**Note:** You may want to limit the ``AllowedOrigins`` and/or ``AllowedHeaders`` further.
`GDCC/dataverse-previewers <https://github.com/gdcc/dataverse-previewers/wiki/Using-Previewers-with-download-redirects-from-S3>`_ has some additional information about doing this.

Both the JSON and XML formats are explained in detail in the `AWS Docs <https://docs.aws.amazon.com/AmazonS3/latest/userguide/ManageCorsUsing.html#cors-example-1>`_.
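
If you want to limit ``AllowedOrigins`` rather than allow ``*``, one option is to generate the JSON rules from an explicit list of origins. The sketch below is illustrative only: ``build_cors_rules`` is a hypothetical helper and ``https://dataverse.example.edu`` is a placeholder origin. The methods and exposed headers mirror the permissive example in this section.

```python
import json


def build_cors_rules(origins: list[str]) -> dict:
    """Build an S3 CORS configuration restricted to the given origins."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": origins,
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["PUT", "GET"],
                "ExposeHeaders": [
                    "ETag",
                    "Accept-Ranges",
                    "Content-Encoding",
                    "Content-Range",
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder origin; replace with the origins that serve your installation.
    rules = build_cors_rules(["https://dataverse.example.edu"])
    print(json.dumps(rules, indent=2))
```

The printed JSON can be saved as ``cors.json`` and applied with the same ``put-bucket-cors`` command used for the permissive example.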

.. tabs::
.. group-tab:: AWS CLI
Create a file :download:`cors.json </_static/installation/cors/cors.json>` as follows:

.. literalinclude:: /_static/installation/cors/cors.json
:name: aws-cors
:language: json

Proceed with making the changes:

:code:`aws s3api put-bucket-cors --bucket <BUCKET_NAME> --cors-configuration file://cors.json`

To proceed with making changes:
Alternatively, you can enable CORS using the AWS S3 web interface, using JSON-encoded rules as in the example above.

``aws s3api put-bucket-cors --bucket <BUCKET_NAME> --cors-configuration file://cors.json``
.. group-tab:: Minio Client (mc)
Create a file :download:`cors.xml </_static/installation/cors/cors.xml>` as follows:

with the contents of the file cors.json as follows:
.. literalinclude:: /_static/installation/cors/cors.xml
:name: xml-cors
:language: xml

.. code-block:: json
Proceed with making the changes:

{
"CORSRules": [
{
"AllowedOrigins": ["*"],
"AllowedHeaders": ["*"],
"AllowedMethods": ["PUT", "GET"],
"ExposeHeaders": ["ETag", "Accept-Ranges", "Content-Encoding", "Content-Range"]
}
]
}
:code:`mc cors set <STORE_NAME>/<BUCKET_NAME> ./cors.xml`

Alternatively, you can enable CORS using the AWS S3 web interface, using json-encoded rules as in the example above.

.. _s3-tags-and-direct-upload:
