Using Persistent Storage

Alex Hurt edited this page Nov 17, 2022 · 4 revisions

The storage built into pods and jobs is deleted when the pod/job is deleted, so for data that you want to stage on Nautilus, or for artifacts from compute jobs, you will need to create and attach a persistent storage volume to your pods and jobs.

Creating Persistent Storage

To create persistent storage, you need to make a persistent volume claim, or PVC.

You can use the template present in the repo here. Be sure to change your_name to the name you want for the persistent storage volume.

For capacity, 500GB is a good starting point; keep in mind that it can be increased later if necessary.
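If you don't have the template handy, a minimal PVC spec looks roughly like the following sketch (the name, access mode, and capacity are placeholders to adjust for your use; your cluster may also require an explicit storageClassName):

```yaml
# Minimal PersistentVolumeClaim sketch; name and capacity are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: your_name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
```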

Once you have configured your .yml file, you can create the storage by running:

kubectl create -f FILENAME.yml
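After creating the claim, you can confirm that it was provisioned and bound (these commands assume a configured kubectl context; your_name is the PVC name from your .yml):

```shell
# Check the status of the new claim; STATUS should read "Bound"
kubectl get pvc your_name

# Inspect capacity and events if the claim stays "Pending"
kubectl describe pvc your_name
```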

Using Persistent Storage

Once you have created your PVC, you will need to attach it to your pods and jobs and point the output paths of your compute processes at the mounted volumes.

To do this, you will need to include the volume in your pod/job spec and then specify a path to where you want your data mounted:

spec:
  containers:
    - name: pod-name-sso # YOUR CONTAINER NAME HERE
      volumeMounts:
        - mountPath: /data
          name: persistentVolume-name # YOUR PVC NAME HERE
  volumes:
    - name: persistentVolume-name # YOUR PVC NAME HERE
      persistentVolumeClaim:
        claimName: persistentVolume-name # YOUR PVC NAME HERE

An example kube spec YAML file can be found in the repo here.

Copying Data to Persistent Storage

Now that you have a persistent volume created and know how to use it in a pod or job, you will need to put the data you want to process on that volume.

Open Source ZIP and TAR

For publicly available .zip and .tar files, you can use an Ubuntu pod and install wget or curl to download the data onto your volume.
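As a sketch, assuming your PVC is mounted at /data and using placeholder pod and URL names, the download from inside the Ubuntu pod might look like:

```shell
# Open a shell in the running Ubuntu pod (pod name is a placeholder)
kubectl exec -it my-ubuntu-pod -- /bin/bash

# Inside the pod: install wget, then fetch the archive onto the mounted volume
apt-get update && apt-get install -y wget
wget -P /data https://example.com/dataset.zip
```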

Accessing Public Cloud Storage

If you are using data stored in public cloud storage buckets, you can use the corresponding command line tool from that cloud vendor to copy your data from the cloud storage bucket to your volume. There are sample Kube YAML files for both Google Cloud and AWS.
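For example, from a pod that has the vendor tooling installed and your volume mounted at /data (bucket and object names are placeholders):

```shell
# Google Cloud Storage: copy an object to the mounted volume
gsutil cp gs://MYBUCKET/MYDATA /data/

# AWS S3: the same idea with the AWS CLI
aws s3 cp s3://MYBUCKET/MYDATA /data/
```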

Using Custom Data

If you want to use a custom dataset not readily available in the public domain, the recommended method is to use either the Nautilus provided S3 storage or a personal Google Cloud Bucket.

The steps to use a personal Google Cloud Bucket are as follows:

  1. Create private Google Cloud Bucket (documentation here)
  2. Copy the data from your local machine to the bucket: gsutil cp MYDATA gs://MYBUCKET
  3. Create a pod on the Nautilus cluster with the gsutil command line tool and your PVC attached, as shown in this YAML spec
  4. Open a terminal to your GCP pod: kubectl exec -it MYPOD -- /bin/bash
  5. Log in to your Google Account from the pod: gsutil config
  6. Copy the data from the bucket to your persistent volume: gsutil cp gs://MYBUCKET/MYDATA /MYVOLUME/
  7. Delete the data from the bucket: gsutil rm gs://MYBUCKET/MYDATA
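The steps above can be sketched as a single command sequence (pod, bucket, file, and mount-path names are placeholders):

```shell
# 2. From your local machine: upload the data to your private bucket
gsutil cp MYDATA gs://MYBUCKET

# 4. Open a terminal to the pod that has gsutil and your PVC attached
kubectl exec -it MYPOD -- /bin/bash

# 5. Inside the pod: authenticate gsutil with your Google account
gsutil config

# 6. Copy the data from the bucket onto the mounted persistent volume
gsutil cp gs://MYBUCKET/MYDATA /MYVOLUME/

# 7. Remove the staged copy from the bucket
gsutil rm gs://MYBUCKET/MYDATA
```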

To use the provided Nautilus S3 storage, you can follow the steps outlined on the Using Nautilus S3 wiki page.