OpenDAP access for password protected files #20

@ashiklom

OpenDAP is a great way to remotely access large NetCDF datasets.

The current version of RNetCDF works with OpenDAP links for unauthenticated URLs. For example, this just works for GHCN CAMS data from the IRI Data Library:

library(RNetCDF)
# Example URL to IRI Data Library GHCN_CAMS data 
url <- "http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.GHCN_CAMS/.gridded/.deg0p5/.temp/dods"
nc <- open.nc(url)
print.nc(nc)
close.nc(nc)

However, some URLs, including those of the UCAR Research Data Archive (RDA) and the NASA GES DISC, are password-protected. Currently, my approach for accessing these services is to build the subset URL manually, download the resulting NetCDF file, and read it from disk. For example, for some MERRA outputs:

# Base URL for MERRA at this date
url_base <- "https://goldsmr4.gesdisc.eosdis.nasa.gov:443/opendap/MERRA2/M2I1NXASM.5.12.4/1983/08/MERRA2_100.inst1_2d_asm_Nx.19830801.nc4"
# Add NCDF4 output tail (not a typo -- the ending is `.nc4.nc4`)
url_nc4 <- paste0(url_base, ".nc4")
# Add the subset query for wind U component (variable U2M)
url <- paste0(url_nc4, "?U2M[0:1:23][0:1:1][0:1:1]")

tmp <- tempfile()

library(RNetCDF)
library(curl)
library(magrittr)

# my_user and my_pass are assumed to hold the credentials for the data service
h <- new_handle() %>%
  handle_setopt(
    followlocation = TRUE,
    username = my_user,
    password = my_pass
  )

curl_download(url, tmp, handle = h)

nc <- open.nc(tmp)
print.nc(nc)
close.nc(nc)

...or, using the crul package...

library(crul)
http <- HttpClient$new(
  url = url,
  auth = auth(user = my_user, pwd = my_pass)
)
result <- http$get()
tmp2 <- tempfile()
writeBin(result$content, tmp2)
nc <- RNetCDF::open.nc(tmp2)
RNetCDF::print.nc(nc)
RNetCDF::close.nc(nc)
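
The manual URL construction used in both workarounds could be wrapped in a small helper. This is only a sketch: `dap_subset_url` and its arguments are hypothetical names, not part of RNetCDF, and it assumes the server accepts the `[start:stride:stop]` constraint syntax and the `.nc4` response suffix shown in the MERRA example.

```r
# Hypothetical helper (not part of RNetCDF): build an OpenDAP subset URL.
# `ranges` is a list with one c(start, stride, stop) vector per dimension.
dap_subset_url <- function(url_base, variable, ranges, suffix = ".nc4") {
  # Turn each c(start, stride, stop) into "start:stride:stop";
  # as.integer() avoids scientific notation for large indices
  index <- vapply(
    ranges,
    function(r) paste(as.integer(r), collapse = ":"),
    character(1)
  )
  # Wrap every index in brackets and concatenate: "[0:1:23][0:1:1]..."
  constraint <- paste0("[", index, "]", collapse = "")
  paste0(url_base, suffix, "?", variable, constraint)
}

# Same subset as in the MERRA example, on a placeholder base URL
dap_subset_url(
  "https://example.org/data/file.nc4",
  "U2M",
  list(c(0, 1, 23), c(0, 1, 1), c(0, 1, 1))
)
```

The result can be passed to `curl_download()` or `HttpClient$new()` in place of the hand-built `url`.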

However, it would be great if there were a more direct way to open the NetCDF file and subset it from R, analogous to the first example for services that are not password-protected.

The underlying issue here is that RNetCDF::open.nc only supports simple URLs as connections. However, if it could be modified to work with full HTTP requests generated by curl or crul, I think everything else should work the same.
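
To make the request concrete, here is roughly the interface I have in mind from the user's side, emulated with the download workaround. This is a sketch only: `open_nc_auth` is a hypothetical name, not an existing RNetCDF function, and it still fetches the whole response to a temporary file instead of reading it lazily over OpenDAP.

```r
# Hypothetical wrapper (not part of RNetCDF): fetch a password-protected
# URL with curl, then open the downloaded copy with RNetCDF.
open_nc_auth <- function(url, user, pass) {
  h <- curl::new_handle()
  curl::handle_setopt(
    h,
    followlocation = TRUE,  # follow redirects, as in the workaround above
    username = user,
    password = pass
  )
  tmp <- tempfile()
  curl::curl_download(url, tmp, handle = h)
  # The temporary file must stay on disk while the handle is open
  RNetCDF::open.nc(tmp)
}

# Usage, assuming `url`, `my_user` and `my_pass` are defined as above:
# nc <- open_nc_auth(url, my_user, my_pass)
# RNetCDF::print.nc(nc)
# RNetCDF::close.nc(nc)
```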
