Discussed in #2604
Originally posted by samlamont January 29, 2026
Hello!
I am reading NetCDF files from S3 and saving them locally as GeoTIFFs using Sedona and PySpark. If I do not explicitly set the SRID with RS_SetSRID, I get an error saying it must be defined when writing to disk. However, RS_SetSRID seems to accept only an integer EPSG code, and the NetCDF data are in a non-standard CRS with no EPSG code. I do know the proj4/WKT strings of the CRS.
So is there a way to register the WKT string of the NetCDF data under a user-defined EPSG code (e.g. 99999) to use with RS_SetSRID in Sedona?
The NetCDF data are the National Water Model v3.0 retrospective rainfall data, e.g.: https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/index.html#CONUS/netcdf/FORCING/1979/
I'm using PySpark 4.0, Sedona 1.8.0, and the GeoTools extension (org.datasyslab:geotools-wrapper:1.8.0-33.1).
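To make "a CRS with no EPSG code" concrete, here is a minimal stdlib-only sketch that splits a proj4 string into its parameters. The helper name `parse_proj4` and the example string are assumptions for illustration only: the string is a generic Lambert conformal conic definition on a sphere, not copied from the NWM files, and nothing here is part of the Sedona API.

```python
def parse_proj4(proj4: str) -> dict:
    """Split a proj4 string into {name: value}; bare flags such as +no_defs map to True."""
    params = {}
    for token in proj4.split():
        if not token.startswith("+"):
            continue  # skip anything that is not a +key or +key=value token
        key, sep, value = token[1:].partition("=")
        if not sep:
            params[key] = True  # flag without a value
            continue
        try:
            params[key] = float(value)  # numeric parameter
        except ValueError:
            params[key] = value  # textual parameter, e.g. proj=lcc
    return params


# Assumed example string (hypothetical, for illustration only).
EXAMPLE_PROJ4 = (
    "+proj=lcc +lat_0=40 +lon_0=-97 +lat_1=30 +lat_2=60 "
    "+x_0=0 +y_0=0 +R=6370000 +units=m +no_defs"
)

params = parse_proj4(EXAMPLE_PROJ4)
print(params["proj"], params["R"], params["units"])
```

A CRS like this is fully described by its projection parameters, with no EPSG authority code attached, which is why an integer-only SRID setter has nothing standard to point at.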
Ex. read:
raster_sdf = (
    spark
    .read
    .format("binaryFile")
    .load(s3_glob_pattern)
    .selectExpr("RS_FromNetCDF(content, 'RAINRATE', 'x', 'y') as raster", "path as filepath")
    .selectExpr("RS_SetSRID(raster, 99999) as raster", "filepath")  # <-- can I set 99999 to a custom WKT string?
    .selectExpr("RS_AsGeoTiff(raster, 'LZW', 1) as raster", "filepath")
)
Then writing to GeoTIFF like:
raster_sdf.write.format("raster").mode("overwrite").partitionBy("value_time").save(f"{output_dir.as_posix()}")
Any tips/guidance would be greatly appreciated, thanks!