Conversation

@lmezilis

@lmezilis lmezilis commented Nov 5, 2025

Closes # (if applicable).

Changes proposed in this Pull Request

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Unit tests for new features were added (if applicable).
  • Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.

@euronion
Collaborator

euronion commented Nov 6, 2025

Thanks a lot for this nice feature!

Can you let us know when you are ready for us to review it?
It would also be nice to have a short example, e.g. an ipynb notebook to include in the documentation.

@brynpickering Can I ask you to review it if you have capacity? Thanks!

@lmezilis
Author

lmezilis commented Nov 6, 2025

Thank you!

I think we can review it right away, and I can prepare some files for documentation. Apologies for being very new to GitHub procedures. I am slowly starting to get the hang of it.

@euronion
Collaborator

euronion commented Nov 6, 2025

[...] Apologies for being very new to GitHub procedures. I am slowly starting to get the hang of it.

No worries, good that you mention it! We'll help you get settled in, and don't hesitate to ask questions if something is unclear or to ping us (e.g. using @euronion or @brynpickering).

Contributor

@brynpickering brynpickering left a comment

Thanks for opening this PR @lmezilis! As @euronion has mentioned, some documentation is needed. My comments are about implementation. I have no comments on the method, I'm sure you've got that right! There's lots of scope to clean things up, though, which will help us maintain the feature in future.

If you want any clarification on comments/suggestions I've made, just reply directly on the comment. If you are happy with a suggestion, you can just accept it and it will automatically update the code for you. You can always accept a suggestion but then make edits to it later.

One thing missing from the new files is a REUSE header. You can find equivalents in other files. The easiest is to just copy the one across from era5.py. Then, make sure your name is in AUTHORS.rst so you count as one of the "Contributors to atlite".
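
For reference, such a header looks roughly like this (a sketch; copy the exact text from era5.py rather than from here):

# SPDX-FileCopyrightText: Contributors to atlite <https://github.com/pypsa/atlite>
#
# SPDX-License-Identifier: MIT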

features = {
    "height": ["height"],
-   "wind": ["wnd100m", "wnd_shear_exp", "wnd_azimuth", "roughness"],
+   "wind": ["wnd100m", "wnd_azimuth", "roughness"],
Contributor

You probably didn't mean to delete "wnd_shear_exp". I assume this is needed by other atlite methods.

Author

Yes, I was working with an older version of atlite; this was accidental.

Comment on lines 58 to 59
    "wave_height": ["wave_height"],
    "wave_period": ["wave_period"],
Contributor

Collapse this into a wave: ["wave_height", "wave_period"] option. Then users can request the wave feature and get both of these variables. It's unlikely they'd ever want one but not the other.
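
For illustration, the collapsed mapping would look something like this (a sketch, not the final code):

features = {
    "height": ["height"],
    "wind": ["wnd100m", "wnd_shear_exp", "wnd_azimuth", "roughness"],
    "wave": ["wave_height", "wave_period"],
}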

Author

Correct, thank you.

Comment on lines 22 to 23
Optionally (add_lon_lat, default:True) preserves latitude and
longitude columns as 'lat' and 'lon'.
Contributor

You mention this option in the docstring but it isn't in the method signature. Do you want to include this option or not?

Author

@lmezilis lmezilis Nov 6, 2025

This option was not necessary for the MREL wave files. I eventually renamed the dimensions in the last part of the script. I would say that the function "_rename_and_clean_coords" can be deleted here. Do you think we should include this option?

EDIT: I saw the rest of the comments now; I will include the function and try to make it as similar to era5 as possible.

Comment on lines 34 to 46
def get_data_wave_height(ds):
    ds = ds.rename({"hs": "wave_height"})
    ds["wave_height"] = ds["wave_height"].clip(min=0.0)

    return ds


def get_data_wave_period(ds):
    ds = ds.rename({"tp": "wave_period"})
    # ds["wave_period"] = (1 / ds["wave_period"])
    ds["wave_period"] = ds["wave_period"].clip(min=0.0)

    return ds
Contributor

Keep the process the same as in era5.py: split this into data retrieval and then sanitisation, and call the sanitisation function only if requested in get_data (you can copy most of the functionality directly over from era5.py).

Contributor

EDIT: you don't actually need these get_data_... methods in this module, but having sanitize_... methods would be good. They can then be called iteratively in get_data as is done in era5.py
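
A minimal sketch of that pattern, following the era5.py naming convention (the get_data part here is illustrative, not the actual atlite code):

def sanitize_wave(ds):
    """Sanitize retrieved wave data (clip unphysical negative values)."""
    ds["wave_height"] = ds["wave_height"].clip(min=0.0)
    ds["wave_period"] = ds["wave_period"].clip(min=0.0)
    return ds


# then, inside get_data (sketch):
#     sanitize = creation_parameters.get("sanitize", True)
#     if sanitize:
#         ds = sanitize_wave(ds)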

Author

You are correct. I was working with these files a long time ago, figuring out which functions I need; I will correct this!


def get_data_wave_period(ds):
    ds = ds.rename({"tp": "wave_period"})
    # ds["wave_period"] = (1 / ds["wave_period"])
Contributor

It's always better to rely on version history to recover lines of code you no longer need, rather than commenting them out. So, feel free to delete all your commented-out lines!

Author

Yes, sorry for this, I thought I did that for every file but forgot this one.

Comment on lines 658 to 659
def convert_wave(ds, wec_type):
    power_matrix = pd.DataFrame.from_dict(wec_type["Power_Matrix"])
Contributor

You should have a docstring here

"""
Generate wave generation time series
evaluates the significant wave height (Hs) and wave peak period (Tp)
Contributor

Since Hs and Tp are MREL-specific and wouldn't make sense if using ERA5 data, it might make more sense to not reference them by these acronyms in this file.

# Ignore IDE project files
.idea/
.vscode
.vs
Contributor

It's generally best to add these pointers to your own "global" gitignore, rather than to every project you work on. That way, it never accidentally slips in without you realising it!
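
For reference, git picks up a global ignore file via a one-time config (standard git command):

git config --global core.excludesFile ~/.gitignore_global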

Contributor

Since CERRA is available via the Climate Data Store, we should follow the same approach to data retrieval as with ERA5. You could probably just copy era5.py directly and delete all but the wind getter method, then adapt the wind getter method to match the data available from CERRA.

It might be best to drop this from this PR, though, and bring it in separately later. I can see a benefit to changing the features we bring in for wind since we can retrieve wind speed at various height/pressure levels from CDS, which would allow us to create a wind vertical profile (as we do with ERA5 data).

Author

The problem with CERRA here is that a lot of preprocessing of the data is needed for atlite to be able to read it. I agree that it is best to review this later. Should I take action on this, or can you simply reject this file?

Contributor

The easiest is for you to simply delete this file (and reference to cerra elsewhere). If we want to revisit it, this file will still be in the commit history so we can bring it back if we want anything from it!

@lmezilis
Author

lmezilis commented Nov 7, 2025

Hello @brynpickering, I have made all of the changes locally; should I commit the changes to the forked branch, or is there another way to continue?

@brynpickering
Contributor

Hello @brynpickering, I have made all of the changes locally; should I commit the changes to the forked branch, or is there another way to continue?

Yes, in the forked branch. You should be able to just always work in the forked branch and push to your own repository ("origin") whenever you make changes. Those changes will then be made visible in this PR.

Contributor

@brynpickering brynpickering left a comment

Thanks for the changes @lmezilis. Just a couple of extra comments.

I still need to check the processing method, which I'll do now. In the meantime, it would be great to add some documentation. The easiest would be to add a Jupyter notebook to examples and then link that into the docs by adding an entry into doc/examples (following the example of the other entries). The example notebook could load from era5 and mrel separately and maybe compare the results for a specific gridcell?


"""
Rename and sanitize retrieved wave height data.
"""
ds = ds.rename({"hs": "wave_height"})
Contributor

No need, this renaming is happening in the main function now (rename(features)).

"""
Rename and sanitize retrieved wave period data.
"""
ds = ds.rename({"tp": "wave_period"})
Contributor

No need, this renaming is happening in the main function now (rename(features)).

@lmezilis
Author

the variable tp doesn't exist; only t01 and t02 exist linked to wave period data (both being mean wave periods). How did you get tp? From a different version of this dataset?

@brynpickering I must have changed the variables to be more convenient to work with back in the day and completely forgot their original names. t01 is the one that should be used. That dataset has a lot of different types of ocean variables.

In the meantime, it would be great to add some documentation.

Of course, sorry for the delays. I am working on this; I have my documentation ready but am trying to make it appealing and clear. It will probably be ready by tomorrow.

compare the results for a specific gridcell?

I assume between ERA5 and ECHOWAVE, correct?

@lmezilis
Author

One question: the naming "wecgenerator" strikes me as a bit odd.

@euronion you are right, a simple wec should do it. I will update this. Also, in pypsa-eur I used the term wec_type as a generator index in build_renewables. I notice that wind turbines are called turbines. Should I just use wec there too?

@euronion
Collaborator

One question: the naming "wecgenerator" strikes me as a bit odd.

@euronion you are right, a simple wec should do it. I will update this. Also, in pypsa-eur I used the term wec_type as a generator index in build_renewables. I notice that wind turbines are called turbines. Should I just use wec there too?

Yes, wec or even just converter instead of wec_type in this case is fine, because the context and the hierarchy in the config make it clear what it is about (solar -> panel, wind -> turbine, wave -> wec/converter).

Comment on lines 658 to 727
def convert_wave(ds, wec):
    r"""
    Convert wave height (Hs) and wave peak period (Tp) data into normalized power output
    using the device-specific Wave Energy Converter (WEC) power matrix.

    This function matches each combination of significant wave height and peak period
    in the dataset to a corresponding power output from the WEC power matrix.
    The resulting power output is normalized by the maximum possible output (capacity)
    to obtain the specific generation profile.

    Parameters
    ----------
    ds : xarray.Dataset
        Input dataset (cutout) containing two variables:
        wave_height: significant wave height (m)
        wave_period: peak wave period (s)
    wec_type : dict
        Dictionary defining the WEC characteristics, including:
        Power_Matrix: a power matrix dictionary stored in "resources\wecgenerator"

    Returns
    -------
    xarray.DataArray
        DataArray of specific power generation values (normalized power output).

    Notes
    -----
    A progress message is printed every one million cases to track computation.
    """
    power_matrix = pd.DataFrame.from_dict(wec["Power_Matrix"])
    max_pow = power_matrix.to_numpy().max()

    Hs = np.ceil(ds["wave_height"] * 2) / 2
    Tp = np.ceil(ds["wave_period"] * 2) / 2

    Hs_list = Hs.to_numpy().flatten().tolist()
    Tp_list = Tp.to_numpy().flatten().tolist()

    # empty list for result
    power_list = []
    cases = len(Hs_list)
    count = 0

    # loop through Hs and Tp pairs and get the power output and capacity factor
    for Hs_ind, Tp_ind in zip(Hs_list, Tp_list):
        if count % 1000000 == 0:
            print(f"Case {count} of {cases}: %")
        if np.isnan(Hs_ind) or np.isnan(Tp_ind):
            power_list.append(0)
        elif Hs_ind > 10 or Tp_ind > 18:
            power_list.append(0)
        else:
            generated_power = power_matrix.loc[Hs_ind, Tp_ind]
            power_list.append(generated_power / max_pow)
        count += 1

    # results list to numpy array
    power_list_np = np.array(power_list)
    power_list_np = power_list_np.reshape(Hs.shape)

    da = xr.DataArray(
        power_list_np, coords=Hs.coords, dims=Hs.dims, name="Power generated"
    )
    da.attrs["units"] = "kWh/kWp"
    da = da.rename("specific generation")
    da = da.fillna(0)

    return da
Contributor

OK, I had the chance to test this.

This suggestion vectorises the approach, chunking on the time dimension to reduce the amount of data that is processed in memory at any point. This should be much quicker than your current approach, and the time chunk size can be relaxed when running on remote machines with large memory availability. You can even run it on a local machine by using small time chunks.

Give it a go with data you have and let me know how it goes!

Suggested change
# wave
def convert_wave(ds, wec_type, time_chunk_size: int = 100) -> xr.DataArray:
    r"""
    Convert wave height (Hs) and wave peak period (Tp) data into normalized power output
    using the device-specific Wave Energy Converter (WEC) power matrix.

    This function matches each combination of significant wave height and peak period
    in the dataset to a corresponding power output from the WEC power matrix.
    The resulting power output is normalized by the maximum possible output (capacity)
    to obtain the specific generation profile.

    Parameters
    ----------
    ds : xarray.Dataset
        Input dataset (cutout) containing two variables:
        wave_height: significant wave height (m)
        wave_period: peak wave period (s)
    wec_type : dict
        Dictionary defining the WEC characteristics, including:
        Power_Matrix: a power matrix dictionary stored in "resources\wecgenerator"
    time_chunk_size : int
        Size of time chunks for processing large datasets, to limit memory spikes. Default is 100.

    Returns
    -------
    xarray.DataArray
        DataArray of specific power generation values (normalized power output).
    """
    power_matrix = (
        pd.DataFrame.from_dict(wec_type["Power_Matrix"])
        .stack()
        .rename_axis(index=["wave_height", "wave_period"])
        .where(lambda x: x > 0)
        .dropna()
        .to_xarray()
    )
    results = []
    steps = np.arange(0, len(ds.time), step=time_chunk_size)
    for step in tqdm(
        steps, desc="Processing wave data chunks", total=len(steps), unit="time chunk"
    ):
        ds_ = ds.isel(time=slice(step, step + time_chunk_size))
        cf = power_matrix.interp(
            {"wave_height": ds_.wave_height, "wave_period": ds_.wave_period},
            method="nearest",
        )
        results.append(cf)
    da = xr.concat(results, dim="time")
    da.attrs["units"] = "kWh/kWp"
    da = da.rename("specific generation")
    da = da.fillna(0)
    return da
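
For local testing, something along these lines should exercise the chunked version (a sketch; the cutout and wec objects are assumed from the PR context):

# Hypothetical usage: small time chunks keep memory use low on a laptop
cf = convert_wave(cutout.data, wec_type=wec, time_chunk_size=50)
print(float(cf.mean()))  # rough sanity check: mean capacity factor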

Author

OK, perfect, thank you very much. I will test it.

I need to make a correction on the variables used. Going through my processes, I see that I used an extrapolated tp, which is 1/fp, where fp is the wave peak frequency. I did this a long time ago and had kind of forgotten it. I used to have this conversion in the mrel_wave.py module, but it was more convenient for me back then to replace fp with tp in the dataset and test cutout creation like that. Should I include this conversion in the module?

t01 and t02 can be used as well, and the results will be more or less the same (approx. 2% off).

Contributor

I think it's fine to use either, depending on which you think is most appropriate. For comparability with the output of era5.py, if you use fp then you should convert it to wave_period (1/fp) directly in mrel.py
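
That conversion would be a one-liner in the getter, e.g. (a sketch; fp being the dataset's peak-frequency variable as described above):

# Convert wave peak frequency fp (Hz) to peak wave period (s)
ds["wave_period"] = 1.0 / ds["fp"]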

@lmezilis
Author

lmezilis commented Nov 18, 2025

I have tried to implement the new convert functions and I think they work fine. However, there seem to be issues with the downloaded data when I try to create the cutout, probably a problem with the OPeNDAP server. I get an invalid ID error:

Exception ignored in: <function CachingFileManager.__del__ at 0x000001F1C57E0E00>
Traceback (most recent call last):
  File "c:\Users\thira\anaconda3\envs\pypsa-eur\Lib\site-packages\xarray\backends\file_manager.py", line 250, in __del__
    self.close(needs_lock=False)
  File "c:\Users\thira\anaconda3\envs\pypsa-eur\Lib\site-packages\xarray\backends\file_manager.py", line 234, in close
    file.close()
  File "src/netCDF4/_netCDF4.pyx", line 2669, in netCDF4._netCDF4.Dataset.close
  File "src/netCDF4/_netCDF4.pyx", line 2636, in netCDF4._netCDF4.Dataset._close
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Not a valid ID
C:\Users\thira\Desktop\atlite-mrel\atlite\data.py:249: UserWarning: The specified chunks separate the stored chunks along dimension "time" starting at index 100. This could degrade performance. Instead, consider rechunking after loading.
  cutout.data = xr.open_dataset(cutout.path, chunks=cutout.chunks)

Even with this error the cutout seemingly finishes, but something is not passed correctly: when I slice the cutout with time=slice("2018-01-01", "2018-01-08"), I still get the entire month of the cutout, e.g. TU-MREL_EU_ATL-2M_201801.nc.

When I set it until February (time=slice("2018-01-01", "2018-02-08")), the cutout completes with the last timestamps having empty grids. I don't know if this is a cutout.py-related issue, but I cannot create a cutout smaller than 1 month.

Apart from that, the new code looks to be working.

@lmezilis
Author

I have also tried to automate the cutout process in case the data are not already downloaded. It seems that there are permission issues there: I can remotely open the dataset, but I cannot load any of the variables; the data has to be downloaded. So what I did is write a function to create the URLs and another one to download and merge the files.

I have to be honest, it doesn't look ideal, and I am also not sure how to use the temp directories to save the downloaded files and load them from there. I will upload an example without this feature.

@brynpickering
Contributor

@lmezilis could you point me to the source of the datasets you're using? I used ones directly from the TU Delft OPeNDAP server, but they may be slightly different to the ones you have already downloaded.

@lmezilis
Author

lmezilis commented Nov 18, 2025

I am working with the source that you mentioned above: OPeNDAP. The same dataset was used for our calculations.

You can see the code below for how I obtained the URLs after the cutout parameters were set:

time_index = cutout.coords["time"].to_index()

urls = []

for year in time_index.year.unique():
    year_times = time_index[time_index.year == year]
    months = year_times.month.unique()

    # Limit months in the final year
    if year == time_index[-1].year:
        last_month = time_index[-1].month
        months = months[months <= last_month]

    for month in months:
        url = (
            "https://opendap.4tu.nl/thredds/dodsC/data2/djht/f359cd0f-d135-416c-9118-e79dccba57b9/1/"
            f"{year}/TU-MREL_EU_ATL-2M_{year}{month:02}.nc"
        )
        urls.append((year, month, url))
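
A simplified sketch of the download-and-merge companion (the fileServer endpoint swap and the open_mfdataset merge are one way to do it, not necessarily the exact code):

import tempfile
from pathlib import Path
from urllib.request import urlretrieve

import xarray as xr


def download_and_merge(urls):
    # Download each monthly file into a temporary directory, then merge along time.
    # .load() pulls the data into memory before the temporary files are removed.
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = []
        for year, month, url in urls:
            path = Path(tmpdir) / f"TU-MREL_EU_ATL-2M_{year}{month:02}.nc"
            # THREDDS serves whole files via "fileServer" rather than "dodsC"
            urlretrieve(url.replace("dodsC", "fileServer"), path)
            paths.append(path)
        return xr.open_mfdataset(paths, combine="by_coords").load()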

@lmezilis
Author

I made all of these similar commits because there are some things that I need to change in the syntax, but the pre-commit auto-fix changes them back. I don't know why.

@brynpickering
Contributor

@lmezilis no worries. We'll probably squash all these commits when we merge it in, so it'll all be cleaned up.

You could install pre-commit locally so the fixes are applied on your machine. In your atlite working environment, call pre-commit install. Then pre-commit will fix things before you try to commit.

Re: allowing data downloads, I've found that OPeNDAP fails when trying to download more than a few MB of data at once (DAP failure or Authorization failure). Not sure if you get this issue @lmezilis, but it seems to me that it's too volatile to rely upon as a way to access the data.

@lmezilis
Author

Yes, I had the same problem the last few days, even though last week I could complete it. I say for now let's keep it manual, and I will contact the server maintainers to see what we can do.
