wave energy converters and new wind and wave data modules #475
Conversation
Thanks a lot for this nice feature! Can you let us know when you are ready for us to review it? @brynpickering Can I ask you to review it if you have capacity? Thanks!
Thank you! I think we can review it right away, and I can prepare some files for documentation. Apologies for being very new to GitHub procedures; I am slowly starting to get the hang of it.
No worries, good that you mention it! We'll help you get settled in, and don't hesitate to ask questions if something is unclear or to ping us (e.g. using …).
brynpickering left a comment
Thanks for opening this PR @lmezilis ! As @euronion has mentioned, some documentation is needed. My comments are about implementation. I have no comments on the method, I'm sure you've got that right! There's lots of scope to clean things up, though, which will help us maintain the feature in future.
If you want any clarification on comments/suggestions I've made, just reply directly on the comment. If you are happy with a suggestion, you can just accept it and it will automatically update the code for you. You can always accept a suggestion but then make edits to it later.
One thing missing from the new files is a REUSE header. You can find equivalents in other files. The easiest is to just copy the one across from era5.py. Then, make sure your name is in AUTHORS.rst so you count as one of the "Contributors to atlite".
atlite/datasets/era5.py
Outdated
```diff
 features = {
     "height": ["height"],
-    "wind": ["wnd100m", "wnd_shear_exp", "wnd_azimuth", "roughness"],
+    "wind": ["wnd100m", "wnd_azimuth", "roughness"],
```
You probably didn't mean to delete "wnd_shear_exp". I assume this is needed by other atlite methods.
Yes, I was working with an older version of atlite; this was accidental.
atlite/datasets/era5.py
Outdated
| "wave_height": ["wave_height"], | ||
| "wave_period": ["wave_period"], |
Collapse this into a wave: ["wave_height", "wave_period"] option. Then users can request the wave feature and get both of these variables. It's unlikely they'd ever want one but not the other.
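For reference, a minimal sketch of what the collapsed entry might look like in the era5.py feature mapping (surrounding entries are copied from the existing dict; treat this as an illustration, not the final code):

```python
features = {
    "height": ["height"],
    "wind": ["wnd100m", "wnd_shear_exp", "wnd_azimuth", "roughness"],
    # single "wave" feature returning both variables, as suggested above
    "wave": ["wave_height", "wave_period"],
}
```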
Correct, thank you.
atlite/datasets/mrel_wave.py
Outdated
```python
Optionally (add_lon_lat, default: True) preserves latitude and
longitude columns as 'lat' and 'lon'.
```
You mention this option in the docstring, but it isn't in the method signature. Do you want to include this option or not?
This option was not necessary for the mrel wave files. I eventually renamed the dimensions in the last part of the script. I would say that the function "_rename_and_clean_coords" can be deleted here. Do you think we should include this option?
EDIT: I saw the rest of the comments now; I will include the function and try to make it as similar to era5 as possible.
atlite/datasets/mrel_wave.py
Outdated
```python
def get_data_wave_height(ds):
    ds = ds.rename({"hs": "wave_height"})
    ds["wave_height"] = ds["wave_height"].clip(min=0.0)

    return ds


def get_data_wave_period(ds):
    ds = ds.rename({"tp": "wave_period"})
    # ds["wave_period"] = (1 / ds["wave_period"])
    ds["wave_period"] = ds["wave_period"].clip(min=0.0)

    return ds
```
Keep the process the same as in era5.py - split this into data retrieval and then sanitisation and call the sanitisation function only if requested in get_data (you can copy most of the functionality directly over from era5.py)
EDIT: you don't actually need these get_data_... methods in this module, but having sanitize_... methods would be good. They can then be called iteratively in get_data as is done in era5.py
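As a rough illustration of that pattern, a sanitisation function in mrel_wave.py could look something like the sketch below (the function name and the clipping behaviour are assumptions based on the code above, not the final implementation):

```python
def sanitize_wave(ds):
    """Sanitize retrieved wave data by clipping physically impossible negative values."""
    ds["wave_height"] = ds["wave_height"].clip(min=0.0)
    ds["wave_period"] = ds["wave_period"].clip(min=0.0)
    return ds
```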
You are correct. I was working with these files a long time ago, figuring out which functions I need; I will correct this!
atlite/datasets/mrel_wave.py
Outdated
```python
def get_data_wave_period(ds):
    ds = ds.rename({"tp": "wave_period"})
    # ds["wave_period"] = (1 / ds["wave_period"])
```
It's always better to rely on version history to recover lines of code you no longer need, rather than commenting them out. So, feel free to delete all your commented out lines!
Yes, sorry for this, I thought I did that for every file but forgot this one.
atlite/convert.py
Outdated
```python
def convert_wave(ds, wec_type):
    power_matrix = pd.DataFrame.from_dict(wec_type["Power_Matrix"])
```
You should have a docstring here
atlite/convert.py
Outdated
| """ | ||
| Generate wave generation time series | ||
| evaluates the significant wave height (Hs) and wave peak period (Tp) |
Since Hs and Tp are MREL-specific and wouldn't make sense if using ERA5 data, it might make more sense to not reference them by these acronyms in this file.
```
# Ignore IDE project files
.idea/
.vscode
.vs
```
It's generally best to add these pointers to your own "global" gitignore, rather than to every project you work on. That way, it never accidentally slips in without you realising it!
atlite/datasets/cerra.py
Outdated
Since CERRA is available via the Climate Data Store, we should follow the same approach to data retrieval as with ERA5. You could probably just copy era5.py directly and delete all but the wind getter method, then adapt the wind getter method to match the data available from CERRA.
It might be best to drop this from this PR, though, and bring it in separately later. I can see a benefit to changing the features we bring in for wind since we can retrieve wind speed at various height/pressure levels from CDS, which would allow us to create a wind vertical profile (as we do with ERA5 data).
The problem with CERRA here is that a lot of data preprocessing is needed before atlite can read it. I agree that it is best to review this later. Should I take action on this, or can you simply reject this file?
The easiest is for you to simply delete this file (and reference to cerra elsewhere). If we want to revisit it, this file will still be in the commit history so we can bring it back if we want anything from it!
Hello @brynpickering, I have made all of the changes locally. Should I commit the changes to the forked branch, or is there another way to continue?
Yes, in the forked branch. You should be able to just always work in the forked branch and push to your own repository ("origin") whenever you make changes. Those changes will then be made visible in this PR.
brynpickering left a comment
Thanks for the changes @lmezilis. Just a couple of extra comments.
I still need to check the processing method, which I'll do now. In the meantime, it would be great to add some documentation. The easiest would be to add a Jupyter notebook to examples and then link it into the docs by adding an entry in doc/examples (following the example of the other entries). The example notebook could load from era5 and mrel separately and maybe compare the results for a specific gridcell?
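As a very rough sketch of what such a notebook could start with (the module name "mrel", the file names, extent, period and the "wave" feature are all assumptions tied to this PR, not settled API):

```python
import atlite

# Two cutouts over the same region and period, one per data module
# (paths, extent and period are placeholders)
cutout_era5 = atlite.Cutout(
    "era5-wave.nc", module="era5",
    x=slice(-10, 0), y=slice(45, 55), time="2019-01",
)
cutout_mrel = atlite.Cutout(
    "mrel-wave.nc", module="mrel",
    x=slice(-10, 0), y=slice(45, 55), time="2019-01",
)

# Retrieve only the wave-related variables introduced in this PR
cutout_era5.prepare(features=["wave"])
cutout_mrel.prepare(features=["wave"])

# Compare the significant wave height at a single grid cell
era5_point = cutout_era5.data["wave_height"].sel(x=-5, y=50, method="nearest")
mrel_point = cutout_mrel.data["wave_height"].sel(x=-5, y=50, method="nearest")
```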
atlite/datasets/cerra.py
Outdated
The easiest is for you to simply delete this file (and reference to cerra elsewhere). If we want to revisit it, this file will still be in the commit history so we can bring it back if we want anything from it!
atlite/datasets/mrel_wave.py
Outdated
| """ | ||
| Rename and sanitize retrieved wave height data. | ||
| """ | ||
| ds = ds.rename({"hs": "wave_height"}) |
No need, this renaming is happening in the main function now (rename(features)).
atlite/datasets/mrel_wave.py
Outdated
| """ | ||
| Rename and sanitize retrieved wave height data. | ||
| """ | ||
| ds = ds.rename({"tp": "wave_period"}) |
No need, this renaming is happening in the main function now (rename(features)).
@brynpickering I must have changed the variables to be more convenient to work with back then, and completely forgot their original names.
Of course, sorry for the delays. I am working on this; I have my documentation ready but am trying to make it appealing and clear. It will probably be ready by tomorrow.
I assume between ERA5 and ECHOWAVE, correct?
@euronion you are right, a simple …
Yes, …
atlite/convert.py
Outdated
```python
def convert_wave(ds, wec):
    r"""
    Convert wave height (Hs) and wave peak period (Tp) data into normalized power output
    using the device-specific Wave Energy Converter (WEC) power matrix.

    This function matches each combination of significant wave height and peak period
    in the dataset to a corresponding power output from the WEC power matrix.
    The resulting power output is normalized by the maximum possible output (capacity)
    to obtain the specific generation profile.

    Parameters
    ----------
    ds : xarray.Dataset
        Input dataset (cutout) containing two variables:
        wave_height: significant wave height (m)
        wave_period: peak wave period (s)
    wec_type : dict
        Dictionary defining the WEC characteristics, including:
        Power_Matrix: a power matrix dictionary stored in "resources\wecgenerator"

    Returns
    -------
    xarray.DataArray
        DataArray of specific power generation values (normalized power output).

    Notes
    -----
    A progress message is printed every one million cases to track computation.
    """
    power_matrix = pd.DataFrame.from_dict(wec["Power_Matrix"])
    max_pow = power_matrix.to_numpy().max()

    Hs = np.ceil(ds["wave_height"] * 2) / 2
    Tp = np.ceil(ds["wave_period"] * 2) / 2

    Hs_list = Hs.to_numpy().flatten().tolist()
    Tp_list = Tp.to_numpy().flatten().tolist()

    # empty list for result
    power_list = []
    cases = len(Hs_list)
    count = 0

    # for loop to loop through Hs and Tp pairs and get the power output and capacity factor
    for Hs_ind, Tp_ind in zip(Hs_list, Tp_list):
        if count % 1000000 == 0:
            print(f"Case {count} of {cases}: %")
        if np.isnan(Hs_ind) or np.isnan(Tp_ind):
            power_list.append(0)
        elif Hs_ind > 10 or Tp_ind > 18:
            power_list.append(0)
        else:
            generated_power = power_matrix.loc[Hs_ind, Tp_ind]
            power_list.append(generated_power / max_pow)
        count += 1

    # results list to numpy array
    power_list_np = np.array(power_list)
    power_list_np = power_list_np.reshape(Hs.shape)

    da = xr.DataArray(
        power_list_np, coords=Hs.coords, dims=Hs.dims, name="Power generated"
    )
    da.attrs["units"] = "kWh/kWp"
    da = da.rename("specific generation")
    da = da.fillna(0)

    return da
```
OK, I had the chance to test this.
This suggestion vectorises the approach, chunking on the time dimension to reduce the amount of data that is processed in memory at any point. This should be much quicker than your current approach, and the time chunk size can be relaxed when running on remote machines with large memory availability. You can even run on a local machine by using small time chunks.
Give it a go with data you have and let me know how it goes!
Suggested change:

```python
# wave
def convert_wave(ds, wec_type, time_chunk_size: int = 100) -> xr.DataArray:
    r"""
    Convert wave height (Hs) and wave peak period (Tp) data into normalized power output
    using the device-specific Wave Energy Converter (WEC) power matrix.

    This function matches each combination of significant wave height and peak period
    in the dataset to a corresponding power output from the WEC power matrix.
    The resulting power output is normalized by the maximum possible output (capacity)
    to obtain the specific generation profile.

    Parameters
    ----------
    ds : xarray.Dataset
        Input dataset (cutout) containing two variables:
        wave_height: significant wave height (m)
        wave_period: peak wave period (s)
    wec_type : dict
        Dictionary defining the WEC characteristics, including:
        Power_Matrix: a power matrix dictionary stored in "resources\wecgenerator"
    time_chunk_size : int
        Size of time chunks for processing large datasets, to limit memory spikes. Default is 100.

    Returns
    -------
    xarray.DataArray
        DataArray of specific power generation values (normalized power output).

    Notes
    -----
    A progress bar tracks the processing of time chunks.
    """
    # Stack the power matrix into a lookup indexed by wave_height and wave_period
    power_matrix = (
        pd.DataFrame.from_dict(wec_type["Power_Matrix"])
        .stack()
        .rename_axis(index=["wave_height", "wave_period"])
        .where(lambda x: x > 0)
        .dropna()
        .to_xarray()
    )
    results = []
    steps = np.arange(0, len(ds.time), step=time_chunk_size)
    for step in tqdm(
        steps, desc="Processing wave data chunks", total=len(steps), unit="time chunk"
    ):
        ds_ = ds.isel(time=slice(step, step + time_chunk_size))
        cf = power_matrix.interp(
            {"wave_height": ds_.wave_height, "wave_period": ds_.wave_period},
            method="nearest",
        )
        results.append(cf)
    da = xr.concat(results, dim="time")
    da.attrs["units"] = "kWh/kWp"
    da = da.rename("specific generation")
    da = da.fillna(0)
    return da
```
OK, perfect, thank you very much. I will test it.
I need to make a correction on the variables used. Going through my processes, I see that I used an extrapolated Tp, which is 1/fp, where fp is the wave peak frequency. I did this a long time ago and had somewhat forgotten it. I used to have this conversion in the mrel_wave.py module, but back then it was more convenient for me to replace fp with tp in the dataset and test cutout creation like that. Should I include this conversion in the module?
t01 and t02 can be used as well, and the results will be more or less the same (approx. 2% off).
I think it's fine to use either, depending on which you think is most appropriate. For comparability with the output of era5.py, if you use fp then you should convert it to wave_period (1/fp) directly in mrel.py
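For illustration, a minimal sketch of that conversion inside the module's data handling (the variable name "fp" and where exactly the step sits in the pipeline are assumptions):

```python
# Derive the peak wave period from the peak frequency before any further processing
ds["wave_period"] = 1.0 / ds["fp"]
ds = ds.drop_vars("fp")
```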
I have tried to implement the new …. Even with this error the cutout is seemingly finished, but something is not passed correctly, as when I slice the cutout like so …. When I set it until February (…), …. Apart from that, the new code looks to be working.
I have also tried to automate the cutout process in case the data are not already downloaded. It seems that there are permission issues there, as I can remotely load the dataset but I cannot load any of the variables; it has to be downloaded. So what I did is a function to create the URLs and another one to download and merge them. I have to be honest, it doesn't look ideal, and I am also not sure how to use temporary directories to save the downloaded files and load them from there. I will upload an example without this feature.
@lmezilis could you point me to the source of the datasets you're using? I used ones directly from the TUDelft OpenDAP but they may be slightly different to the one you have already downloaded.
I am working with the source that you mentioned above: OPeNDAP. The same dataset was used for our calculations. You can see the code below for how I obtained the URLs after the cutout parameters were set: …
I made all of these similar commits because there are some things that I need to change in the syntax, but the pre-commit auto-fix changes them back; I don't know why.
@lmezilis no worries. We'll probably squash all these commits when we merge it in, so it'll all be cleaned up. You could install pre-commit locally so the fixes are managed locally. In your … RE allowing data downloads, I've found that the OpenDAP fails when trying to download more than a few MB of data at once (…).
Yes, I had the same problem the last few days, even though last week I could complete it. I say for now let's keep it manual, and I will contact the server to see what we can do.
Closes # (if applicable).
Changes proposed in this Pull Request
Checklist
- Dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
- A note in doc/release_notes.rst of the upcoming release is included.