Refactor/test runtime #1977
Conversation
Replace the pip-tools based dependency management workflow with uv, consolidating all dependencies into pyproject.toml and a single uv.lock file. This simplifies and greatly speeds up the development setup.

Changes:
- Switch the build backend from setuptools to hatchling
- Move all dependencies from requirements/*.in to pyproject.toml
- Remove setup.cfg in favour of .flake8 and pyproject.toml
- Remove the Makefile in favour of poethepoet tasks
- Upgrade the main Python version (CI/CD, .python-version, etc.) to 3.12
- Use the latest Ubuntu LTS for readthedocs.yaml
- Refactor and update the Dockerfile to use uv for installation
- Upgrade the Debian version in the Dockerfile from bookworm to trixie
- Add .python-version for consistent Python version management
- Replace pip-tools with uv in all CI/CD workflows
- Remove legacy build and update scripts (to_pypi.sh, ci/update-packages.sh, ci/run_mypy.sh)
- Update the documentation to reflect these changes

Signed-off-by: Stijn van Houwelingen <teadrinkingprogrammer@github.io>
- Pre-load the crypto randomizer
- Use truncation instead of drop-and-create to clear the database between tests
- Optimize test collection (only collect the flexmeasures folder and use --import-mode=importlib)
- Move pyomo imports from module level to function level
- Minor optimizations for specific tests
- Add poethepoet tasks
- Fix the test-db service in the docker compose file

Signed-off-by: Stijn van Houwelingen <teadrinkingprogrammer@github.io>
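For context, the truncation-based cleanup mentioned in the commit message above could look roughly like the sketch below. This is an illustrative assumption with pytest and SQLAlchemy; the fixture name and the `db` object are not taken from the actual FlexMeasures conftest.

```python
# Illustrative sketch only: truncate all tables after each test instead of
# dropping and recreating the schema (fixture/attribute names are assumptions).
import pytest
from sqlalchemy import text


@pytest.fixture(autouse=True)
def truncate_tables(db):
    """Run the test first, then empty every table so the next test starts clean."""
    yield
    table_names = ", ".join(
        f'"{table.name}"' for table in reversed(db.metadata.sorted_tables)
    )
    db.session.execute(text(f"TRUNCATE {table_names} CASCADE;"))
    db.session.commit()
```

Truncation like this is typically much cheaper than recreating the schema, because table and index definitions stay in place and only the rows are removed.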
Thanks for giving this a go! Quick question: what does adding the new marker do?
It marks a test as a unit test; these are very fast because they don't depend on anything external. This allows you to very quickly run only the unit tests (e.g. via pytest's `-m` marker selection).
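For readers unfamiliar with pytest markers, this is roughly how such a marker is used; the marker name `unit` here is only an example and may differ from the one added in this PR.

```python
# Example only: `unit` is a placeholder marker name.
# Register it (e.g. in pyproject.toml or conftest.py) to avoid warnings,
# then select these tests with `pytest -m unit`.
from datetime import timedelta

import pytest


@pytest.mark.unit
def test_resolution_arithmetic():
    # a "unit" test: pure computation, no database or Redis involved
    assert timedelta(minutes=15) * 4 == timedelta(hours=1)
```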
Another way to see which functions are good candidates for further optimization is the new …
Update: I think I found the problem with the flaky tests. Part of my experimentation was more lax settings for PostgreSQL. I updated the docker compose file but forgot to restart the service. I have tested 3 times now, and so far it succeeds every time.
Never mind, it turned up again. I updated the error report, though.
I hope @Flix6x has some experience with these exact tests (are they the same every time)? He is off today, though.
If you define the uv branch as the base for this one at the top of the PR, does the diff become clearer?
Hopefully he can find something, yes; luckily we're not in a rush. The errors are the same each time, though this time I only got an error for …
Yes, it becomes a lot more readable, but I wasn't able to do that since the branch is part of my fork and it only allows me to select branches from the FlexMeasures/flexmeasures repo.
For my info: did you close the branch to stop the CI tests for now, or because you're not planning to merge it?
Oh sorry, that seems to have been an accident!
No problem.
Description
Completes #1976
This PR adds some optimizations to improve the runtime speed of the test suite.
Optimizations:
- Pre-load the crypto randomizer
- Use truncation instead of drop-and-create to clear the database between tests
- Optimize test collection (only collect the flexmeasures folder and use --import-mode=importlib)
- Move pyomo imports from module level to function level
- Minor optimizations for specific tests
- Add poethepoet tasks
Misc: fix test-db docker-compose service
I also tested parallelising only unit tests, but that is not faster than just running them normally (the setup of the workers takes longer than the tests).
documentation/changelog.rst

Result
Local informal test results:
Look & Feel
N.A.
How to test
Run the test suite.
Some tests fail only sometimes. I tried my best to find out why, but I don't have enough knowledge of the codebase to get to the root of it. It seems to have to do with functions influencing each other through database writes where this did not happen before. I'd like to think that the current truncation is functionally the same as the drop/create flow, but that might not be the case. Another possibility could be the tests depending on some other implicit property of the previous set-up.
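One concrete difference between the two approaches that could explain this (my assumption, not verified against this PR's code): dropping and recreating tables also resets PostgreSQL id sequences, while a plain TRUNCATE keeps sequence counters unless RESTART IDENTITY is specified, so auto-generated ids keep growing across tests and any test relying on a specific id could suddenly point at different data. A hedged sketch of the two variants:

```python
# Sketch of the PostgreSQL semantics only; the helper and its call sites are
# hypothetical and not taken from the FlexMeasures test setup.
from sqlalchemy import text


def truncate_all(session, table_names, reset_sequences=True):
    """Empty the given tables; resetting sequences as well mimics the old
    drop-and-recreate behaviour more closely than a plain TRUNCATE."""
    tables = ", ".join(f'"{name}"' for name in table_names)
    restart = " RESTART IDENTITY" if reset_sequences else ""
    session.execute(text(f"TRUNCATE {tables}{restart} CASCADE;"))
    session.commit()
```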
It seems to have to do with the functions/fixtures `db`, `fresh_db`, `fresh_queues` (previously `clean_redis`) and `keep_scheduling_queue_empty`.

This is the result in question:
945 passed
2 failed
- flexmeasures/data/models/planning/tests/test_solver.py:2137 test_soc_maxima_minima_targets
- flexmeasures/data/tests/test_scheduling_jobs.py:28 test_scheduling_a_battery
3 xfailed

This is the AI-aided summary of the problem:
test_scheduling_a_battery:
The job fails with `Unit conversion from MW to EUR/kWh doesn't seem possible`, preceded by `Missing 'power-capacity' on asset 3. Using site-power-capacity instead.` When a previous test leaves some state that causes the battery asset's power-capacity to be absent, the fallback site-power-capacity (in MW) gets used in a context expecting EUR/kWh.

test_soc_maxima_minima_targets:
The infeasibility error (factor_w_wh(t) [0.25] visible in the trace) suggests it's running with 15-min resolution but SOC targets/maxima from a previous test's fixture state don't match.
The error:
... more lines of code ... if sensor_d is not None and sensor_d.get_attribute( "is_strictly_non_positive" ): device_constraints[d]["derivative min"] = 0 else: production_capacity_d = get_continuous_series_sensor_or_quantity( variable_quantity=production_capacity[d], unit="MW", query_window=(start, end), resolution=resolution, beliefs_before=belief_time, max_value=power_capacity_in_mw[d], min_value=0, # capacities are positive by definition resolve_overlaps="min", ) if ( self.flex_context.get("production_breach_price") is not None and production_capacity[d] is not None ): # consumption-capacity will become a soft constraint production_breach_price = self.flex_context[ "production_breach_price" ] any_production_breach_price = ( get_continuous_series_sensor_or_quantity( variable_quantity=production_breach_price, unit=self.flex_context["shared_currency_unit"] + "/MW", query_window=(start, end), resolution=resolution, beliefs_before=belief_time, fill_sides=True, ) ) all_production_breach_price = ( get_continuous_series_sensor_or_quantity( variable_quantity=production_breach_price, unit=self.flex_context["shared_currency_unit"] + "/MW*h", # from EUR/MWh to EUR/MW/resolution query_window=(start, end), resolution=resolution, beliefs_before=belief_time, fill_sides=True, ) ) # Set up commitments DataFrame commitment = FlowCommitment( name=f"any production breach device {d}", quantity=-production_capacity_d, # negative price because breaching in the downwards (production) direction is penalized downwards_deviation_price=-any_production_breach_price, index=index, _type="any", device=d, ) commitments.append(commitment) commitment = FlowCommitment( name=f"all production breaches device {d}", quantity=-production_capacity_d, # negative price because breaching in the downwards (production) direction is penalized downwards_deviation_price=-all_production_breach_price, index=index, device=d, ) commitments.append(commitment) else: # consumption-capacity will become a hard constraint device_constraints[d]["derivative min"] = -production_capacity_d if sensor_d is not None and sensor_d.get_attribute( "is_strictly_non_negative" ): device_constraints[d]["derivative max"] = 0 else: consumption_capacity_d = get_continuous_series_sensor_or_quantity( variable_quantity=consumption_capacity[d], unit="MW", query_window=(start, end), resolution=resolution, beliefs_before=belief_time, min_value=0, # capacities are positive by definition max_value=power_capacity_in_mw[d], resolve_overlaps="min", ) if ( self.flex_context.get("consumption_breach_price") is not None and consumption_capacity[d] is not None ): # consumption-capacity will become a soft constraint consumption_breach_price = self.flex_context[ "consumption_breach_price" ] any_consumption_breach_price = ( get_continuous_series_sensor_or_quantity( variable_quantity=consumption_breach_price, unit=self.flex_context["shared_currency_unit"] + "/MW", query_window=(start, end), resolution=resolution, beliefs_before=belief_time, fill_sides=True, ) ) all_consumption_breach_price = ( get_continuous_series_sensor_or_quantity( variable_quantity=consumption_breach_price, unit=self.flex_context["shared_currency_unit"] + "/MW*h", # from EUR/MWh to EUR/MW/resolution query_window=(start, end), resolution=resolution, beliefs_before=belief_time, fill_sides=True, ) ) # Set up commitments DataFrame commitment = FlowCommitment( name=f"any consumption breach device {d}", quantity=consumption_capacity_d, upwards_deviation_price=any_consumption_breach_price, index=index, _type="any", 
device=d, ) commitments.append(commitment) commitment = FlowCommitment( name=f"all consumption breaches device {d}", quantity=consumption_capacity_d, upwards_deviation_price=all_consumption_breach_price, index=index, device=d, ) commitments.append(commitment) else: # consumption-capacity will become a hard constraint device_constraints[d]["derivative max"] = consumption_capacity_d all_stock_delta = [] for is_usage, soc_delta in zip([False, True], [soc_gain[d], soc_usage[d]]): if soc_delta is None: # Try to get fallback soc_delta = [None] for component in soc_delta: stock_delta_series = get_continuous_series_sensor_or_quantity( variable_quantity=component, unit="MW", query_window=(start, end), resolution=resolution, beliefs_before=belief_time, ) # example: 4 MW sustained over 15 minutes gives 1 MWh stock_delta_series *= resolution / timedelta( hours=1 ) # MW -> MWh / resolution if is_usage: stock_delta_series *= -1 all_stock_delta.append(stock_delta_series) if len(all_stock_delta) > 0: all_stock_delta = pd.concat(all_stock_delta, axis=1) device_constraints[d]["stock delta"] = all_stock_delta.sum(1) device_constraints[d]["stock delta"] *= timedelta(hours=1) / resolution # Apply round-trip efficiency evenly to charging and discharging charging_efficiency[d] = ( get_continuous_series_sensor_or_quantity( variable_quantity=charging_efficiency[d], unit="dimensionless", query_window=(start, end), resolution=resolution, beliefs_before=belief_time, ) .astype(float) .fillna(1) ) discharging_efficiency[d] = ( get_continuous_series_sensor_or_quantity( variable_quantity=discharging_efficiency[d], unit="dimensionless", query_window=(start, end), resolution=resolution, beliefs_before=belief_time, ) .astype(float) .fillna(1) ) roundtrip_efficiency = flex_model[d].get( "roundtrip_efficiency", asset_d.flex_model.get("roundtrip-efficiency", 1), ) # if roundtrip efficiency is provided in the flex-model or defined as an asset attribute if ( "roundtrip_efficiency" in flex_model[d] or asset_d.flex_model.get("roundtrip-efficiency") is not None ): charging_efficiency[d] = roundtrip_efficiency**0.5 discharging_efficiency[d] = roundtrip_efficiency**0.5 device_constraints[d]["derivative down efficiency"] = ( discharging_efficiency[d] ) device_constraints[d]["derivative up efficiency"] = charging_efficiency[d] # Apply storage efficiency (accounts for losses over time) if isinstance(storage_efficiency[d], ur.Quantity) or isinstance( storage_efficiency[d], Sensor ): device_constraints[d]["efficiency"] = ( get_continuous_series_sensor_or_quantity( variable_quantity=storage_efficiency[d], unit="dimensionless", query_window=(start, end), resolution=resolution, beliefs_before=belief_time, max_value=1, ) .astype(float) .fillna(1.0) .clip(lower=0.0, upper=1.0) ) elif storage_efficiency[d] is not None: device_constraints[d]["efficiency"] = storage_efficiency[d] # Convert efficiency from sensor resolution to scheduling resolution if sensor_d.event_resolution != timedelta(0): device_constraints[d]["efficiency"] **= ( resolution / sensor_d.event_resolution ) # check that storage constraints are fulfilled if not skip_validation: constraint_violations = validate_storage_constraints( constraints=device_constraints[d], soc_at_start=soc_at_start[d], soc_min=soc_min[d], soc_max=soc_max[d], resolution=resolution, ) if len(constraint_violations) > 0: # TODO: include hints from constraint_violations into the error message message = create_constraint_violations_message( constraint_violations ) > raise ValueError( "The input data yields an 
infeasible problem. Constraint validation has found the following issues:\n" + message )
... more errors of the same kind ...
E   t=2015-01-01 18:30:00+00:00 | equals(t) [16.136363636363637] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 18:45:00+00:00 | equals(t) [16.363636363636363] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 19:00:00+00:00 | equals(t) [16.590909090909093] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 19:15:00+00:00 | equals(t) [16.81818181818182] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 19:30:00+00:00 | equals(t) [17.045454545454547] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 19:45:00+00:00 | equals(t) [17.272727272727273] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 20:00:00+00:00 | equals(t) [17.5] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 20:15:00+00:00 | equals(t) [17.727272727272727] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 20:30:00+00:00 | equals(t) [17.954545454545453] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 20:45:00+00:00 | equals(t) [18.18181818181818] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 21:00:00+00:00 | equals(t) [18.409090909090907] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 21:15:00+00:00 | equals(t) [18.636363636363637] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 21:30:00+00:00 | equals(t) [18.863636363636363] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 21:45:00+00:00 | equals(t) [19.090909090909093] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 22:00:00+00:00 | equals(t) [19.31818181818182] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 22:15:00+00:00 | equals(t) [19.545454545454547] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 22:30:00+00:00 | equals(t) [19.772727272727273] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
E   t=2015-01-01 22:45:00+00:00 | equals(t) [20.0] - max(t-1) [0.04] <= derivative_max(t) [2.0] * factor_w_wh(t) [0.25]
flexmeasures/data/models/planning/storage.py:924: ValueError
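To make the reported violation concrete, here is the arithmetic of the last quoted line redone (variable names mirror the constraint output, not FlexMeasures internals):

```python
# Numbers copied from the t=2015-01-01 22:45 line above.
equals_t = 20.0            # required stock level at t (e.g. a SOC target)
max_t_minus_1 = 0.04       # allowed stock maximum at t-1
derivative_max_t = 2.0     # maximum rate of change
factor_w_wh_t = 0.25       # one 15-minute step expressed in hours

required_increase = equals_t - max_t_minus_1                  # 19.96
largest_possible_increase = derivative_max_t * factor_w_wh_t  # 0.5
# The constraint equals(t) - max(t-1) <= derivative_max(t) * factor_w_wh(t)
# is violated by a wide margin, hence the infeasible problem:
assert required_increase > largest_possible_increase
```

In other words, the target asks for a jump of roughly 20 units within a single 15-minute step while at most 0.5 is achievable, which fits the resolution/fixture mismatch suspected in the summary above.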
Some warnings:

---------------------------------------------------------- Captured stdout call ----------------------------------------------------------
[FLEXMEASURES][2026-02-18 16:05:23,378] WARNING: The `sensor` keyword argument is deprecated. Please, consider using the argument `asset_or_sensor`.

----------------------------------------------------------- Captured log call ------------------------------------------------------------
WARNING  flexmeasures:__init__.py:91 The `sensor` keyword argument is deprecated. Please, consider using the argument `asset_or_sensor`.

Further Improvements
I didn't touch the tests themselves. The most relevant area is the part that still breaks sometimes: the more tests that use `db` instead of `fresh_db`, the better. I don't know enough about the tests or the code to really look into that, so it would be nice if someone could see if there are improvements to be made there.
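As a purely hypothetical illustration of that suggestion (only the fixture names `db` and `fresh_db` are taken from the suite; the test and the `setup_assets` fixture are made up):

```python
# Hypothetical example: a read-only test requests the session-scoped `db`
# fixture instead of `fresh_db`, so no database rebuild is needed for it.
def test_assets_have_names(db, setup_assets):  # was: (fresh_db, setup_assets)
    assert all(asset.name for asset in setup_assets.values())
```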
Related Items

Based on #1643; will rebase on main once that is merged.
#114 had a similar goal.
Sign-off