
@suvayu (Contributor) commented May 26, 2025

Migrate old-format JSON parameter_values to the new schema: a flat, table-like layout for time_series, array, and map, and single pyarrow-compatible values for date_time, duration, and time_pattern.
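As a rough illustration of the flat, column-oriented idea for time series: the sketch below assumes an old-format value shaped like `{"type": "time_series", "data": {"<ISO timestamp>": value, ...}}` and turns it into two parallel columns, one of parsed `datetime` objects and one of floats. The function name `time_series_to_columns` and the column names are hypothetical, not the PR's actual API.

```python
import json
from datetime import datetime


def time_series_to_columns(old_json: str) -> dict[str, list]:
    """Convert an old-style time series JSON string into column lists.

    Assumes (hypothetically) the old layout
    {"type": "time_series", "data": {"<ISO timestamp>": <number>, ...}}.
    """
    parsed = json.loads(old_json)
    if parsed.get("type") != "time_series":
        raise ValueError(f"not a time series: {parsed.get('type')!r}")
    stamps: list[datetime] = []
    values: list[float] = []
    for stamp, value in parsed["data"].items():
        stamps.append(datetime.fromisoformat(stamp))  # index column
        values.append(float(value))  # value column
    return {"time": stamps, "value": values}


old = '{"type": "time_series", "data": {"2020-01-01T00:00": 1.0, "2020-01-01T01:00": 2.5}}'
columns = time_series_to_columns(old)
print(columns["value"])  # [1.0, 2.5]
```

A dict of equal-length lists like this can be passed directly to `pyarrow.table()` to obtain a columnar table.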

No related issue

Checklist before merging

  • Documentation (also in Toolbox repo) is up-to-date
  • Release notes have been updated
  • Unit tests have been added/updated accordingly
  • Code has been formatted by black & isort
  • Unit tests pass

Authors

Since GitHub doesn't support setting multiple authors on a PR, documenting them here:

@OleMussmann, @suvayu

@suvayu (Contributor, Author) left a comment

Add some notes/questions as review comments.

suvayu added 5 commits May 29, 2025 15:00
TimePattern was implemented as an annotated type for schema generation;
however, this is not distinguishable at runtime, so add an alternate
dataclass implementation.
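The runtime problem this commit describes can be shown in a few lines: an `Annotated` alias keeps its metadata on the type, where schema generators can read it, but a value of that type is just a plain `str` at runtime, whereas a dataclass instance can be recognized with `isinstance`. The names below are illustrative, not the module's real definitions.

```python
from dataclasses import dataclass
from typing import Annotated, get_args

# Annotated alias: the metadata lives on the *type*, where schema generators can read it.
TimePatternAnnotated = Annotated[str, "time_pattern"]
print(get_args(TimePatternAnnotated))  # (<class 'str'>, 'time_pattern')


@dataclass(frozen=True)
class TimePattern:
    """Runtime-distinguishable alternative (illustrative only)."""

    pattern: str


def describe(value: object) -> str:
    # An annotated value cannot be told apart from any other str at runtime.
    if isinstance(value, TimePattern):
        return "time_pattern"
    if isinstance(value, str):
        return "str"  # could be a TimePatternAnnotated or any other string
    return type(value).__name__


annotated_value: TimePatternAnnotated = "M1-4,M9-12"
print(describe(annotated_value))  # str
print(describe(TimePattern("M1-4,M9-12")))  # time_pattern
```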
Make columns instead of records from old-format parameter_value
Ole Mussmann and others added 12 commits June 2, 2025 21:05
- update only rows that need changes
- batch row updates
- convert types to `table` where necessary
- add `transition_data` function override for debugging
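The first two points (update only rows that need changes, batch row updates) can be sketched with the standard-library `sqlite3` module: convert every stored value, keep only those whose converted form differs, and write them with a single `executemany` call. The `parameter_value` table layout and the `convert` callback here are simplified stand-ins, not spinedb_api's actual schema or migration code.

```python
import sqlite3
from typing import Callable


def migrate_values(conn: sqlite3.Connection, convert: Callable[[str], str]) -> int:
    """Rewrite only the rows whose converted value differs; return how many changed."""
    changes = []
    for row_id, old in conn.execute("SELECT id, value FROM parameter_value").fetchall():
        new = convert(old)
        if new != old:  # skip rows that are already in the target format
            changes.append((new, row_id))
    if changes:
        # A single batched statement instead of one UPDATE call per row.
        conn.executemany("UPDATE parameter_value SET value = ? WHERE id = ?", changes)
        conn.commit()
    return len(changes)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parameter_value (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO parameter_value (id, value) VALUES (?, ?)",
    [(1, "old:1"), (2, "new:2"), (3, "old:3")],
)
print(migrate_values(conn, lambda v: v.replace("old:", "new:")))  # 2
```

Running the migration a second time touches no rows, because every value is already in the target form.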
Remove unnecessary nullable types, and unused union type (ValueTypes)
Do not use pandas as an intermediate step; instead, transform from
record-based to column-based ourselves, which makes type inspection easier.

TODO: factor out into parts specific to the data transition and parts
generally useful for inserting into spinedb from outside sources when
using spinedb_api as a library.
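The record-to-column transformation mentioned in these commits is essentially a transpose. A minimal sketch without pandas, assuming records are equal-length tuples; once values are grouped by column, checking each column's element types is a single pass over one list. The helper names are hypothetical.

```python
from collections.abc import Iterable, Sequence


def records_to_columns(records: Iterable[Sequence], names: Sequence[str]) -> dict[str, list]:
    """Transpose row records into named column lists."""
    columns: dict[str, list] = {name: [] for name in names}
    for record in records:
        if len(record) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(record)}")
        for name, item in zip(names, record):
            columns[name].append(item)
    return columns


def column_types(columns: dict[str, list]) -> dict[str, set[type]]:
    """Collect the set of Python types in each column, ignoring None (nulls)."""
    return {name: {type(v) for v in values if v is not None} for name, values in columns.items()}


rows = [("A", None, 1.1), ("B", None, 1.2), ("C", "a", 2.1), ("C", "b", 2.2)]
cols = records_to_columns(rows, ["col1", "col2", "value"])
print(column_types(cols))
# {'col1': {<class 'str'>}, 'col2': {<class 'str'>}, 'value': {<class 'float'>}}
```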
@soininen (Contributor) commented Sep 1, 2025

I think we need to allow null values in ArrayIndex and the like to support uneven maps, i.e. this should be a valid table:

| index 1 | index 2 | value |
|---------|---------|-------|
| A       | null    | 1.1   |
| B       | null    | 1.2   |
| C       | a       | 2.1   |
| C       | b       | 2.2   |

This raises the question of whether we need the index arrays as separate types at all.
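To make the uneven-map case concrete: a nested map in which some keys hold a scalar and others hold a sub-map can be flattened into fixed-width rows by padding the shorter index paths with `None` (null). This sketch assumes nested Python dicts as input, a simplification of the real map representation; the function names are hypothetical.

```python
from typing import Any


def leaf_paths(data: dict, prefix: tuple = ()) -> list[tuple[tuple, Any]]:
    """Collect (index path, value) for every leaf of a nested dict."""
    leaves = []
    for key, value in data.items():
        if isinstance(value, dict):
            leaves.extend(leaf_paths(value, prefix + (key,)))
        else:
            leaves.append((prefix + (key,), value))
    return leaves


def flatten_uneven_map(data: dict) -> list[tuple]:
    """Flatten to rows of equal width, padding short index paths with None."""
    leaves = leaf_paths(data)
    depth = max((len(path) for path, _ in leaves), default=0)
    return [path + (None,) * (depth - len(path)) + (value,) for path, value in leaves]


nested = {"A": 1.1, "B": 1.2, "C": {"a": 2.1, "b": 2.2}}
for row in flatten_uneven_map(nested):
    print(row)
# ('A', None, 1.1)
# ('B', None, 1.2)
# ('C', 'a', 2.1)
# ('C', 'b', 2.2)
```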

I also reintroduced custom conversion for some types in models.py to get the unit tests to pass. I am currently working on making Toolbox compatible with this branch, and for that I need to_database() and from_database() to work.

@soininen mentioned this pull request Sep 2, 2025
@suvayu (Contributor, Author) commented Sep 2, 2025

Hi,

Strictly speaking, the index column and value column distinction is not needed. But I would like to have them because then downstream code can make useful assumptions. But before I get into that, I think there's a misunderstanding here, mostly because I think I didn't document this anywhere. There is no requirement for all but the last column to be an index column. So this is acceptable:

| col1 | col2 | value |
|------|------|-------|
| A    | null | 1.1   |
| B    | null | 1.2   |
| C    | a    | 2.1   |
| C    | b    | 2.2   |

It would mean col1 is index type, but col2 and value are just nullable arrays.

This is useful because, say, when converting to a dataframe (or something else in a user script), we can treat col1 as the index while excluding col2. Making that assumption doesn't require inspecting the contents of col2; the type indicates that it is safe.

Of course, this is not feasible when working with the parameter_value alone, but it is useful when you combine it with the other columns from the table.
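A minimal sketch of the point about types: if each column carries its role in its type, downstream code (for example, a dataframe conversion) can pick out the index columns by type alone, without scanning any values. `IndexColumn` and `ValueColumn` are hypothetical stand-ins for the PR's index and array types.

```python
from dataclasses import dataclass


@dataclass
class IndexColumn:
    """Marks a column that may be used as an index (hypothetical)."""

    name: str
    values: list


@dataclass
class ValueColumn:
    """A plain, possibly nullable data column (hypothetical)."""

    name: str
    values: list


def index_column_names(table: list) -> list[str]:
    # Decided from the column type only; the contents are never inspected.
    return [col.name for col in table if isinstance(col, IndexColumn)]


table = [
    IndexColumn("col1", ["A", "B", "C", "C"]),
    ValueColumn("col2", [None, None, "a", "b"]),
    ValueColumn("value", [1.1, 1.2, 2.1, 2.2]),
]
print(index_column_names(table))  # ['col1']
```

With pandas, the resulting names could then be passed to `DataFrame.set_index()`.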

Anyway, I think we should discuss this a bit further. I'll email you.

Cheers,

@soininen (Contributor) commented Sep 3, 2025

> Strictly speaking, the index column and value column distinction is not needed. But I would like to have them because then downstream code can make useful assumptions. But before I get into that, I think there's a misunderstanding here, mostly because I think I didn't document this anywhere. There is no requirement for all but the last column to be an index column. So this is acceptable:
>
> | col1 | col2 | value |
> |------|------|-------|
> | A    | null | 1.1   |
> | B    | null | 1.2   |
> | C    | a    | 2.1   |
> | C    | b    | 2.2   |
>
> It would mean col1 is index type, but col2 and value are just nullable arrays.

These are very good points. I agree that the first index column (col1 in the example) should not be nullable. We should indeed keep the index column types.
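The rule agreed on here (the first index column must not contain nulls, while later columns may) is straightforward to check. A sketch assuming a table represented as an ordered list of `(name, values, is_index)` tuples; this representation and the function name are hypothetical.

```python
def check_first_index_not_null(table: list[tuple[str, list, bool]]) -> None:
    """Raise ValueError if the first index column contains None (null)."""
    for name, values, is_index in table:
        if is_index:
            if any(v is None for v in values):
                raise ValueError(f"first index column {name!r} contains nulls")
            return  # only the first index column is constrained
    # Tables without any index column are not checked here.


check_first_index_not_null([
    ("col1", ["A", "B", "C", "C"], True),
    ("col2", [None, None, "a", "b"], False),  # nulls allowed here
    ("value", [1.1, 1.2, 2.1, 2.2], False),
])
```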

- Moved dump_db_value(), from_database_to_dimension_count(),
  join_value_and_type() and split_value_and_type() to a new module,
  incomplete_values.
- Added JSONConverter to the importer's convert functions. This replaces
  the previous behavior of trying to convert every parameter value
  string into a parameter value.
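The JSONConverter change replaces guessing with an explicit, opt-in conversion. A hypothetical sketch of that idea (the class below is not spinedb_api's `JSONConverter`, and its interface is an assumption): strings pass through unchanged unless the user selects the JSON converter, in which case each string is parsed with `json.loads`.

```python
import json
from typing import Any, Callable, Optional


class JsonStringConverter:
    """Hypothetical opt-in converter: parse a string as JSON when applied."""

    def __call__(self, text: str) -> Any:
        return json.loads(text)


def convert_column(values: list[str], converter: Optional[Callable[[str], Any]] = None) -> list:
    # Without an explicit converter, the strings pass through unchanged,
    # instead of every string being tried as a parameter value.
    if converter is None:
        return list(values)
    return [converter(v) for v in values]


raw = ["[1, 2, 3]", "2.5", '"text"']
print(convert_column(raw))  # ['[1, 2, 3]', '2.5', '"text"']
print(convert_column(raw, JsonStringConverter()))  # [[1, 2, 3], 2.5, 'text']
```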
@soininen changed the title from "Transition JSON parameter_value from old to new flat format" to "Transition value JSON from old to new flat format" on Sep 11, 2025
The value JSON is not compatible with previous versions.
The GAMS version check runs the GAMS executable to resolve its version.
The executable fails if run in a read-only directory,
so we create a temporary directory to avoid that.
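The workaround can be sketched with `subprocess` and `tempfile`: run the executable with a freshly created temporary directory as its working directory, so it never starts in a read-only location. The command below is a placeholder; how the real check invokes GAMS and parses its version is not shown in this PR.

```python
import subprocess
import sys
import tempfile


def run_in_temp_dir(command: list[str], timeout: float = 30.0) -> str:
    """Run a command with a writable temporary working directory; return its stdout."""
    with tempfile.TemporaryDirectory() as work_dir:
        completed = subprocess.run(
            command,
            cwd=work_dir,  # avoid running inside a read-only directory
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    return completed.stdout


# Stand-in for the GAMS executable: any program that prints to stdout.
print(run_in_temp_dir([sys.executable, "-c", "print('version 1.0')"]).strip())  # version 1.0
```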
@soininen

This comment was marked as resolved.



# types
class TimePeriod(str):
Contributor


Could this class be replaced by typing.NewType?

TimePeriod = NewType("TimePeriod", str)
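For comparison, here is how the two options behave. `NewType` has essentially no runtime cost and is enforced only by static type checkers: its values are plain `str` objects, and `isinstance` checks against it are not possible. A `str` subclass keeps a distinct runtime type. Whether that runtime distinction matters depends on how `TimePeriod` is used elsewhere in the PR, which this snippet does not show; the names below are illustrative.

```python
from typing import NewType


class TimePeriodSubclass(str):
    """str subclass: instances keep a distinct runtime type."""


TimePeriodNew = NewType("TimePeriodNew", str)

sub = TimePeriodSubclass("M1-4")
new = TimePeriodNew("M1-4")

print(type(sub).__name__)  # TimePeriodSubclass
print(type(new).__name__)  # str, because NewType returns its argument unchanged
print(isinstance(sub, TimePeriodSubclass))  # True

try:
    isinstance(new, TimePeriodNew)
except TypeError as err:
    # NewType objects are not classes, so isinstance() rejects them.
    print("TypeError:", err)
```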
