At yesterday's PyData meetup in Zurich, one question made me realize that we handle the interaction between group rules and row-level rules incorrectly: if a row-level rule removes a row, and the group rule would fail without that row, we do not notice. For example:
import dataframely as dy
import polars as pl


class DiagnosisSchema(dy.Schema):
    invoice_id = dy.String(primary_key=True)
    # Row-level rule: a diagnosis code must be exactly three uppercase letters.
    diagnosis = dy.String(primary_key=True, regex="^[A-Z]{3}$")
    is_main = dy.Bool(nullable=False)

    # Group rule: every invoice must have exactly one main diagnosis.
    @dy.rule(group_by=["invoice_id"])
    def exactly_one_main_diagnosis() -> pl.Expr:
        return pl.col("is_main").sum() == 1


df = pl.DataFrame(
    {
        "invoice_id": ["A", "A", "A"],
        # "123" violates the regex, and it is the only main diagnosis.
        "diagnosis": ["ABC", "ABD", "123"],
        "is_main": [False, False, True],
    }
)
good, _ = DiagnosisSchema.filter(df)
print(good)

results in
shape: (2, 3)
┌────────────┬───────────┬─────────┐
│ invoice_id ┆ diagnosis ┆ is_main │
│ ---        ┆ ---       ┆ ---     │
│ str        ┆ str       ┆ bool    │
╞════════════╪═════════╪═════════╡
│ A          ┆ ABC       ┆ false   │
│ A          ┆ ABD       ┆ false   │
└────────────┴───────────┴─────────┘
which clearly violates the schema since we don't have a main diagnosis for the group.
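To make the ordering problem concrete, here is a minimal, library-independent sketch in plain Python. It does not reflect dataframely's actual internals: `filter_independently` is my assumption about the current behaviour (group rules are evaluated on all input rows), `filter_sequentially` is one possible alternative (group rules only see rows that survived the row-level rules), and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Same data as in the example above, as plain dictionaries.
rows = [
    {"invoice_id": "A", "diagnosis": "ABC", "is_main": False},
    {"invoice_id": "A", "diagnosis": "ABD", "is_main": False},
    {"invoice_id": "A", "diagnosis": "123", "is_main": True},
]


def row_rule_ok(row):
    # Row-level rule: the diagnosis must be three uppercase letters.
    return re.fullmatch(r"[A-Z]{3}", row["diagnosis"]) is not None


def group_rule_ok(group):
    # Group rule: exactly one main diagnosis per invoice.
    return sum(r["is_main"] for r in group) == 1


def group_by_invoice(rows):
    groups = defaultdict(list)
    for r in rows:
        groups[r["invoice_id"]].append(r)
    return groups


def filter_independently(rows):
    # Assumed current behaviour: group rules are checked against ALL input rows.
    groups = group_by_invoice(rows)
    return [
        r for r in rows
        if row_rule_ok(r) and group_rule_ok(groups[r["invoice_id"]])
    ]


def filter_sequentially(rows):
    # Possible alternative: group rules only see rows that passed row-level rules.
    survivors = [r for r in rows if row_rule_ok(r)]
    groups = group_by_invoice(survivors)
    return [r for r in survivors if group_rule_ok(groups[r["invoice_id"]])]


independent = filter_independently(rows)
sequential = filter_sequentially(rows)
print(len(independent))  # 2 rows remain, but the group has no main diagnosis
print(len(sequential))   # 0 rows remain, the whole group is rejected
```

With sequential evaluation, the whole group for invoice "A" is rejected, so the filtered result always satisfies the schema. Whether rejecting the entire group is the desired outcome here is a separate design question.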