Validation result is sometimes incorrect when using group rules

At yesterday's PyData meetup in Zurich, a question made me realize that we handle the interaction between group rules and row-level rules incorrectly: if a row-level rule removes a row, and the group rule fails on the remaining rows as a result, we do not notice. For example:


```python
import dataframely as dy
import polars as pl

class DiagnosisSchema(dy.Schema):
    invoice_id = dy.String(primary_key=True)
    diagnosis = dy.String(primary_key=True, regex="^[A-Z]{3}$")
    is_main = dy.Bool(nullable=False)

    # Group rule: evaluated once per invoice rather than once per row.
    @dy.rule(group_by=["invoice_id"])
    def exactly_one_main_diagnosis() -> pl.Expr:
        return pl.col("is_main").sum() == 1

df = pl.DataFrame(
    {
        "invoice_id": ["A", "A", "A"],
        "diagnosis": ["ABC", "ABD", "123"],
        "is_main": [False, False, True],
    }
)
good, _ = DiagnosisSchema.filter(df)
print(good)
```

results in

```
shape: (2, 3)
┌────────────┬───────────┬─────────┐
│ invoice_id ┆ diagnosis ┆ is_main │
│ ---        ┆ ---       ┆ ---     │
│ str        ┆ str       ┆ bool    │
╞════════════╪═══════════╪═════════╡
│ A          ┆ ABC       ┆ false   │
│ A          ┆ ABD       ┆ false   │
└────────────┴───────────┴─────────┘
```

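This output clearly violates the schema: the row with `is_main = True` is dropped because its diagnosis `123` fails the regex, so group `A` is left without a main diagnosis, yet its two remaining rows are still returned as valid.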
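To make the ordering problem concrete, here is a minimal plain-Python sketch of the same three rows. It does not use dataframely and is not its implementation. `filter_naive` and `filter_two_pass` are hypothetical helpers: the first assumes both kinds of rules are evaluated on the full input, which reproduces the output above, and the second re-evaluates the group rule on the rows that survive the row-level rule.

```python
import re
from collections import defaultdict

# Same three rows as in the example above.
rows = [
    {"invoice_id": "A", "diagnosis": "ABC", "is_main": False},
    {"invoice_id": "A", "diagnosis": "ABD", "is_main": False},
    {"invoice_id": "A", "diagnosis": "123", "is_main": True},
]

DIAGNOSIS_PATTERN = re.compile(r"^[A-Z]{3}$")


def row_is_valid(row: dict) -> bool:
    """Row-level rule: the diagnosis must match the regex from the schema."""
    return DIAGNOSIS_PATTERN.match(row["diagnosis"]) is not None


def groups_with_one_main(candidates: list[dict]) -> set[str]:
    """Group rule: invoice IDs that have exactly one main diagnosis."""
    main_counts: dict[str, int] = defaultdict(int)
    for row in candidates:
        main_counts[row["invoice_id"]] += int(row["is_main"])
    return {invoice_id for invoice_id, n in main_counts.items() if n == 1}


def filter_naive(all_rows: list[dict]) -> list[dict]:
    """Assumed current behavior: both rules are evaluated on the full input."""
    ok_groups = groups_with_one_main(all_rows)
    return [r for r in all_rows if row_is_valid(r) and r["invoice_id"] in ok_groups]


def filter_two_pass(all_rows: list[dict]) -> list[dict]:
    """Hypothetical fix: evaluate the group rule only on the surviving rows."""
    survivors = [r for r in all_rows if row_is_valid(r)]
    ok_groups = groups_with_one_main(survivors)
    return [r for r in survivors if r["invoice_id"] in ok_groups]


naive = filter_naive(rows)
two_pass = filter_two_pass(rows)
print(len(naive), len(two_pass))
```

With this input, the naive variant keeps the two non-main rows, while the two-pass variant rejects the whole group. A single re-evaluation is enough here, but with several group rules over different groupings the check might need to be repeated until no further rows are removed.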

Validation result is sometimes incorrect when using group rules #38
