Feature Request: Add method(s) to FailureInfo to get a dataframe that shows more explicitly which rule(s) each row broke #190

@DeflateAwning

The goal is to be able to write a dataframe out to a file with more detail about which rules each row broke.

I'll propose a few options. Most of them make sense to implement simultaneously imo, though feel free to pick your favourite.

Option 1: failure_info.invalid(keep_rule_columns: bool = False)

Simply make the .drop(self._rule_columns) part conditional. Not too sure what the names of those columns are, but they're probably reasonable.

Option 2: failure_info.invalid(with_rule_list: Literal['json', 'csv', 'list'] | None = None)

Add a column to the dataframe called something like dataframely_failed_rules. Depending on the option, the column would hold a JSON list, a comma-separated string of rule names, or a pl.List(pl.String).
