Adding FailedRowProcessor support in soda-spark by joaoluga · Pull Request #114 · sodadata/soda-spark

joaoluga · 2022-06-01T19:06:04Z

Resolves #113

snippet:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from sodaspark import scan
from sodasql.scan.failed_rows_processor import FailedRowsProcessor


class InMemoryFailedRowProcessor(FailedRowsProcessor):

    def process(self, context: dict) -> dict:
        # Receives one sample context per call; this example just prints it
        print(context)
        return {'message': 'All failed rows were printed in your terminal'}


spark = SparkSession.builder.getOrCreate()

data2 = [("1", 100),
         ("2", 200),
         ("3", None),  # missing value: fails the missing_count == 0 test below
         ("4", 400),
         ]

schema = StructType([
    StructField("id", StringType(), True),
    StructField("number", IntegerType(), True)
    ])

df = spark.createDataFrame(data=data2, schema=schema)

scan_definition = """
table_name: my_table
metric_groups:
    - all
samples:
    table_limit: 5
    failed_limit: 5
tests:
    - row_count > 0
columns:
    number:
        tests:
            - duplicate_count == 0
            - missing_count == 0
"""

# Pass the processor to the scan; it is called with the sample contexts
scan_result = scan.execute(scan_definition, df, failed_rows_processor=InMemoryFailedRowProcessor())

expected output:

{'sample_name': 'dataset', 'column_name': None, 'test_ids': None, 'sample_columns': [{'name': 'id', 'type': 'string'}, {'name': 'number', 'type': 'int'}], 'sample_rows': [['1', 100], ['2', 200], ['3', None], ['4', 400]], 'sample_description': 'my_table.sample', 'total_row_count': 4}
{'sample_name': 'missing', 'column_name': 'number', 'test_ids': ['{"column":"number","expression":"missing_count == 0"}'], 'sample_columns': [{'name': 'id', 'type': 'string'}, {'name': 'number', 'type': 'int'}], 'sample_rows': [['3', None]], 'sample_description': 'my_table.number.missing', 'total_row_count': 1}
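Each printed dict above is one context handed to process: the first is the plain dataset sample (test_ids is None), the second holds the failed rows of the missing_count == 0 test on number. The sketch below uses only the standard library to pull the failing test expressions and rows out of such a context. The helper name summarize_failed_rows is hypothetical, and it assumes the context layout shown in the expected output, in particular that test_ids is either None or a list of JSON strings.

```python
import json
from typing import Optional


def summarize_failed_rows(context: dict) -> Optional[dict]:
    """Return a compact summary of a failed-rows context, or None for plain samples.

    Hypothetical helper: assumes the layout from the expected output, where
    test_ids is None for the dataset sample and a list of JSON strings otherwise.
    """
    test_ids = context.get("test_ids")
    if not test_ids:
        # e.g. the 'dataset' sample: no failing test attached
        return None
    # Each id looks like '{"column":"number","expression":"missing_count == 0"}'
    expressions = [json.loads(test_id)["expression"] for test_id in test_ids]
    column_names = [column["name"] for column in context["sample_columns"]]
    # Pair every sampled row with its column names for readability
    rows = [dict(zip(column_names, row)) for row in context["sample_rows"]]
    return {
        "column": context.get("column_name"),
        "tests": expressions,
        "rows": rows,
        "total_row_count": context.get("total_row_count"),
    }


# The 'missing' context copied from the expected output above
missing_context = {
    "sample_name": "missing",
    "column_name": "number",
    "test_ids": ['{"column":"number","expression":"missing_count == 0"}'],
    "sample_columns": [{"name": "id", "type": "string"}, {"name": "number", "type": "int"}],
    "sample_rows": [["3", None]],
    "sample_description": "my_table.number.missing",
    "total_row_count": 1,
}

print(summarize_failed_rows(missing_context))
# {'column': 'number', 'tests': ['missing_count == 0'], 'rows': [{'id': '3', 'number': None}], 'total_row_count': 1}
```

A processor's process method could call such a helper and skip contexts for which it returns None.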

- Allowing users to use the FailedRowsProcessor feature by passing it to the execute method through the failed_rows_processor argument

JCZuurmond · 2022-06-09T18:53:12Z

Thanks @joaoluga for the PR. Could you add a test for the failed rows processor?

joaoluga · 2022-06-22T11:52:21Z


Hey @JCZuurmond, sorry for taking so long. Yes, I've just added tests for the failed rows processor. 😁
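The added tests live in tests/test_scan.py. One way to make a processor easy to assert on is to keep each context in memory rather than printing it. The sketch below is hypothetical (CollectingFailedRowsProcessor is not part of the PR): it imports FailedRowsProcessor from sodasql.scan.failed_rows_processor as the snippet does, and falls back to a minimal stand-in, assumed to expose a single process(context) -> dict method, when soda-sql is not installed.

```python
from typing import List

try:
    # Base class used in the PR snippet (requires soda-sql)
    from sodasql.scan.failed_rows_processor import FailedRowsProcessor
except ImportError:
    # Stand-in so the sketch runs without soda-sql; assumes the interface is a
    # single process(context) -> dict method, as used in the snippet
    class FailedRowsProcessor:
        def process(self, context: dict) -> dict:
            raise NotImplementedError


class CollectingFailedRowsProcessor(FailedRowsProcessor):
    """Stores every context it receives so a test can inspect them afterwards."""

    def __init__(self) -> None:
        self.contexts: List[dict] = []

    def process(self, context: dict) -> dict:
        self.contexts.append(context)
        return {"message": f"Stored {len(self.contexts)} sample(s) in memory"}


processor = CollectingFailedRowsProcessor()
result = processor.process(
    {"sample_name": "missing", "column_name": "number", "sample_rows": [["3", None]]}
)
print(result)  # {'message': 'Stored 1 sample(s) in memory'}
```

In a Spark-backed test, an instance would instead be passed as failed_rows_processor= to scan.execute(...), as in the snippet, after which processor.contexts holds the contexts the scan reported and can be asserted on directly.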

JCZuurmond

I have a couple of suggestions.

JCZuurmond · 2022-06-22T18:31:20Z

tests/test_scan.py

+class InMemoryFailedRowProcessor(FailedRowsProcessor):
+    def process(self, context: dict) -> dict:
+
+        try:


This try/except does not do anything, right?

joaoluga

Yes, you're correct. I was just following the pattern I found in this doc 😅 I've just changed the except to raise the exception 🤔
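The wrapper adds no handling, and the form used in the original snippet (except Exception: raise Exception) is slightly worse than having no try/except: it replaces the original error with a new, message-less Exception, so callers can no longer catch the specific type. A bare raise re-raises the original error unchanged, which behaves like having no wrapper at all. A standard-library illustration (all function names are hypothetical):

```python
def fails() -> None:
    raise ValueError("bad context")


def with_bare_exception() -> None:
    try:
        fails()
    except Exception:
        raise Exception  # swaps the ValueError for a new, empty Exception


def with_reraise() -> None:
    try:
        fails()
    except Exception:
        raise  # re-raises the original ValueError unchanged


def observe(func) -> BaseException:
    """Call func and return the exception that escapes it."""
    try:
        func()
    except Exception as exc:
        return exc
    raise AssertionError("expected an exception")


replaced = observe(with_bare_exception)
print(type(replaced).__name__, repr(replaced.__context__))
# Exception ValueError('bad context')
print(type(observe(with_reraise)).__name__)
# ValueError
```

The original ValueError survives only as __context__ of the new exception, which is why dropping the try/except entirely is the simplest fix.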

tests/test_scan.py

JCZuurmond

LGTM. @vijaykiran could you give a final go?

vijaykiran · 2022-06-23T08:36:34Z

Thank you @joaoluga and @JCZuurmond - LGTM!

joaolug added 3 commits June 1, 2022 13:55

Including failed_rows_processor to sodaspark

fe4ad8b

- Allowing users to use the FailedRowsProcessor feature by passing it in the execute method

Adding docstring to failed_rows_processor

a23c01f

Adding missing docstrings

0cd3b21

including unit tests for failed_row_processor

abc73e3

JCZuurmond suggested changes Jun 22, 2022


joaolug added 3 commits June 22, 2022 17:16

Removing try/except flow since due to no use

2dcf5b1

Adding test description

9254165

Fixing test assert

b7cd8af

JCZuurmond approved these changes Jun 23, 2022

joaoluga wants to merge 7 commits into sodadata:main from joaoluga:adding_failedrowsprocessor