@@ -4,59 +4,59 @@ Feature: Pipeline tests using the books dataset
 This tests submissions using nested, complex JSON datasets with arrays, and
 introduces more complex transformations that require aggregation.
 
-  # Scenario: Validate complex nested XML data (spark)
-  # Given I submit the books file nested_books.xml for processing
-  # And A spark pipeline is configured with schema file 'nested_books.dischema.json'
-  # And I add initial audit entries for the submission
-  # Then the latest audit record for the submission is marked with processing status file_transformation
-  # When I run the file transformation phase
-  # Then the header entity is stored as a parquet after the file_transformation phase
-  # And the nested_books entity is stored as a parquet after the file_transformation phase
-  # And the latest audit record for the submission is marked with processing status data_contract
-  # When I run the data contract phase
-  # Then there is 1 record rejection from the data_contract phase
-  # And the header entity is stored as a parquet after the data_contract phase
-  # And the nested_books entity is stored as a parquet after the data_contract phase
-  # And the latest audit record for the submission is marked with processing status business_rules
-  # When I run the business rules phase
-  # Then The rules restrict "nested_books" to 3 qualifying records
-  # And The entity "nested_books" contains an entry for "17.85" in column "total_value_of_books"
-  # And the nested_books entity is stored as a parquet after the business_rules phase
-  # And the latest audit record for the submission is marked with processing status error_report
-  # When I run the error report phase
-  # Then An error report is produced
-  # And The statistics entry for the submission shows the following information
-  # | parameter                | value |
-  # | record_count             | 4     |
-  # | number_record_rejections | 2     |
-  # | number_warnings          | 0     |
+  Scenario: Validate complex nested XML data (spark)
+    Given I submit the books file nested_books.xml for processing
+    And A spark pipeline is configured with schema file 'nested_books.dischema.json'
+    And I add initial audit entries for the submission
+    Then the latest audit record for the submission is marked with processing status file_transformation
+    When I run the file transformation phase
+    Then the header entity is stored as a parquet after the file_transformation phase
+    And the nested_books entity is stored as a parquet after the file_transformation phase
+    And the latest audit record for the submission is marked with processing status data_contract
+    When I run the data contract phase
+    Then there is 1 record rejection from the data_contract phase
+    And the header entity is stored as a parquet after the data_contract phase
+    And the nested_books entity is stored as a parquet after the data_contract phase
+    And the latest audit record for the submission is marked with processing status business_rules
+    When I run the business rules phase
+    Then The rules restrict "nested_books" to 3 qualifying records
+    And The entity "nested_books" contains an entry for "17.85" in column "total_value_of_books"
+    And the nested_books entity is stored as a parquet after the business_rules phase
+    And the latest audit record for the submission is marked with processing status error_report
+    When I run the error report phase
+    Then An error report is produced
+    And The statistics entry for the submission shows the following information
+      | parameter                | value |
+      | record_count             | 4     |
+      | number_record_rejections | 2     |
+      | number_warnings          | 0     |
 
-  # Scenario: Validate complex nested XML data (duckdb)
-  # Given I submit the books file nested_books.xml for processing
-  # And A duckdb pipeline is configured with schema file 'nested_books_ddb.dischema.json'
-  # And I add initial audit entries for the submission
-  # Then the latest audit record for the submission is marked with processing status file_transformation
-  # When I run the file transformation phase
-  # Then the header entity is stored as a parquet after the file_transformation phase
-  # And the nested_books entity is stored as a parquet after the file_transformation phase
-  # And the latest audit record for the submission is marked with processing status data_contract
-  # When I run the data contract phase
-  # Then there is 1 record rejection from the data_contract phase
-  # And the header entity is stored as a parquet after the data_contract phase
-  # And the nested_books entity is stored as a parquet after the data_contract phase
-  # And the latest audit record for the submission is marked with processing status business_rules
-  # When I run the business rules phase
-  # Then The rules restrict "nested_books" to 3 qualifying records
-  # And The entity "nested_books" contains an entry for "17.85" in column "total_value_of_books"
-  # And the nested_books entity is stored as a parquet after the business_rules phase
-  # And the latest audit record for the submission is marked with processing status error_report
-  # When I run the error report phase
-  # Then An error report is produced
-  # And The statistics entry for the submission shows the following information
-  # | parameter                | value |
-  # | record_count             | 4     |
-  # | number_record_rejections | 2     |
-  # | number_warnings           | 0     |
+  Scenario: Validate complex nested XML data (duckdb)
+    Given I submit the books file nested_books.xml for processing
+    And A duckdb pipeline is configured with schema file 'nested_books_ddb.dischema.json'
+    And I add initial audit entries for the submission
+    Then the latest audit record for the submission is marked with processing status file_transformation
+    When I run the file transformation phase
+    Then the header entity is stored as a parquet after the file_transformation phase
+    And the nested_books entity is stored as a parquet after the file_transformation phase
+    And the latest audit record for the submission is marked with processing status data_contract
+    When I run the data contract phase
+    Then there is 1 record rejection from the data_contract phase
+    And the header entity is stored as a parquet after the data_contract phase
+    And the nested_books entity is stored as a parquet after the data_contract phase
+    And the latest audit record for the submission is marked with processing status business_rules
+    When I run the business rules phase
+    Then The rules restrict "nested_books" to 3 qualifying records
+    And The entity "nested_books" contains an entry for "17.85" in column "total_value_of_books"
+    And the nested_books entity is stored as a parquet after the business_rules phase
+    And the latest audit record for the submission is marked with processing status error_report
+    When I run the error report phase
+    Then An error report is produced
+    And The statistics entry for the submission shows the following information
+      | parameter                | value |
+      | record_count             | 4     |
+      | number_record_rejections | 2     |
+      | number_warnings          | 0     |
 
   Scenario: Handle a file with a malformed tag (duckdb)
     Given I submit the books file malformed_books.xml for processing