
Add ability to configure how Spark handles dates in parquet files. #2175

@benedeki


Background

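Spark 3 added options that control how ancient dates and timestamps in Parquet files are read and written. Such values (dates before 1582-10-15, timestamps before 1900-01-01) are ambiguous because Spark 2 used a hybrid Julian/Gregorian calendar, while Spark 3 uses the proleptic Gregorian calendar.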
The settings are:
- `spark.sql.parquet.datetimeRebaseModeInRead`
- `spark.sql.parquet.datetimeRebaseModeInWrite`
- `spark.sql.parquet.int96RebaseModeInRead`
- `spark.sql.parquet.int96RebaseModeInWrite`

Details here.
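To illustrate why these settings exist: a Parquet DATE is stored as a day count since 1970-01-01. Spark 2 computed that count in a hybrid calendar (Julian before 1582-10-15, Gregorian afterwards), whereas Spark 3 uses the proleptic Gregorian calendar, so the same date label can map to different day counts. The Python sketch below is not Spark's implementation; it only compares the two day counts for a few labels using the standard Julian Day Number formulas. All function names are hypothetical.

```python
from datetime import date

# 1970-01-01 expressed as a Julian Day Number.
UNIX_EPOCH_JDN = 2440588


def _march_based(year, month):
    """Shift the year so it starts in March (standard JDN algorithm)."""
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3


def julian_epoch_day(year, month, day):
    """Days since 1970-01-01 for a date label interpreted in the Julian calendar."""
    y, m = _march_based(year, month)
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return jdn - UNIX_EPOCH_JDN


def gregorian_epoch_day(year, month, day):
    """Days since 1970-01-01 for a date label in the proleptic Gregorian calendar."""
    # Python's date type already uses the proleptic Gregorian calendar.
    return date(year, month, day).toordinal() - date(1970, 1, 1).toordinal()


def hybrid_epoch_day(year, month, day):
    """Hybrid calendar: Julian before 1582-10-15, Gregorian from then on."""
    if (year, month, day) < (1582, 10, 15):
        return julian_epoch_day(year, month, day)
    return gregorian_epoch_day(year, month, day)


def rebase_shift(year, month, day):
    """Difference between the hybrid and proleptic Gregorian day counts for one label."""
    return hybrid_epoch_day(year, month, day) - gregorian_epoch_day(year, month, day)


for label in [(2000, 1, 1), (1582, 10, 4), (1000, 1, 1)]:
    print(label, rebase_shift(*label))  # shifts of 0, 10 and 5 days
```

Modern dates are unaffected, while older labels shift by several days. Roughly speaking, `LEGACY` on read compensates for this shift, `CORRECTED` reads the stored day count as-is, and `EXCEPTION` fails when it meets such values.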

Feature

Allow these options to be set for Enceladus jobs.

### Tasks
- [ ] ~Add command line options to be able to set the **read** options. Set a default behavior either to `EXCEPTION` or `LEGACY`.~
- [ ] ~Modify the helper scripts to recognize these settings~
- [ ] ~Add a `reference.conf`/`application.conf` setting to be applied to write options. The default should be `LEGACY`~
- [ ] Modify the helper scripts so the Spark settings can easily be passed to `spark-submit`; the defaults remain as described above
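The remaining task could look roughly like the following sketch: a hypothetical helper (`rebase_conf_args`, not part of Enceladus) that turns the four settings into `--conf key=value` arguments for `spark-submit`. It is written in Python for illustration only; the actual helper scripts would need equivalent logic in their own language. The defaults are assumptions: `LEGACY` for writes as proposed above, and `EXCEPTION` for reads, which is only one of the two read defaults still under discussion.

```python
import shlex

# The Spark settings listed in this issue.
REBASE_SETTINGS = (
    "spark.sql.parquet.datetimeRebaseModeInRead",
    "spark.sql.parquet.datetimeRebaseModeInWrite",
    "spark.sql.parquet.int96RebaseModeInRead",
    "spark.sql.parquet.int96RebaseModeInWrite",
)

# Modes Spark accepts for these settings.
VALID_MODES = {"EXCEPTION", "CORRECTED", "LEGACY"}

# Hypothetical defaults: LEGACY for writes, EXCEPTION for reads.
DEFAULT_MODES = {
    key: ("LEGACY" if key.endswith("InWrite") else "EXCEPTION")
    for key in REBASE_SETTINGS
}


def rebase_conf_args(overrides=None):
    """Return spark-submit arguments for the rebase settings.

    `overrides` maps setting names to modes; unknown names or modes raise ValueError.
    """
    modes = dict(DEFAULT_MODES)
    for key, mode in (overrides or {}).items():
        if key not in REBASE_SETTINGS:
            raise ValueError(f"unknown setting: {key}")
        if mode not in VALID_MODES:
            raise ValueError(f"invalid mode for {key}: {mode}")
        modes[key] = mode
    args = []
    for key in REBASE_SETTINGS:
        args += ["--conf", f"{key}={modes[key]}"]
    return args


# Keep the defaults, but read ancient dates without rebasing.
# The application JAR and job arguments would follow these options.
args = rebase_conf_args({"spark.sql.parquet.datetimeRebaseModeInRead": "CORRECTED"})
print(shlex.join(["spark-submit", *args]))
```

Rejecting unknown names and modes in the helper catches typos before anything is submitted.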

To discuss

- The command line option names
- The command line defaults
- The write configuration names

Metadata

Labels

- Conformance (Conformance Job affected)
- Standardization (Standardization Job affected)
- feature (New feature)
- priority: medium (Important but not urgent)
- run scripts (Helper run scripts are affected)
- under discussion (Requires consideration before a decision is made whether/how to implement)
