Replace UDFs/UDAs with Spark's Catalog #361

@okennedy

At present, user-defined functions (UDFs) and user-defined aggregates (UDAs) can be defined either in Mimir-land or in Spark-land. Moreover,

  1. Spark's UDA/UDF catalog implementation is virtually identical to Mimir's.
  2. There's a mountain of libraries that already support Spark.
  3. Function and aggregate management is a non-trivial 1k lines of code (or more).

I propose that we defer to Spark's catalog to cut out a ton of redundant code from Mimir. This would require the following changes:

  1. RAToSpark: Could now use the Spark catalog directly to instantiate functions (see the new MimirSQL for a few examples of how this might work).
  2. Typechecker: Would need to use Spark's catalog to check types. This could get a little awkward, since Spark's and Mimir's type systems differ. Would probably require RAToSQL to handle some translations.
  3. Eval / EvalInline: Would now talk to Spark for function execution.
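
To make the Typechecker item concrete, here is a minimal sketch of asking Spark for a function's return type instead of keeping a separate Mimir-side registry. It uses only Spark's public API (`spark.udf.register`, `spark.catalog.functionExists`, `functions.expr`). The helper `returnType`, the object name, and the example UDF `shout` are hypothetical and not part of Mimir. Translating the resulting Spark `DataType` back into a Mimir type is left out, since that mapping is exactly the awkward part noted above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr
import org.apache.spark.sql.types.{DataType, IntegerType, StringType}

object SparkCatalogSketch {
  // Hypothetical helper: resolve `name` against Spark's catalog and return
  // the type Spark's analyzer assigns to a call with the given argument types.
  // Only analysis happens here; no rows are evaluated.
  def returnType(spark: SparkSession, name: String, argTypes: Seq[DataType]): Option[DataType] = {
    if (!spark.catalog.functionExists(name)) {
      None
    } else {
      // Typed NULL placeholders stand in for the real argument expressions.
      val args = argTypes.map(t => s"CAST(NULL AS ${t.sql})").mkString(", ")
      Some(spark.range(0).select(expr(s"$name($args)")).schema.head.dataType)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("catalog-sketch")
      .getOrCreate()

    // Register a UDF once, in Spark only (hypothetical example function).
    spark.udf.register("shout", (s: String) => s.toUpperCase + "!")

    println(returnType(spark, "shout", Seq(StringType)))             // user-defined
    println(returnType(spark, "upper", Seq(StringType)))             // Spark built-in
    println(returnType(spark, "no_such_function", Seq(IntegerType))) // not in the catalog

    spark.stop()
  }
}
```

One consequence of this approach is that built-in and user-registered functions are resolved through the same path, so Mimir would not need to distinguish between them when type checking.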
