refactor: implement parse as a projection #19579

trevorwhitney · 2025-10-23T17:28:20Z

What this PR does / why we need it:
This PR refactors the parse operations (logfmt and json) so that they are implemented as an operation expression on an expand projection, similar to unwrap, rather than as a custom pipeline as before. To do so, it introduces a new operation type, FunctionOp, that accepts any number of Values/Expressions as arguments. All arguments are evaluated before being passed to the registered function, which is registered on op type alone (not on argument type, since the arguments vary).
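The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the engine's code: `FunctionOp`, `Expression`, `ColumnVector`, `registry`, and `evalFunction` are simplified stand-ins invented here, and the "parse" body is a toy that only counts `key=value` pairs.

```go
package main

import (
	"fmt"
	"strings"
)

// FunctionOp identifies a function by operation type only (hypothetical stand-in).
type FunctionOp int

const (
	OpParseLogfmt FunctionOp = iota
	OpParseJSON
)

// ColumnVector is a simplified stand-in for an evaluated column: one string per row.
type ColumnVector []string

// Expression is anything that can be evaluated against an input batch.
type Expression interface {
	Eval(input map[string]ColumnVector) (ColumnVector, error)
}

// ColumnExpr reads an existing column from the batch.
type ColumnExpr struct{ Name string }

func (c ColumnExpr) Eval(input map[string]ColumnVector) (ColumnVector, error) {
	vec, ok := input[c.Name]
	if !ok {
		return nil, fmt.Errorf("unknown column %q", c.Name)
	}
	return vec, nil
}

// Function receives already-evaluated arguments; the argument count is not fixed.
type Function func(args ...ColumnVector) (ColumnVector, error)

// registry is keyed by operation type alone, not by argument types.
var registry = map[FunctionOp]Function{
	OpParseLogfmt: func(args ...ColumnVector) (ColumnVector, error) {
		if len(args) == 0 {
			return nil, fmt.Errorf("parse needs a source column")
		}
		out := make(ColumnVector, len(args[0]))
		for i, line := range args[0] {
			// Toy "parse": count key=value pairs instead of building columns.
			out[i] = fmt.Sprint(strings.Count(line, "="))
		}
		return out, nil
	},
}

// evalFunction evaluates every argument first, then dispatches on the op.
func evalFunction(op FunctionOp, exprs []Expression, input map[string]ColumnVector) (ColumnVector, error) {
	fn, ok := registry[op]
	if !ok {
		return nil, fmt.Errorf("no function registered for op %d", op)
	}
	args := make([]ColumnVector, len(exprs))
	for i, e := range exprs {
		vec, err := e.Eval(input)
		if err != nil {
			return nil, err
		}
		args[i] = vec
	}
	return fn(args...)
}

func main() {
	input := map[string]ColumnVector{"message": {"a=1 b=2", "c=3"}}
	out, err := evalFunction(OpParseLogfmt, []Expression{ColumnExpr{Name: "message"}}, input)
	fmt.Println(out, err)
}
```

Keying the registry on the op alone keeps lookup trivial, at the cost of each function having to validate its own argument count and types.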

I also introduced a NamedLiteralExpr, which is just a literal with a name. I thought this made adding the requested-keys optimization a bit cleaner, but as it's just a literal under the hood, I'm happy to remove it if we think it's unnecessary.

Special notes for your reviewer:

Checklist

Reviewed the CONTRIBUTING.md guide (required)
Documentation added
Tests updated
Title matches the required conventional commits format, see here
- Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

trevorwhitney · 2025-10-23T17:31:04Z

pkg/engine/internal/executor/expressions.go

-
+	case *physical.NamedLiteralExpr:
+		return &Scalar{
+			value: expr.Literal,


the literal being used here in the case of parse is actually an array, which I'm aware is technically not a scalar. the value being passed here though is a pointer to that array, which one could argue is a scalar. that being said, let me know if you'd prefer another type. my thought was to avoid that so we don't need to type check the incoming literal.

trevorwhitney · 2025-10-23T17:31:57Z

pkg/engine/internal/executor/parse_test.go

these were moved to the executor tests above

trevorwhitney · 2025-10-23T17:33:49Z

pkg/engine/internal/planner/physical/expressions.go

+	return ExprTypeUnary
+}
+
+type NamedLiteralExpr struct {


I'm open to push back on just using a Literal instead, but I added this as I think it makes the optimize code cleaner, where we can look specifically for the requestedKeys literal when pushing down projections, rather than looking for any literal of the right type.


rfratto · 2025-10-23T22:30:23Z

pkg/engine/internal/planner/logical/builder_convert.go

-	case *UnaryOp:
-		return b.processUnaryOp(value)


Did you mean to remove UnaryOp here?

no, I did not, thank you

rfratto · 2025-10-23T22:32:53Z

pkg/engine/internal/executor/functions.go

+	if u.reg == nil {
+		u.reg = make(map[types.FunctionOp]Function)
+	}
+	// TODO(twhitney): Should the function panic when duplicate keys are registered?


Yeah probably, plus since it'd panic in the init we'd catch it immediately in unit tests rather than being confused about why we're not using the implementation of a function we expected.
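A minimal sketch of what panicking on duplicate registration could look like. `functionRegistry`, `register`, and `tryRegister` are hypothetical names used only for illustration, and `Function` is simplified to strings.

```go
package main

import "fmt"

// FunctionOp and Function are simplified stand-ins for the engine types.
type FunctionOp int

const (
	OpParseLogfmt FunctionOp = iota
	OpParseJSON
)

type Function func(args ...string) (string, error)

type functionRegistry struct {
	reg map[FunctionOp]Function
}

// register panics on a duplicate key, so a conflicting registration made from
// an init function fails every test binary immediately.
func (r *functionRegistry) register(op FunctionOp, fn Function) {
	if r.reg == nil {
		r.reg = make(map[FunctionOp]Function)
	}
	if _, exists := r.reg[op]; exists {
		panic(fmt.Sprintf("duplicate function registration for op %d", op))
	}
	r.reg[op] = fn
}

// tryRegister reports whether register panicked; it exists only for the demo.
func tryRegister(r *functionRegistry, op FunctionOp, fn Function) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	r.register(op, fn)
	return false
}

func main() {
	noop := func(args ...string) (string, error) { return "", nil }
	var r functionRegistry
	fmt.Println("first registration panicked:", tryRegister(&r, OpParseLogfmt, noop))
	fmt.Println("second registration panicked:", tryRegister(&r, OpParseLogfmt, noop))
}
```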

rfratto · 2025-10-23T22:34:10Z

pkg/engine/internal/executor/parse.go

+	}
+
+	if sourceColVec == nil {
+		return nil, nil, fmt.Errorf("parse function arguments did no include a source ColumnVector to parse")


Suggested change

-	return nil, nil, fmt.Errorf("parse function arguments did no include a source ColumnVector to parse")
+	return nil, nil, fmt.Errorf("parse function arguments did not include a source ColumnVector to parse")

rfratto · 2025-10-23T22:37:19Z

pkg/engine/internal/executor/parse.go

-	}, input)
+	var requestedKeys []string
+	if requestedKeysColVec != nil {
+		reqKeysValue := requestedKeysColVec.Value(0)


Can we assert that the requestedKeysColVec must be a scalar? Otherwise I think the behaviour will be a little confusing if you happen to pass in an actual vector but only the first row gets used.

with #19549 we no longer have scalars, so I might need to check it's a StringListLiteral instead? I'll rebase on @chaudum changes and investigate.
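One way to make that assertion explicit is to reject anything that is not a constant string list, instead of reading row 0. The sketch below assumes invented types (`ColumnVector`, `constantVector`, `sliceVector`) and does not reflect how the engine represents literals after #19549.

```go
package main

import (
	"errors"
	"fmt"
)

// ColumnVector is a hypothetical, simplified column interface.
type ColumnVector interface {
	Len() int
	Value(i int) any
}

// constantVector repeats one literal value for every row.
type constantVector struct {
	value any
	rows  int
}

func (c constantVector) Len() int      { return c.rows }
func (c constantVector) Value(int) any { return c.value }

// sliceVector holds a distinct value per row.
type sliceVector []any

func (s sliceVector) Len() int        { return len(s) }
func (s sliceVector) Value(i int) any { return s[i] }

var errNotConstant = errors.New("requested keys must be a constant string list, not a per-row column")

// requestedKeysFrom rejects anything that is not a constant []string rather
// than silently using only the first row of a real vector.
func requestedKeysFrom(vec ColumnVector) ([]string, error) {
	if vec == nil {
		return nil, nil // no requested keys: parse everything
	}
	c, ok := vec.(constantVector)
	if !ok {
		return nil, errNotConstant
	}
	keys, ok := c.value.([]string)
	if !ok {
		return nil, fmt.Errorf("requested keys have type %T, want []string", c.value)
	}
	return keys, nil
}

func main() {
	keys, err := requestedKeysFrom(constantVector{value: []string{"bar", "request_duration"}, rows: 3})
	fmt.Println(keys, err)

	_, err = requestedKeysFrom(sliceVector{"bar", "baz"})
	fmt.Println(err)
}
```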

rfratto · 2025-10-23T22:38:44Z

pkg/engine/internal/planner/physical/expressions.go

+
+// Clone returns a copy of the [FunctionExpr].
+func (e *FunctionExpr) Clone() Expression {
+	params := make([]Expression, len(e.Expressions))


You can use cloneExpressions(e.Expressions) here
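For illustration, Clone built on such a helper might look like the sketch below. The real `cloneExpressions` in the package may have a different signature; the version here, along with the simplified `Expression`, `ColumnExpr`, and `FunctionExpr` types, is an assumption.

```go
package main

import "fmt"

// Expression is a simplified stand-in for the physical planner's interface.
type Expression interface {
	Clone() Expression
	String() string
}

type ColumnExpr struct{ Name string }

func (c *ColumnExpr) Clone() Expression { return &ColumnExpr{Name: c.Name} }
func (c *ColumnExpr) String() string    { return c.Name }

// cloneExpressions deep-copies a slice of expressions (assumed shape).
func cloneExpressions(exprs []Expression) []Expression {
	out := make([]Expression, len(exprs))
	for i, e := range exprs {
		out[i] = e.Clone()
	}
	return out
}

type FunctionExpr struct {
	Op          string
	Expressions []Expression
}

// Clone returns a copy of the FunctionExpr that shares no argument state.
func (e *FunctionExpr) Clone() Expression {
	return &FunctionExpr{Op: e.Op, Expressions: cloneExpressions(e.Expressions)}
}

func (e *FunctionExpr) String() string { return fmt.Sprintf("%s%v", e.Op, e.Expressions) }

func main() {
	orig := &FunctionExpr{Op: "PARSE_JSON", Expressions: []Expression{&ColumnExpr{Name: "builtin.message"}}}
	cp := orig.Clone()

	// Mutating the original must not affect the clone.
	orig.Expressions[0].(*ColumnExpr).Name = "changed"
	fmt.Println(orig, cp)
}
```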

rfratto · 2025-10-23T22:48:07Z

pkg/engine/internal/planner/physical/expressions.go

+	return ExprTypeUnary
+}
+
+type NamedLiteralExpr struct {


Named literals do feel a little weird to me, especially just for an optimize pass, and when the position of the arguments does matter (the second argument must be the requested keys, not any argument that's a NamedLiteral of requestedKeys).

I think if we want to make the optimize pass cleaner, we could unpack argument slices into a struct instead:

	type parseArguments struct {
		columnToParse Expression
		requestedKeys Expression
	}

	// Unpack unpacks the expression from src into args. Unpack returns
	// an error if there are not exactly 1 or 2 arguments:
	//
	//   - parse(columnToParse)
	//   - parse(columnToParse, requestedKeys)
	func (args *parseArguments) Unpack(src []Expression) error { ... }

	// Pack packs args into a dst slice. Returns a new slice if dst isn't
	// large enough.
	func (args *parseArguments) Pack(dst []Expression) []Expression { ... }

Then your optimization pass could use this:

	func (r *projectionPushdown) handleParse(expr *FunctionExpr, ...) ([]ColumnExpression, bool) {
		var args parseArguments
		if err := args.Unpack(expr.Expressions); err != nil {
			// Panic, I guess?
		}

		if args.requestedKeys == nil {
			// Initialize args.requestedKeys
		}

		existingKeys, ok := args.requestedKeys.(types.StringListLiteral)
		...

		// Copy back over into the FunctionExpr.
		expr.Expressions = args.Pack(expr.Expressions)
	}
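The elided Unpack and Pack bodies could be filled in along these lines. This is one possible reading of the suggestion, not code from the PR: `Expression`, `ColumnExpr`, and `StringListLiteral` are simplified stand-ins, and treating a nil `requestedKeys` as "not supplied" is an assumption.

```go
package main

import "fmt"

// Expression is a stand-in for the planner's expression interface.
type Expression interface{ String() string }

type ColumnExpr string

func (c ColumnExpr) String() string { return string(c) }

type StringListLiteral []string

func (l StringListLiteral) String() string { return fmt.Sprint([]string(l)) }

type parseArguments struct {
	columnToParse Expression
	requestedKeys Expression // nil when the caller supplied only one argument
}

// Unpack copies src into args and fails unless there are exactly 1 or 2 arguments.
func (args *parseArguments) Unpack(src []Expression) error {
	switch len(src) {
	case 1:
		args.columnToParse, args.requestedKeys = src[0], nil
	case 2:
		args.columnToParse, args.requestedKeys = src[0], src[1]
	default:
		return fmt.Errorf("parse expects 1 or 2 arguments, got %d", len(src))
	}
	return nil
}

// Pack writes args into dst, reusing its backing array when capacity allows;
// append allocates a new slice otherwise.
func (args *parseArguments) Pack(dst []Expression) []Expression {
	dst = append(dst[:0], args.columnToParse)
	if args.requestedKeys != nil {
		dst = append(dst, args.requestedKeys)
	}
	return dst
}

func main() {
	exprs := []Expression{ColumnExpr("builtin.message")}

	var args parseArguments
	if err := args.Unpack(exprs); err != nil {
		panic(err)
	}
	if args.requestedKeys == nil {
		args.requestedKeys = StringListLiteral{"bar", "request_duration"}
	}
	exprs = args.Pack(exprs)
	fmt.Println(exprs)
}
```

Because position is fixed by the struct fields, the optimizer no longer needs to search the argument list for a specially named literal.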

chaudum · 2025-10-24T06:23:42Z

pkg/engine/internal/executor/expressions.go

 		}, nil
-
+	case *physical.NamedLiteralExpr:
+		return &Scalar{


I merged #19549 earlier today, so Scalar won't be available any more. Use NewScalar(expr.Literal, input.NumRows()) instead

chaudum · 2025-10-24T06:58:15Z

pkg/engine/internal/executor/functions.go

+	GetForSignature(types.FunctionOp) (Function, error)
+}
+
+type Function interface {


nit: Should we call it VariadicFunction?

haha, I had that in one iteration, naming was hard, I went through a few options, but I'm happy to use Variadic.

chaudum · 2025-10-24T07:02:32Z

pkg/engine/internal/executor/expressions.go

+		args := make([]ColumnVector, len(expr.Expressions))
+		for i, arg := range expr.Expressions {
+			p, err := e.eval(arg, input)
+			if err != nil {
+				return nil, err
+			}
+			args[i] = p
+		}


I think we need to find a way to optimize this at some point.
Evaluating the argument expressions again for each batch is a lot of overhead, especially because these are always string literals (aren't they?) and therefore have a single value across all rows.

Ah wait, the function argument is the message column.

At least for now. Later, when we support | logfmt foo,bar this may become a problem.
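If literal arguments do show up later, one possible mitigation is to evaluate constant arguments once and reuse them across batches. The sketch below is speculative and not something this PR does; `argCache`, `IsConstant`, and the other names are hypothetical.

```go
package main

import "fmt"

// Batch is a toy input batch with a single message column.
type Batch struct{ Message []string }

// Expression is a hypothetical stand-in; IsConstant marks literals whose
// value does not depend on the batch.
type Expression interface {
	Eval(b Batch) any
	IsConstant() bool
}

type MessageColumn struct{}

func (MessageColumn) Eval(b Batch) any { return b.Message }
func (MessageColumn) IsConstant() bool { return false }

type StringListLiteral struct {
	Keys  []string
	evals *int // counts evaluations, for the demo only
}

func (l StringListLiteral) Eval(Batch) any {
	*l.evals++
	return l.Keys
}
func (l StringListLiteral) IsConstant() bool { return true }

// argCache evaluates constant arguments once and reuses them for later batches.
type argCache struct {
	exprs  []Expression
	cached []any
	done   []bool
}

func newArgCache(exprs []Expression) *argCache {
	return &argCache{
		exprs:  exprs,
		cached: make([]any, len(exprs)),
		done:   make([]bool, len(exprs)),
	}
}

func (c *argCache) eval(b Batch) []any {
	args := make([]any, len(c.exprs))
	for i, e := range c.exprs {
		if !e.IsConstant() {
			args[i] = e.Eval(b) // per-batch argument, e.g. the message column
			continue
		}
		if !c.done[i] {
			c.cached[i], c.done[i] = e.Eval(b), true
		}
		args[i] = c.cached[i]
	}
	return args
}

func main() {
	evals := 0
	cache := newArgCache([]Expression{
		MessageColumn{},
		StringListLiteral{Keys: []string{"foo", "bar"}, evals: &evals},
	})
	cache.eval(Batch{Message: []string{"foo=1"}})
	cache.eval(Batch{Message: []string{"bar=2"}})
	fmt.Println("literal evaluations:", evals)
}
```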

chaudum · 2025-10-24T07:10:05Z

pkg/engine/internal/types/types.go

 	FLOAT64   = Type(arrow.FLOAT64)
 	TIMESTAMP = Type(arrow.TIMESTAMP)
 	STRUCT    = Type(arrow.STRUCT)
+	LIST      = Type(arrow.LIST)


Please also add to Type.String() function

chaudum · 2025-10-24T07:11:32Z

pkg/engine/internal/types/types.go

 	return tStruct{arrowType: arrowType}
 }

+type tList struct {


Please also add to the Loki->Arrow type mapping below

chaudum · 2025-10-24T07:14:17Z

pkg/engine/internal/types/operators.go

+	case FunctionOpParseJSON:
+		return "PARSE_JSON"
+	default:
+		panic(fmt.Sprintf("unknown unary operator %d", t))


Suggested change

-	panic(fmt.Sprintf("unknown unary operator %d", t))
+	panic(fmt.Sprintf("unknown variadic function operator %d", t))

commit 4e5f95f
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date: Thu Oct 23 13:47:59 2025 -0600

    test: fix planner tests

commit dfbdcb7
Merge: c997112 68df3ef
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date: Thu Oct 23 13:26:39 2025 -0600

    Merge branch 'main' into twhitney/refactor-parse

commit c997112
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date: Thu Oct 23 13:24:03 2025 -0600

    chore: fix linting errors

commit 037e337
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date: Thu Oct 23 12:54:27 2025 -0600

    test: fix field names in expression test

commit ad6b101
Merge: 79f2cea d4c53e9
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date: Thu Oct 23 12:46:34 2025 -0600

    Merge branch 'main' into twhitney/refactor-parse

commit 79f2cea
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date: Thu Oct 23 12:44:12 2025 -0600

    test: fix workflow planner test

commit 40db5ef
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date: Thu Oct 23 11:38:25 2025 -0600

    chore: clena up a few comments

commit ad91fda
Author: Trevor Whitney <trevorjwhitney@gmail.com>
Date: Thu Oct 23 11:23:19 2025 -0600

    refactor: implment parse as a projection

Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>

trevorwhitney · 2025-10-27T16:11:16Z

pkg/engine/internal/planner/planner_test.go

+            └── Projection all=true expand=(PARSE_JSON(builtin.message, []))
+                └── Projection all=true expand=(PARSE_LOGFMT(builtin.message, []))


do we want to merge these projections? maybe in a later PR?

trevorwhitney · 2025-10-27T16:11:45Z

pkg/engine/internal/planner/planner_test.go

+                └── Projection all=true expand=(PARSE_LOGFMT(builtin.message, [bar, request_duration]))
+                    └── Compat src=metadata dst=metadata collision=label


@chaudum does the Compat layer need to come before the Projection?

trevorwhitney requested a review from a team as a code owner October 23, 2025 17:28

pull-request-size bot added the size/XXL label Oct 23, 2025


trevorwhitney force-pushed the twhitney/refactor-parse branch from f024161 to 40db5ef Compare October 23, 2025 18:01

rfratto reviewed Oct 23, 2025

chaudum reviewed Oct 24, 2025

trevorwhitney added 2 commits October 24, 2025 13:47

chore: merge cleanup

e7ead00

trevorwhitney force-pushed the twhitney/refactor-parse branch from 4e5f95f to e7ead00 Compare October 24, 2025 20:04

trevorwhitney added 4 commits October 27, 2025 09:52

chore: PR comments and rebase/merge fixes

c62fc03

Merge branch 'main' into twhitney/refactor-parse

8b7b2eb

Signed-off-by: Trevor Whitney <trevorjwhitney@gmail.com>

chore: fix more merging mess

5b98db0

test: fix planner_test.go

645a462

trevorwhitney commented Oct 27, 2025

trevorwhitney added 5 commits October 27, 2025 11:35

chore: format

083850c

Merge branch 'main' into twhitney/refactor-parse

b4e7c96

Merge branch 'main' into twhitney/refactor-parse

7811d09

Merge branch 'main' into twhitney/refactor-parse

b169758

chore: get off my lawn revive

580567c

trevorwhitney requested review from chaudum and rfratto October 28, 2025 15:59

