Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
b4cfa0e
reaggregate support for groupBy.
cpwright Dec 10, 2025
453aab1
copilot review
cpwright Dec 10, 2025
4959126
Update engine/table/src/main/java/io/deephaven/engine/table/impl/by/G…
cpwright Dec 10, 2025
fae01cb
Update engine/table/src/main/java/io/deephaven/engine/table/impl/sour…
cpwright Dec 10, 2025
2ca4ae4
formula stuff
cpwright Dec 11, 2025
52c0fb8
reagg formula examples
cpwright Dec 12, 2025
946ca37
handle things without the group operator present.
cpwright Dec 29, 2025
b1a3f4b
Reuse group by for normal aggs.
cpwright Dec 29, 2025
eb10ff5
spotless
cpwright Dec 29, 2025
9b9e4ad
test fix
cpwright Dec 29, 2025
c71c46c
more todos for reuse.
cpwright Dec 30, 2025
da011ea
Reaggregate groupby reuse.
cpwright Dec 31, 2025
801caa4
some coverage of operator reuse.
cpwright Dec 31, 2025
bbe4042
Update engine/table/src/main/java/io/deephaven/engine/table/impl/by/G…
cpwright Dec 31, 2025
41739fd
Update engine/table/src/main/java/io/deephaven/engine/table/impl/by/G…
cpwright Dec 31, 2025
be21002
demonstrate broken modifications in unit test.
cpwright Dec 31, 2025
307a803
fill in tests.
cpwright Dec 31, 2025
c42360e
Modification fix.
cpwright Jan 12, 2026
f7979b6
Merge remote-tracking branch 'upstream/main' into nightly/cpw/rollupg…
cpwright Jan 12, 2026
4b2c73c
copilot speling concerns.
cpwright Jan 12, 2026
5d8e6ef
initial rollup aggformula docs.
cpwright Jan 12, 2026
169c300
Capped sum doc, fix tests now that we hide the exposed group column.
cpwright Jan 13, 2026
f0d41b3
copilot nits
cpwright Jan 13, 2026
9240d29
another test, more rename tomfoolery.
cpwright Jan 13, 2026
b855a73
operator should always be updated for a formula, produce mangled resu…
cpwright Jan 14, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 74 additions & 0 deletions docs/groovy/reference/table-operations/create/rollup.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,8 @@ The following aggregations are supported:
- [`AggCountDistinct`](../group-and-aggregate/AggCountDistinct.md)
- [`AggCountWhere`](../group-and-aggregate/AggCountWhere.md)
- [`AggFirst`](../group-and-aggregate/AggFirst.md)
- [`AggFormula`](../group-and-aggregate/AggFormula.md)
- [`AggGroup`](../group-and-aggregate/AggGroup.md)
- [`AggLast`](../group-and-aggregate/AggLast.md)
- [`AggMax`](../group-and-aggregate/AggMax.md)
- [`AggMin`](../group-and-aggregate/AggMin.md)
Expand Down Expand Up @@ -124,6 +126,78 @@ result = source.rollup(aggList, false, "N", "M")

![The above `result` rollup table](../../../assets/how-to/rollup-table-realtime.gif)

## Formula Aggregations in Rollups

When a rollups includes a formula aggregation, care should be taken with the function being applied. On each tick, the formula is evaluated for each changed row in the output table. Because the aggregated rows include many source rows, the input vectors to a formula aggregation can be very large (at the root level, they are the entire source table). If the formula is not efficient with large input vectors, the performance of the rollup can be poor.

By default, the formula aggregation operates on a group of all the values as they appeared in the source table. In this example, the `Value` column contains the same vector that is used as input to the formula:

```groovy
source = newTable(
stringCol("Key", "Alpha", "Alpha", "Alpha", "Bravo", "Bravo", "Charlie", "Charlie"),
intCol("Value", 10, 10, 10, 20, 20, 30, 30))
simpleSum = source.rollup(List.of(AggGroup("Value"), AggFormula("Sum = sum(Value)")), "Key")
```

To calculate the sum for the root row, every row in the source table is read. The Deephaven engine provides no mechanism to provide detailed update information for a vector. Thus, every time the table ticks, the formula is completely re-evaluated.

### Formula Reaggregation

Formula reaggregation can be used to limit the size of input vectors while evaluating changes to a rollup. Each level of the rollup must have the same constituent types and names, which can make formulating your query more complicated.

```groovy
source = newTable(
stringCol("Key", "Alpha", "Alpha", "Alpha", "Bravo", "Bravo", "Charlie", "Charlie"),
intCol("Value", 10, 10, 10, 20, 20, 30, 30))
reaggregatedSum = source.updateView("Sum=(long)Value").rollup(List.of(AggFormula("Sum = sum(Sum)").asReaggregating()), "Key")
```

The results of `reaggregatedSum` are identical to `simpleSum`; but they are evaluated differently. In simpleSum, the source table is read twice: once to calculate the lowest-level sums, and a second time to calculate the top-level sum. In the reaggregated example, the source table is read once to calculate the lowest-level sums; and then the intermediate sums for each `Key` are read to calculate the top-level sum. If a row was added to `source` with the key `Delta`; then `simpleSum` would read that row, calculate a new sum for `Delta` and the top-level would read all eight rows of the table. The `reaggregatedSum` would similarly calculate the new sum for `Delta`, but the top-level would only read the intermediate sums for `Alpha`, `Bravo`, `Charlie`, and `Delta` instead of all eight source rows. As the number of states and the size of the input tables increase, the performance impact of evaluating a formula over all rows the table increases.

In the previous example, the `Sum` column evaluated the [`sum(IntVector)`](https://docs.deephaven.io/core/javadoc/io/deephaven/function/Numeric.html#sum(io.deephaven.vector.IntVector)) function at every level of the rollup and produced a `long`. If the original table with an `int` column was used, then the lowest-level rollup would provide an `IntVector` as input to the `sum` and the next-level would provide `LongVector`. Similarly, the source table had a column named `Value`; whereas the aggregation produces a result named `Sum`. To address both these issues, before passing `source` to rollup, we called `updateView` to cast the `Value` column to `long` as `Sum`. If we ran the same example without the cast:

```groovy syntax
source = newTable(
stringCol("Key", "Alpha", "Alpha", "Alpha", "Bravo", "Bravo", "Charlie", "Charlie"),
intCol("Value", 10, 10, 10, 20, 20, 30, 30))
reaggregatedSum = source.rollup(List.of(AggFormula("Value = sum(Value)").asReaggregating()), "Key")
```

We instead get an Exception message indicating that the formula cannot be applied properly:

```text
java.lang.ClassCastException: class io.deephaven.engine.table.vectors.LongVectorColumnWrapper cannot be cast to class io.deephaven.vector.IntVector (io.deephaven.engine.table.vectors.LongVectorColumnWrapper and io.deephaven.vector.IntVector are in unnamed module of loader 'app')
```

### Formula Depth

Formula aggregations may include the constant `__FORMULA_DEPTH__` column, which is the depth of the formula aggregation in the rollup tree. The root node of the rollup has a depth of 0, the next level is 1, and so on. This can be used to implement distinct aggregations at each level of the rollup. For example:

```groovy
source = newTable(
stringCol("Key", "Alpha", "Alpha", "Alpha", "Bravo", "Bravo", "Charlie", "Charlie"),
intCol("Value", 10, 10, 10, 20, 20, 30, 30))
firstThenSum = source.rollup(List.of(AggFormula("Value = __FORMULA_DEPTH__ == 0 ? sum(Value) : first(Value)")), "Key")
```

In this case, for each value of `Key`, the aggregation returns the first value. For the root level, the aggregation returns the sum of all values. When combined with a reaggregating formula, even more interesting semantics are possible. For example, rather than summing all of the values; we can sum the values from the prior level:

```groovy
source = newTable(
stringCol("Key", "Alpha", "Alpha", "Alpha", "Bravo", "Bravo", "Charlie", "Charlie"),
intCol("Value", 10, 10, 10, 20, 20, 30, 30))
firstThenSum = source.updateView("Value=(long)Value").rollup(List.of(AggFormula("Value = __FORMULA_DEPTH__ == 0 ? sum(Value) : first(Value)").asReaggregating()), "Key")
```

Another simple example of reaggration is a capped sum. In this example, the sums below the root level are capped at 40:

```groovy
source = newTable(
stringCol("Key", "Alpha", "Alpha", "Alpha", "Bravo", "Bravo", "Charlie", "Charlie"),
intCol("Value", 10, 20, 15, 20, 15, 25, 35))
cappedSum = source.updateView("Value=(long)Value").rollup(List.of(AggFormula("Value = __FORMULA_DEPTH__ == 0 ? sum(Value) : min(sum(Value), 40)").asReaggregating()), "Key")
```

## Related documentation

- [`AggAvg`](../group-and-aggregate/AggAvg.md)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ title: AggFormula
## Syntax

```
AggFormula(formula)
AggFormula(formula, paramToken, columnNames...)
```

Expand All @@ -21,19 +22,21 @@ The user-defined formula to apply to each group. This formula can contain:
- Mathematical operations such as `*`, `+`, `/`, etc.
- [User-defined closures](../../../how-to-guides/groovy-closures.md)

If `paramToken` is not `null`, the formula can only be applied to one column at a time, and it is applied to the specified `paramToken`. If `paramToken` is `null`, the formula is applied to any column or literal value present in the formula. The use of `paramToken` is deprecated.
The formula is typically specified as `OutputColumn = Expression` (when `paramToken` is `null`). The formula is applied to any column or literal value present in the formula. For example, `Out = KeyColumn * max(ValueColumn)`, produces an output column with the name `Out` and uses `KeyColumn` and `ValueColumn` as inputs.

If `paramToken` is not `null`, the formula can only be applied to one column at a time, and it is applied to the specified `paramToken`. In this case, the formula does not supply an output column, but rather it is derived from the `columnNames` parameter. The use of `paramToken` is deprecated.

Key column(s) can be used as input to the formula. When this happens, key values are treated as scalars.

</Param>
<Param name="paramToken" type="String">

The parameter name within the formula. If `paramToken` is `each`, then `formula` must contain `each`. For example, `max(each)`, `min(each)`, etc. Use of this parameter is deprecated.
The parameter name within the formula. If `paramToken` is `each`, then `formula` must contain `each`. For example, `max(each)`, `min(each)`, etc. Use of this parameter is deprecated. A non-null value is not permitted in [rollups](../create/rollup.md).

</Param>
<Param name="columnNames" type="String...">

The source column(s) for the calculations.
The source column(s) for the calculations. The source column names are only used when `paramToken` is not `null`, and are thus similarly deprecated.

- `"X"` applies the formula to each value in the `X` column for each group.
- `"Y = X"` applies the formula to each value in the `X` column for each group and renames it to `Y`.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"file":"reference/table-operations/create/rollup.md","objects":{"source":{"type":"Table","data":{"columns":[{"name":"Key","type":"java.lang.String"},{"name":"Value","type":"int"}],"rows":[[{"value":"Alpha"},{"value":"10"}],[{"value":"Alpha"},{"value":"10"}],[{"value":"Alpha"},{"value":"10"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Charlie"},{"value":"30"}],[{"value":"Charlie"},{"value":"30"}]]}},"reaggregatedSum":{"type":"HierarchicalTable","data":{"columns":[{"name":"Key","type":"java.lang.String"},{"name":"Sum","type":"long"}],"rows":[[{"value":""},{"value":"130"}],[{"value":"Alpha"},{"value":"30"}],[{"value":"Bravo"},{"value":"40"}],[{"value":"Charlie"},{"value":"60"}]],"rowDepths":[1,2,2,2]}}}}

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"file":"reference/table-operations/create/rollup.md","objects":{"source":{"type":"Table","data":{"columns":[{"name":"Key","type":"java.lang.String"},{"name":"Value","type":"int"}],"rows":[[{"value":"Alpha"},{"value":"10"}],[{"value":"Alpha"},{"value":"10"}],[{"value":"Alpha"},{"value":"10"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Charlie"},{"value":"30"}],[{"value":"Charlie"},{"value":"30"}]]}},"simpleSum":{"type":"HierarchicalTable","data":{"columns":[{"name":"Key","type":"java.lang.String"},{"name":"Value","type":"io.deephaven.vector.IntVector"},{"name":"Sum","type":"long"}],"rows":[[{"value":""},{"value":"10,10,10,20,20,30,30"},{"value":"130"}],[{"value":"Alpha"},{"value":"10,10,10"},{"value":"30"}],[{"value":"Bravo"},{"value":"20,20"},{"value":"40"}],[{"value":"Charlie"},{"value":"30,30"},{"value":"60"}]],"rowDepths":[1,2,2,2]}}}}

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"file":"reference/table-operations/create/rollup.md","objects":{"source":{"type":"Table","data":{"columns":[{"name":"Key","type":"java.lang.String"},{"name":"Value","type":"int"}],"rows":[[{"value":"Alpha"},{"value":"10"}],[{"value":"Alpha"},{"value":"20"}],[{"value":"Alpha"},{"value":"15"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Bravo"},{"value":"15"}],[{"value":"Charlie"},{"value":"25"}],[{"value":"Charlie"},{"value":"35"}]]}},"cappedSum":{"type":"HierarchicalTable","data":{"columns":[{"name":"Key","type":"java.lang.String"},{"name":"Value","type":"long"}],"rows":[[{"value":""},{"value":"115"}],[{"value":"Alpha"},{"value":"40"}],[{"value":"Bravo"},{"value":"35"}],[{"value":"Charlie"},{"value":"40"}]],"rowDepths":[1,2,2,2]}}}}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"file":"reference/table-operations/create/rollup.md","objects":{"source":{"type":"Table","data":{"columns":[{"name":"Key","type":"java.lang.String"},{"name":"Value","type":"int"}],"rows":[[{"value":"Alpha"},{"value":"10"}],[{"value":"Alpha"},{"value":"10"}],[{"value":"Alpha"},{"value":"10"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Charlie"},{"value":"30"}],[{"value":"Charlie"},{"value":"30"}]]}},"firstThenSum":{"type":"HierarchicalTable","data":{"columns":[{"name":"Key","type":"java.lang.String"},{"name":"Value","type":"long"}],"rows":[[{"value":""},{"value":"130"}],[{"value":"Alpha"},{"value":"10"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Charlie"},{"value":"30"}]],"rowDepths":[1,2,2,2]}}}}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"file":"reference/table-operations/create/rollup.md","objects":{"source":{"type":"Table","data":{"columns":[{"name":"Key","type":"java.lang.String"},{"name":"Value","type":"int"}],"rows":[[{"value":"Alpha"},{"value":"10"}],[{"value":"Alpha"},{"value":"10"}],[{"value":"Alpha"},{"value":"10"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Charlie"},{"value":"30"}],[{"value":"Charlie"},{"value":"30"}]]}},"firstThenSum":{"type":"HierarchicalTable","data":{"columns":[{"name":"Key","type":"java.lang.String"},{"name":"Value","type":"long"}],"rows":[[{"value":""},{"value":"60"}],[{"value":"Alpha"},{"value":"10"}],[{"value":"Bravo"},{"value":"20"}],[{"value":"Charlie"},{"value":"30"}]],"rowDepths":[1,2,2,2]}}}}

This file was deleted.

This file was deleted.

5 changes: 5 additions & 0 deletions engine/api/src/main/java/io/deephaven/engine/table/Table.java
Original file line number Diff line number Diff line change
Expand Up @@ -567,6 +567,11 @@ <DATA_TYPE> CloseableIterator<DATA_TYPE> objectColumnIterator(@NotNull String co
* </p>
*
* <p>
* A blink table can be converted to an append-only table with
* {@link io.deephaven.engine.table.impl.BlinkTableTools#blinkToAppendOnly(io.deephaven.engine.table.Table)}.
* </p>
*
* <p>
* Some aggregations (in particular {@link #groupBy} and {@link #partitionBy} cannot provide the desired blink table
* aggregation semantics because doing so would require storing the entire stream of blink updates in memory. If
* that behavior is desired, use {@code blinkToAppendOnly}. If on the other hand, you would like to group or
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -270,10 +270,12 @@ UnaryOperator<ModifiedColumnSet>[] initializeRefreshing(@NotNull final QueryTabl
*
* @param upstream The upstream {@link TableUpdateImpl}
* @param startingDestinationsCount The number of used destinations at the beginning of this step
* @param modifiedOperators an array of booleans, parallel to operators, indicating which operators were modified
*/
void resetOperatorsForStep(@NotNull final TableUpdate upstream, final int startingDestinationsCount) {
for (final IterativeChunkedAggregationOperator operator : operators) {
operator.resetForStep(upstream, startingDestinationsCount);
void resetOperatorsForStep(@NotNull final TableUpdate upstream, final int startingDestinationsCount,
final boolean[] modifiedOperators) {
for (int ii = 0; ii < operators.length; ii++) {
modifiedOperators[ii] = operators[ii].resetForStep(upstream, startingDestinationsCount);
}
}

Expand Down
Loading
Loading