Skip to content

Conversation

@sim1984
Copy link
Contributor

@sim1984 sim1984 commented Oct 30, 2025

GENERATE_SERIES function

The GENERATE_SERIES function creates a series of numbers within a specified interval.
The interval and the step between series values ​​are defined by the user.

Syntax

<generate_series_function> ::=
    GENERATE_SERIES(<start>, <finish> [, <step>]) [AS] <correlation name> [ ( <derived column name> ) ]

Arguments

  • start - The first value in the interval. start is specified as a variable, a literal, or a scalar expression of type
    smallint, integer, bigint, int128 or numeric/decimal.

  • finish - The last value in the interval. finish is specified as a variable, a literal, or a scalar expression of
    type smallint, integer, bigint, int128 or numeric/decimal. The series stops once the last generated step value exceeds
    the finish value.

  • step - Indicates the number of values to increment or decrement between steps in the series. step is an expression
    of type smallint, integer, bigint, int128 or numeric/decimal. step can be either negative or positive, but can't be zero (0). This argument is optional. The default value for step is 1.

Returning type

The function GENERATE_SERIES returns a set with BIGINT, INT128 or NUMERIC/DECIMAL column, where the scale is
determined by the maximum of the scales of the function arguments.

Rules

  • If start > finish and a negative step value is specified, an empty set is returned.

  • If start < finish and a positive step value is specified, an empty set is returned.

  • If the step argument is zero, an error is thrown.

Examples

SELECT n
FROM GENERATE_SERIES(1, 3) AS S(n);

SELECT n
FROM GENERATE_SERIES(3, 1, -1) AS S(n);

SELECT n
FROM GENERATE_SERIES(0, 9.9, 0.1) AS S(n);

SELECT 
  DATEADD(n MINUTE TO timestamp '2025-01-01 12:00') AS START_TIME,
  DATEADD(n MINUTE TO timestamp '2025-01-01 12:00:59.9999') AS FINISH_TIME
FROM GENERATE_SERIES(0, 59) AS S(n);

A similar function exists in PostgreSQL: https://www.postgresql.org/docs/current/functions-srf.html and in MS SQL 2022: https://learn.microsoft.com/en-us/sql/t-sql/functions/generate-series-transact-sql?view=sql-server-ver17

PostgreSQL has the ability to generate date and time sequences, but we don't have interval types, so this isn't implemented. However, by generating a sequence of numbers, we can generate any sequence.

@sim1984
Copy link
Contributor Author

sim1984 commented Oct 30, 2025

PSQL procedure vs built-in function performance test

CREATE EXCEPTION E_INVALID_STEP 'Procedure SP_GENERATE_SERIES required values of parameter STEP_VALUE non equal zero';

SET TERM ^;

CREATE OR ALTER PROCEDURE SP_GENERATE_SERIES (
  START_VALUE  BIGINT,
  FINISH_VALUE BIGINT,
  STEP_VALUE   BIGINT DEFAULT 1)
RETURNS (
  CURRENT_VALUE BIGINT)
AS
BEGIN
  IF (STEP_VALUE = 0) THEN EXCEPTION E_INVALID_STEP;
  IF (STEP_VALUE > 0) THEN
    FOR CURRENT_VALUE = START_VALUE TO FINISH_VALUE BY STEP_VALUE DO
      SUSPEND;
  IF (STEP_VALUE < 0) THEN
    FOR CURRENT_VALUE = START_VALUE DOWNTO FINISH_VALUE BY ABS(STEP_VALUE) DO
      SUSPEND;
END^

SET TERM ;^

1000000 records

SELECT 
  COUNT(*), MIN(CURRENT_VALUE), MAX(CURRENT_VALUE)
FROM SP_GENERATE_SERIES(1, 1000000);

                COUNT                   MIN                   MAX
===================== ===================== =====================
              1000000                     1               1000000

Current memory = 19793200
Delta memory = 240
Max memory = 19869584
Elapsed time = 0.294 sec
Buffers = 2048
Reads = 0
Writes = 0
Fetches = 0
SELECT 
  COUNT(*), MIN(N), MAX(N)
FROM GENERATE_SERIES(1, 1000000) S(N);

                COUNT                   MIN                   MAX
===================== ===================== =====================
              1000000                     1               1000000

Current memory = 19809936
Delta memory = 208
Max memory = 19869616
Elapsed time = 0.047 sec
Buffers = 2048
Reads = 0
Writes = 0
Fetches = 0

10000000 records

SELECT 
  COUNT(*), MIN(CURRENT_VALUE), MAX(CURRENT_VALUE)
FROM SP_GENERATE_SERIES(1, 10000000);

                COUNT                   MIN                   MAX
===================== ===================== =====================
             10000000                     1              10000000

Current memory = 19828352
Delta memory = 240
Max memory = 36589008
Elapsed time = 2.867 sec
Buffers = 2048
Reads = 0
Writes = 0
Fetches = 0
SELECT 
  COUNT(*), MIN(N), MAX(N)
FROM GENERATE_SERIES(1, 10000000) S(N);
                COUNT                   MIN                   MAX
===================== ===================== =====================
             10000000                     1              10000000

Current memory = 19844848
Delta memory = 208
Max memory = 19869616
Elapsed time = 0.453 sec
Buffers = 2048
Reads = 0
Writes = 0
Fetches = 0

@aafemt
Copy link
Contributor

aafemt commented Oct 30, 2025

It is a system selectable procedure with fixed number of parameters and output. Why did you create a new BLR verb instead of using existing BLR for calling it by name?

@sim1984
Copy link
Contributor Author

sim1984 commented Oct 30, 2025

It is a system selectable procedure with fixed number of parameters and output. Why did you create a new BLR verb instead of using existing BLR for calling it by name?

  1. We don't have any system stored procedures, except for those in system packages.
  2. The output type takes the maximum scale of the arguments.

@dyemanov
Copy link
Member

It is a system selectable procedure with fixed number of parameters and output. Why did you create a new BLR verb instead of using existing BLR for calling it by name?

This PR correctly extends blr_table_value_fun with a new subcode, it does not introduce a new BLR code. This was added exactly to implement new table-valued functions (and they may have non-procedure-like syntax).

@aafemt
Copy link
Contributor

aafemt commented Oct 30, 2025

2. The output type takes the maximum scale of the arguments.

It can be handled at DSQL stage.

This PR correctly extends blr_table_value_fun with a new subcode, it does not introduce a new BLR code.

Ugh, I see.

Copy link
Member

@hvlad hvlad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few little notes.

@dyemanov dyemanov self-requested a review October 31, 2025 16:33

struct Impure : public TableValueFunctionScan::Impure
{
std::variant<SINT64, Firebird::Int128> m_start;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to produce undefined behavior.
Impure are generally initialized with memset and nothing gurantees this is correct for a variant.

Copy link
Contributor Author

@sim1984 sim1984 Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the alternatives? Use union or impure_value? Move the m_start, m_finish, m_step, and m_result values ​​outside of Impure?

I considered using impure_value, but it seemed redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to use union.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_start, m_finish and m_step definitely don't belong to impure which is supposed to keep volatile data between node executions. Only current value should be there.

Copy link
Member

@dyemanov dyemanov Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_start, m_finish and m_step definitely don't belong to impure which is supposed to keep volatile data between node executions. Only current value should be there.

In this case they will be re-evaluated per every generated row, slowing down the overall performance. Although for literal arguments the difference will be hardly noticeable.


struct Impure : public TableValueFunctionScan::Impure
{
std::variant<SINT64, Firebird::Int128> m_start;
Copy link
Member

@dyemanov dyemanov Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_start, m_finish and m_step definitely don't belong to impure which is supposed to keep volatile data between node executions. Only current value should be there.

In this case they will be re-evaluated per every generated row, slowing down the overall performance. Although for literal arguments the difference will be hardly noticeable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants