@hatyo hatyo commented Oct 23, 2025

This PR introduces native support for vector data types in the RelationalFDB, so that fixed-size numerical vectors, commonly used in machine learning and similarity search applications, can be stored and queried through SQL.

The vector type represents a fixed-dimensional array of floating-point numbers with configurable precision. Vectors are declared using the VECTOR(dimension, precision) syntax:

  CREATE TABLE embeddings (
      id BIGINT,
      embedding VECTOR(128, FLOAT),
      PRIMARY KEY(id)
  );

Supported precision types are:

  • HALF - 16-bit half-precision (2 bytes per element)
  • FLOAT - 32-bit single-precision (4 bytes per element)
  • DOUBLE - 64-bit double-precision (8 bytes per element)
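The per-element sizes listed above determine how many bytes of raw element data a vector of a given dimension holds. The following is a minimal sketch of that arithmetic; the class and method names are hypothetical and not part of this PR, and the actual serialized size may include overhead that the description does not specify.

```java
// Illustrative only: VectorSizeSketch and Precision are hypothetical names, not classes from this PR.
final class VectorSizeSketch {

    /** Precisions from the PR description, with their per-element size in bytes. */
    enum Precision {
        HALF(2), FLOAT(4), DOUBLE(8);

        final int bytesPerElement;

        Precision(int bytesPerElement) {
            this.bytesPerElement = bytesPerElement;
        }

        /** Bytes of raw element data for a vector of the given dimension (no serialization overhead). */
        long elementBytes(int dimension) {
            if (dimension <= 0) {
                throw new IllegalArgumentException("dimension must be positive");
            }
            return (long) dimension * bytesPerElement;
        }
    }

    public static void main(String[] args) {
        // For dimension 128: HALF -> 256, FLOAT -> 512, DOUBLE -> 1024
        for (Precision p : Precision.values()) {
            System.out.println("VECTOR(128, " + p + "): " + p.elementBytes(128) + " bytes of element data");
        }
    }
}
```

For the embeddings table above, VECTOR(128, FLOAT) therefore carries 512 bytes of element data per row.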

Key Features

1. Type Conversions:

Cast numeric arrays to vectors with automatic type promotion:

  SELECT CAST([1, 2, 3] AS VECTOR(3, FLOAT)) AS vec;
  SELECT CAST([1.5, 2.5, 3.5] AS VECTOR(3, HALF)) AS vec;

Casting is supported from INTEGER, BIGINT, FLOAT, and DOUBLE arrays.
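Conceptually, a cast such as CAST([1, 2, 3] AS VECTOR(3, FLOAT)) converts each array element to the target precision. The sketch below illustrates that idea in plain Java. The helper names are hypothetical, and rejecting an array whose length differs from the declared dimension is an assumption; the PR description does not state how a mismatch is handled.

```java
import java.util.Arrays;

// Illustrative only: VectorCastSketch is a hypothetical helper, not code from this PR.
final class VectorCastSketch {

    /** Converts an INTEGER/BIGINT-style array to FLOAT elements. */
    static float[] toFloatVector(long[] values, int dimension) {
        checkDimension(values.length, dimension);
        float[] out = new float[dimension];
        for (int i = 0; i < dimension; i++) {
            out[i] = (float) values[i]; // integer-to-float conversion
        }
        return out;
    }

    /** Converts a DOUBLE-style array to FLOAT elements (may lose precision). */
    static float[] toFloatVector(double[] values, int dimension) {
        checkDimension(values.length, dimension);
        float[] out = new float[dimension];
        for (int i = 0; i < dimension; i++) {
            out[i] = (float) values[i]; // narrowing conversion
        }
        return out;
    }

    // Assumption: a length mismatch is an error.
    private static void checkDimension(int actual, int declared) {
        if (actual != declared) {
            throw new IllegalArgumentException(
                    "array has " + actual + " elements, vector declares " + declared);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toFloatVector(new long[]{1, 2, 3}, 3)));
        System.out.println(Arrays.toString(toFloatVector(new double[]{1.5, 2.5, 3.5}, 3)));
    }
}
```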

2. Query Operations:

  • Equality/Inequality: =, !=
  • NULL handling: IS NULL, IS NOT NULL
  • NULL-safe comparisons: IS DISTINCT FROM, IS NOT DISTINCT FROM
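The difference between plain equality and the NULL-safe comparisons follows standard SQL three-valued logic: = yields UNKNOWN when either operand is NULL, while IS NOT DISTINCT FROM always yields true or false. A minimal sketch of those semantics on float arrays follows. The class is hypothetical, and the element-wise comparison via Arrays.equals is an assumption about how vector equality behaves (for example for NaN), which the PR description does not detail.

```java
import java.util.Arrays;

// Illustrative only: models SQL comparison semantics, not the PR's implementation.
final class NullSafeVectorCompare {

    /** SQL '=': returns null (UNKNOWN) if either operand is NULL. */
    static Boolean sqlEquals(float[] a, float[] b) {
        if (a == null || b == null) {
            return null;
        }
        return Arrays.equals(a, b);
    }

    /** IS NOT DISTINCT FROM: never UNKNOWN; two NULLs count as not distinct. */
    static boolean isNotDistinctFrom(float[] a, float[] b) {
        if (a == null || b == null) {
            return a == b; // true only when both are null
        }
        return Arrays.equals(a, b);
    }

    /** IS DISTINCT FROM: the negation of IS NOT DISTINCT FROM. */
    static boolean isDistinctFrom(float[] a, float[] b) {
        return !isNotDistinctFrom(a, b);
    }

    public static void main(String[] args) {
        float[] v = {0.5f, 1.2f, -0.8f};
        System.out.println(sqlEquals(v, null));            // null (UNKNOWN)
        System.out.println(isNotDistinctFrom(null, null)); // true
        System.out.println(isDistinctFrom(v, null));       // true
    }
}
```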

Like other primitive types, vectors can be used in structs and (nullable) arrays.

3. JDBC Integration

Vectors are inserted using prepared statements with vector objects:

  PreparedStatement stmt = connection.prepareStatement(
      "INSERT INTO embeddings VALUES (?, ?)");
  stmt.setLong(1, 1L);  // id column (illustrative value)
  stmt.setObject(2, new FloatRealVector(new float[]{0.5f, 1.2f, -0.8f}));  // embedding column
  stmt.executeUpdate();

4. YAML Testing Support:

New YAML tags allow vectors to be defined and validated using the !vXX [values] syntax, where:

  • !v16 represents a HALF precision vector parameter
  • !v32 represents a FLOAT precision vector parameter
  • !v64 represents a DOUBLE precision vector parameter
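The tag-to-precision mapping above can be summarized as a small lookup. This is only a sketch with a hypothetical helper name; it is not how the YAML test framework in this PR resolves the tags.

```java
// Illustrative only: VectorTagSketch is a hypothetical helper, not code from this PR.
final class VectorTagSketch {

    /** Maps a YAML vector tag to the SQL precision name it represents. */
    static String precisionForTag(String tag) {
        switch (tag) {
            case "!v16":
                return "HALF";
            case "!v32":
                return "FLOAT";
            case "!v64":
                return "DOUBLE";
            default:
                throw new IllegalArgumentException("unknown vector tag: " + tag);
        }
    }

    public static void main(String[] args) {
        for (String tag : new String[]{"!v16", "!v32", "!v64"}) {
            System.out.println(tag + " -> " + precisionForTag(tag));
        }
    }
}
```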

This fixes #3693. It also rectifies the incorrect tests described in #3695 and partially fixes #3665.

@hatyo hatyo added the enhancement New feature or request label Oct 23, 2025
hatyo added 2 commits October 24, 2025 15:24
- Fix other incorrect tests in DdlStatementParsingTest.java.
Collaborator

@arnaud-lacurie arnaud-lacurie left a comment

Looks good!
I have left some minor comments requesting additional tests and asking about the JDBC proto message structure.

hatyo added 3 commits October 27, 2025 18:01
- fix few problems in prepared statement handling and type inference
  during normalization.
- add more tests.
@hatyo hatyo marked this pull request as ready for review October 29, 2025 19:10
* **Equality comparison** (:sql:`=`, :sql:`!=`)
* **NULL checks** (:sql:`IS NULL`, :sql:`IS NOT NULL`)
* **NULL-safe comparison** (:sql:`IS DISTINCT FROM`, :sql:`IS NOT DISTINCT FROM`)
* **CAST from numeric arrays** to vectors
Collaborator

Should there be a mention here of the fact that you cannot create value indexes on vectors?
We also probably want to mention that you cannot ORDER BY a vector field at this point.
Can we group by a vector field?

Contributor Author

I haven't tested any of that (yet), but I can get to it later (not important for now, but nice to cover for completeness).

@github-actions

📊 Metrics Diff Analysis Report

Summary

  • New queries: 19
  • Dropped queries: 0
  • Plan changed + metrics changed: 0
  • Plan unchanged + metrics changed: 0
ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

  • New queries: Queries added in this PR
  • Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
  • Plan changed + metrics changed: The query plan has changed along with planner metrics.
  • Plan unchanged + metrics changed: The query plan is the same, but the planner metrics differ.

The last category in particular may indicate planner regressions that should be investigated.

New Queries

Count of new queries by file:

  • yaml-tests/src/test/resources/cast-tests.metrics.yaml: 2
  • yaml-tests/src/test/resources/vector.metrics.yaml: 17

Development

Successfully merging this pull request may close these issues.

Add support for Vector type in SQL

2 participants