Skip to content

Conversation

@jrgemignani
Copy link
Contributor

NOTE: This PR was created using AI tools and a human.

Leverage deterministic key ordering from uniqueify_agtype_object() to access vertex/edge fields in O(1) instead of O(log n) binary search.

Fields are sorted by key length, giving fixed positions:

  • Vertex: id(0), label(1), properties(2)
  • Edge: id(0), label(1), end_id(2), start_id(3), properties(4)

Changes:

  • Add field index constants and accessor macros to agtype.h
  • Update age_id(), age_start_id(), age_end_id(), age_label(), age_properties() to use direct field access
  • Add fill_agtype_value_no_copy() for read-only scalar extraction without memory allocation
  • Add compare_agtype_scalar_containers() fast path for scalar comparison
  • Update hash_agtype_value(), equals_agtype_scalar_value(), and compare_agtype_scalar_values() to use direct field access macros
  • Add fast path in get_one_agtype_from_variadic_args() bypassing extract_variadic_args() for single argument case
  • Add comprehensive regression test (30 tests)

Performance impact: Improves ORDER BY, hash joins, aggregations, and Cypher functions (id, start_id, end_id, label, properties) on vertices and edges.

All previous regression tests were not impacted.
Additional regression test added to enhance coverage.

modified: Makefile
new file: regress/expected/direct_field_access.out
new file: regress/sql/direct_field_access.sql
modified: src/backend/utils/adt/agtype.c
modified: src/backend/utils/adt/agtype_util.c
modified: src/include/utils/agtype.h

NOTE: This PR was created using AI tools and a human.

Leverage deterministic key ordering from uniqueify_agtype_object() to
access vertex/edge fields in O(1) instead of O(log n) binary search.

Fields are sorted by key length, giving fixed positions:
- Vertex: id(0), label(1), properties(2)
- Edge: id(0), label(1), end_id(2), start_id(3), properties(4)

Changes:
- Add field index constants and accessor macros to agtype.h
- Update age_id(), age_start_id(), age_end_id(), age_label(),
  age_properties() to use direct field access
- Add fill_agtype_value_no_copy() for read-only scalar extraction
  without memory allocation
- Add compare_agtype_scalar_containers() fast path for scalar comparison
- Update hash_agtype_value(), equals_agtype_scalar_value(), and
  compare_agtype_scalar_values() to use direct field access macros
- Add fast path in get_one_agtype_from_variadic_args() bypassing
  extract_variadic_args() for single argument case
- Add comprehensive regression test (30 tests)

Performance impact: Improves ORDER BY, hash joins, aggregations, and
Cypher functions (id, start_id, end_id, label, properties) on vertices
and edges.

All previous regression tests were not impacted.
Additional regression test added to enhance coverage.

modified:   Makefile
new file:   regress/expected/direct_field_access.out
new file:   regress/sql/direct_field_access.sql
modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
@github-actions github-actions bot added master override-stale To keep issues/PRs untouched from stale action labels Jan 10, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes vertex and edge field access by leveraging the deterministic key ordering from uniqueify_agtype_object() to use direct array indexing instead of binary search, providing O(1) instead of O(log n) access time.

Key Changes:

  • Introduces direct field access macros and constants for vertex (3 fields) and edge (5 fields) objects based on key length sorting
  • Adds fill_agtype_value_no_copy() for read-only scalar extraction without memory allocation
  • Implements fast path in compare_agtype_scalar_containers() for scalar comparisons
  • Optimizes get_one_agtype_from_variadic_args() to bypass extract_variadic_args() for single agtype arguments

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/include/utils/agtype.h Adds field index constants and direct access macros for vertex/edge objects
src/backend/utils/adt/agtype_util.c Implements no-copy value extraction and scalar comparison fast paths; fixes bounds checking in comparison functions
src/backend/utils/adt/agtype.c Updates accessor functions (id, start_id, end_id, label, properties) to use direct field access; adds fast path for variadic arg handling; fixes 4 assert bugs (= to ==)
regress/sql/direct_field_access.sql Comprehensive test suite covering scalar comparisons, ORDER BY, vertex/edge operations, and edge cases
regress/expected/direct_field_access.out Expected test output
Makefile Registers new regression test

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +5414 to +5415
* vertex and edge objects due to key length sorting.
*/
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direct field access optimization assumes that vertices always have exactly 3 fields and edges always have exactly 5 fields. Consider adding assertions to verify num_pairs before accessing fields directly, similar to what's done in compare_agtype_scalar_values at lines 2000-2001. This would help catch potential data corruption or unexpected vertex/edge structures.

Suggested change
* vertex and edge objects due to key length sorting.
*/
* vertex and edge objects due to key length sorting.
*
* To guard this assumption, verify the expected number of fields for
* vertices and edges before doing direct field access.
*/
#ifdef USE_ASSERT_CHECKING
if (agtv_object->type == AGTV_VERTEX)
{
Assert(agtv_object->val.object.num_pairs == 3);
}
else
{
Assert(agtv_object->val.object.num_pairs == 5);
}
#endif

Copilot uses AI. Check for mistakes.
Comment on lines +6056 to +6063
* end_id=2, start_id=3, properties=4) due to key length sorting.
*/
if (agtv_object->type == AGTV_VERTEX)
{
agtv_result = AGTYPE_VERTEX_GET_PROPERTIES(agtv_object);
}
else
{
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direct field access optimization assumes vertices have exactly 3 fields and edges have exactly 5 fields. Consider adding assertions to verify num_pairs before accessing fields directly, similar to what's done in compare_agtype_scalar_values. This would help catch potential data corruption or unexpected vertex/edge structures.

Suggested change
* end_id=2, start_id=3, properties=4) due to key length sorting.
*/
if (agtv_object->type == AGTV_VERTEX)
{
agtv_result = AGTYPE_VERTEX_GET_PROPERTIES(agtv_object);
}
else
{
* end_id=2, start_id=3, properties=4) due to key length sorting.
*
* Verify the expected number of fields before relying on this layout.
*/
if (agtv_object->type == AGTV_VERTEX)
{
Assert(agtv_object->val.object.num_pairs == 3);
agtv_result = AGTYPE_VERTEX_GET_PROPERTIES(agtv_object);
}
else
{
Assert(agtv_object->val.object.num_pairs == 5);

Copilot uses AI. Check for mistakes.
{
graphid id;
agtype_value *id_agt = GET_AGTYPE_VALUE_OBJECT_VALUE(scalar_val, "id");
agtype_value *id_agt = AGTYPE_VERTEX_GET_ID(scalar_val);
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direct field access optimization assumes vertices always have an id field at index 0. Consider adding an assertion to verify num_pairs is at least 1 before accessing pairs[0]. This would help catch potential data corruption or unexpected vertex structures.

Copilot uses AI. Check for mistakes.
* Direct field access optimization: label is at index 1 for both vertex
* and edge objects due to key length sorting (id=0, label=1, ...).
*/
label = AGTYPE_VERTEX_GET_LABEL(agtv_value);
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using AGTYPE_VERTEX_GET_LABEL for both vertices and edges is correct since label is at index 1 for both types, but it would be more explicit to use AGTYPE_EDGE_GET_LABEL when agtv_value->type is AGTV_EDGE. Consider adding a conditional to improve code clarity, similar to the age_properties() implementation.

Suggested change
label = AGTYPE_VERTEX_GET_LABEL(agtv_value);
if (agtv_value->type == AGTV_VERTEX)
label = AGTYPE_VERTEX_GET_LABEL(agtv_value);
else /* agtv_value->type must be AGTV_EDGE due to the check above */
label = AGTYPE_EDGE_GET_LABEL(agtv_value);

Copilot uses AI. Check for mistakes.
Comment on lines +5456 to +5457
* objects due to key length sorting (id=0, label=1, end_id=2, start_id=3).
*/
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direct field access optimization assumes edges always have exactly 5 fields. Consider adding an assertion to verify num_pairs equals EDGE_NUM_FIELDS before accessing fields directly, similar to what's done in compare_agtype_scalar_values. This would help catch potential data corruption or unexpected edge structures.

Suggested change
* objects due to key length sorting (id=0, label=1, end_id=2, start_id=3).
*/
* objects due to key length sorting (id=0, label=1, end_id=2, start_id=3).
* Verify that the edge has the expected number of fields before
* accessing them directly.
*/
Assert(agtv_object->val.object.num_pairs == EDGE_NUM_FIELDS);

Copilot uses AI. Check for mistakes.
/*
* Direct field access optimization: end_id is at index 2 for edge
* objects due to key length sorting (id=0, label=1, end_id=2).
*/
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direct field access optimization assumes edges always have exactly 5 fields. Consider adding an assertion to verify num_pairs equals EDGE_NUM_FIELDS before accessing fields directly. This would help catch potential data corruption or unexpected edge structures.

Suggested change
*/
*/
Assert(agtv_object->val.object.num_pairs == EDGE_NUM_FIELDS);

Copilot uses AI. Check for mistakes.
}
case AGTV_EDGE:
{
graphid id;
Copy link

Copilot AI Jan 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direct field access optimization assumes edges always have an id field at index 0. Consider adding an assertion to verify num_pairs is at least 1 before accessing pairs[0]. This would help catch potential data corruption or unexpected edge structures.

Suggested change
graphid id;
graphid id;
Assert(scalar_val->val.object.num_pairs >= 1);

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

master override-stale To keep issues/PRs untouched from stale action

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant