-
Notifications
You must be signed in to change notification settings - Fork 468
Optimize vertex/edge field access with direct array indexing #2302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Optimize vertex/edge field access with direct array indexing #2302
Conversation
NOTE: This PR was created using AI tools and a human. Leverage deterministic key ordering from uniqueify_agtype_object() to access vertex/edge fields in O(1) instead of O(log n) binary search. Fields are sorted by key length, giving fixed positions: - Vertex: id(0), label(1), properties(2) - Edge: id(0), label(1), end_id(2), start_id(3), properties(4) Changes: - Add field index constants and accessor macros to agtype.h - Update age_id(), age_start_id(), age_end_id(), age_label(), age_properties() to use direct field access - Add fill_agtype_value_no_copy() for read-only scalar extraction without memory allocation - Add compare_agtype_scalar_containers() fast path for scalar comparison - Update hash_agtype_value(), equals_agtype_scalar_value(), and compare_agtype_scalar_values() to use direct field access macros - Add fast path in get_one_agtype_from_variadic_args() bypassing extract_variadic_args() for single argument case - Add comprehensive regression test (30 tests) Performance impact: Improves ORDER BY, hash joins, aggregations, and Cypher functions (id, start_id, end_id, label, properties) on vertices and edges. All previous regression tests were not impacted. Additional regression test added to enhance coverage. modified: Makefile new file: regress/expected/direct_field_access.out new file: regress/sql/direct_field_access.sql modified: src/backend/utils/adt/agtype.c modified: src/backend/utils/adt/agtype_util.c modified: src/include/utils/agtype.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes vertex and edge field access by leveraging the deterministic key ordering from uniqueify_agtype_object() to use direct array indexing instead of binary search, providing O(1) instead of O(log n) access time.
Key Changes:
- Introduces direct field access macros and constants for vertex (3 fields) and edge (5 fields) objects based on key length sorting
- Adds
fill_agtype_value_no_copy()for read-only scalar extraction without memory allocation - Implements fast path in
compare_agtype_scalar_containers()for scalar comparisons - Optimizes
get_one_agtype_from_variadic_args()to bypassextract_variadic_args()for single agtype arguments
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/include/utils/agtype.h | Adds field index constants and direct access macros for vertex/edge objects |
| src/backend/utils/adt/agtype_util.c | Implements no-copy value extraction and scalar comparison fast paths; fixes bounds checking in comparison functions |
| src/backend/utils/adt/agtype.c | Updates accessor functions (id, start_id, end_id, label, properties) to use direct field access; adds fast path for variadic arg handling; fixes 4 assert bugs (= to ==) |
| regress/sql/direct_field_access.sql | Comprehensive test suite covering scalar comparisons, ORDER BY, vertex/edge operations, and edge cases |
| regress/expected/direct_field_access.out | Expected test output |
| Makefile | Registers new regression test |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * vertex and edge objects due to key length sorting. | ||
| */ |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The direct field access optimization assumes that vertices always have exactly 3 fields and edges always have exactly 5 fields. Consider adding assertions to verify num_pairs before accessing fields directly, similar to what's done in compare_agtype_scalar_values at lines 2000-2001. This would help catch potential data corruption or unexpected vertex/edge structures.
| * vertex and edge objects due to key length sorting. | |
| */ | |
| * vertex and edge objects due to key length sorting. | |
| * | |
| * To guard this assumption, verify the expected number of fields for | |
| * vertices and edges before doing direct field access. | |
| */ | |
| #ifdef USE_ASSERT_CHECKING | |
| if (agtv_object->type == AGTV_VERTEX) | |
| { | |
| Assert(agtv_object->val.object.num_pairs == 3); | |
| } | |
| else | |
| { | |
| Assert(agtv_object->val.object.num_pairs == 5); | |
| } | |
| #endif |
| * end_id=2, start_id=3, properties=4) due to key length sorting. | ||
| */ | ||
| if (agtv_object->type == AGTV_VERTEX) | ||
| { | ||
| agtv_result = AGTYPE_VERTEX_GET_PROPERTIES(agtv_object); | ||
| } | ||
| else | ||
| { |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The direct field access optimization assumes vertices have exactly 3 fields and edges have exactly 5 fields. Consider adding assertions to verify num_pairs before accessing fields directly, similar to what's done in compare_agtype_scalar_values. This would help catch potential data corruption or unexpected vertex/edge structures.
| * end_id=2, start_id=3, properties=4) due to key length sorting. | |
| */ | |
| if (agtv_object->type == AGTV_VERTEX) | |
| { | |
| agtv_result = AGTYPE_VERTEX_GET_PROPERTIES(agtv_object); | |
| } | |
| else | |
| { | |
| * end_id=2, start_id=3, properties=4) due to key length sorting. | |
| * | |
| * Verify the expected number of fields before relying on this layout. | |
| */ | |
| if (agtv_object->type == AGTV_VERTEX) | |
| { | |
| Assert(agtv_object->val.object.num_pairs == 3); | |
| agtv_result = AGTYPE_VERTEX_GET_PROPERTIES(agtv_object); | |
| } | |
| else | |
| { | |
| Assert(agtv_object->val.object.num_pairs == 5); |
| { | ||
| graphid id; | ||
| agtype_value *id_agt = GET_AGTYPE_VALUE_OBJECT_VALUE(scalar_val, "id"); | ||
| agtype_value *id_agt = AGTYPE_VERTEX_GET_ID(scalar_val); |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The direct field access optimization assumes vertices always have an id field at index 0. Consider adding an assertion to verify num_pairs is at least 1 before accessing pairs[0]. This would help catch potential data corruption or unexpected vertex structures.
| * Direct field access optimization: label is at index 1 for both vertex | ||
| * and edge objects due to key length sorting (id=0, label=1, ...). | ||
| */ | ||
| label = AGTYPE_VERTEX_GET_LABEL(agtv_value); |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using AGTYPE_VERTEX_GET_LABEL for both vertices and edges is correct since label is at index 1 for both types, but it would be more explicit to use AGTYPE_EDGE_GET_LABEL when agtv_value->type is AGTV_EDGE. Consider adding a conditional to improve code clarity, similar to the age_properties() implementation.
| label = AGTYPE_VERTEX_GET_LABEL(agtv_value); | |
| if (agtv_value->type == AGTV_VERTEX) | |
| label = AGTYPE_VERTEX_GET_LABEL(agtv_value); | |
| else /* agtv_value->type must be AGTV_EDGE due to the check above */ | |
| label = AGTYPE_EDGE_GET_LABEL(agtv_value); |
| * objects due to key length sorting (id=0, label=1, end_id=2, start_id=3). | ||
| */ |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The direct field access optimization assumes edges always have exactly 5 fields. Consider adding an assertion to verify num_pairs equals EDGE_NUM_FIELDS before accessing fields directly, similar to what's done in compare_agtype_scalar_values. This would help catch potential data corruption or unexpected edge structures.
| * objects due to key length sorting (id=0, label=1, end_id=2, start_id=3). | |
| */ | |
| * objects due to key length sorting (id=0, label=1, end_id=2, start_id=3). | |
| * Verify that the edge has the expected number of fields before | |
| * accessing them directly. | |
| */ | |
| Assert(agtv_object->val.object.num_pairs == EDGE_NUM_FIELDS); |
| /* | ||
| * Direct field access optimization: end_id is at index 2 for edge | ||
| * objects due to key length sorting (id=0, label=1, end_id=2). | ||
| */ |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The direct field access optimization assumes edges always have exactly 5 fields. Consider adding an assertion to verify num_pairs equals EDGE_NUM_FIELDS before accessing fields directly. This would help catch potential data corruption or unexpected edge structures.
| */ | |
| */ | |
| Assert(agtv_object->val.object.num_pairs == EDGE_NUM_FIELDS); |
| } | ||
| case AGTV_EDGE: | ||
| { | ||
| graphid id; |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The direct field access optimization assumes edges always have an id field at index 0. Consider adding an assertion to verify num_pairs is at least 1 before accessing pairs[0]. This would help catch potential data corruption or unexpected edge structures.
| graphid id; | |
| graphid id; | |
| Assert(scalar_val->val.object.num_pairs >= 1); |
NOTE: This PR was created using AI tools and a human.
Leverage deterministic key ordering from uniqueify_agtype_object() to access vertex/edge fields in O(1) instead of O(log n) binary search.
Fields are sorted by key length, giving fixed positions:
Changes:
Performance impact: Improves ORDER BY, hash joins, aggregations, and Cypher functions (id, start_id, end_id, label, properties) on vertices and edges.
All previous regression tests were not impacted.
Additional regression test added to enhance coverage.
modified: Makefile
new file: regress/expected/direct_field_access.out
new file: regress/sql/direct_field_access.sql
modified: src/backend/utils/adt/agtype.c
modified: src/backend/utils/adt/agtype_util.c
modified: src/include/utils/agtype.h